I started my career as a MOLAP cube builder which left space for my creativity to improve the model that was in the data warehouse. It was always a great feeling to create a tool that could give the users ability to create reports on their own.
I wanted to play the same game with some datasets about crime in the USA from this webpage maintained by the FBI: https://crime-data-explorer.fr.cloud.gov/downloads-and-docs
About the Crime Data Explorer
The Crime Data Explorer (CDE) is the FBI’s Uniform Crime Reporting (UCR) Program’s dynamic solution to presenting crime data in a more immediate venue that reflects the constant change in the nation’s crime circumstance.
The CDE pages provide a view of estimated national and state data, reported agency-level crime statistics, and graphs of specific variables from the National Incident-Based Reporting System (NIBRS).
Since 1930, participating local, county, state, tribal, and federal law enforcement agencies have voluntarily provided the nation with crime statistics through the Uniform Crime Reporting (UCR) Program
I have used the following datasets (csv files) in my Power BI report:
|LEOKA_ASSAULT_TIME_WEAPON_INJURY.csv||Information about LEOKA Time of incidient, weapon, and injury Data|
|LEOKA_ASSIGNMENT_ACTIVITY.csv||Information about LEOKA Assignment and Activity Data|
|hate_crime.csv||Information about Hate Crime Data at Incident level|
|arrests_national.csv||Information about National Arrest Data|
|Cargo_Theft_Data_2013-2017.csv||File containing Cargo Theft Master File Data|
|pe_employee_data.csv||Information about Police Employee Data|
|Human_Trafficking_Download_2013-2017.csv||Information about Human Trafficking|
|territories.csv||Information about US Territories|
You can read about the individual datasets on the website referenced above.
Let’s jump ahead and look at the Key Influencers Visual!
The one in the top left corner says that arsons are 183 times more likely to happen on Abandoned/Condemned Structures which sounds nice but the second line is sad news, arsonist frequently target places of religion. I want to say that you should not forget that these numbers are here because someone committed these crimes and it had victims so you can’t look at these numbers the way you would look at a fiscal report.
The bottom left visual is interesting. When the Victim types is Business the Bias Description is likely to be Anti-Atheism/Agnosticism. Why? What does this mean in practice?
The visual on the right is examining the Motor Vehicle Thefts by location name and says that it is 576 times more likely to happen in a Farm Facility. It is worth a check, right? The truth is that there is only one event where the Offense Name is Motor Vehicle Theft and the Location Name is Farm Facility.
Why does the
visual show the wrong result?
Because had I ignored the following warning:
Oh, no it seems like the Arson data was also wrong. Arsonist most likely choose residences and churches and not abandoned buildings. So it seems like a good idea to check the results of the Key Influencers Visual with some exploratory data analysis.
Let’s choose a nicer topic for the Key Influencer visual! Analyze Gender by Years and Employee count.
The main ideas behind creating the queries, tables of the report were the following:
I have downloaded the csv files to a folder and then used the import from folder as a data source. Created a FolderPath parameter for you so you won’t have to create the same C:\DataSources\USA_Crime_Data\ that I had used.
An interesting thing happens when you import csv files in Power BI and Excel, it adds a number of columns argument to your query. If the same csv would have more columns the next time you refresh it, the new columns would not be added to your query.
let Source = Csv.Document(File.Contents("C:\DataSources\USA_Crime_Data\arrests_national.csv"),[Delimiter=",", Columns=34, Encoding=1250, QuoteStyle=QuoteStyle.None]) in Source
You should delete those parts like this one: Columns=34.
let Source = Csv.Document(File.Contents("C:\DataSources\USA_Crime_Data\arrests_national.csv"),[Delimiter=",", Encoding=1250, QuoteStyle=QuoteStyle.None]) in Source
Also if you want to use “.” as a delimiter, you can change it in the query editor.
I needed use the unpivot function a lot of times because most queries contain data in a tabular format.
After unpivoting the same query look like this:
Create common dimensions
These queries have some common dimensions such as STATE_ABBR as state abbreviation (AK for Alaska) which defines the State Name and Region Name columns. The same goes for the Years column, all of these are Yearly numbers except the Hate Crime dataset. So I created a common Years column. You can plot the number of arrests from the Arrests National table or the number of hate crimes from the HateCrime table and the amount of stolen value from the Cargo Theft table with the same Year X axis. To create the HateCrimeDate table, I had used a modified version of Matt Allington’s power query date table template.
My data model is pretty nice, right? My report is not flawless but it serves the purpose to show how great Power BI is when you want to create data models.
Download the report from my Github page and have fun with it!
Not everything is compatible with an embedded report so check out my report in pdf!
View my report in a full page by clicking the map below!