Crime in the USA Report

I started my career as a MOLAP cube builder, which left room for my creativity to improve the model in the data warehouse. It was always a great feeling to create a tool that gave users the ability to create reports on their own.

I wanted to play the same game with some datasets about crime in the USA from this webpage maintained by the FBI:

About the Crime Data Explorer

The Crime Data Explorer (CDE) is the FBI’s Uniform Crime Reporting (UCR) Program’s dynamic solution to presenting crime data in a more immediate venue that reflects the constant change in the nation’s crime circumstance.

The CDE pages provide a view of estimated national and state data, reported agency-level crime statistics, and graphs of specific variables from the National Incident-Based Reporting System (NIBRS).

SQLSaturday Budapest

I attended the event with some friends on 2019-04-20.
Details below.

Building a modern data warehouse and BI solution in Microsoft cloud by Gergely Csom

Gergely gave an overview of Microsoft’s data-warehouse-related tools and services, including SSIS, Azure Data Factory Dataflow, Power BI Dataflow, Azure Analysis Services and many more.

Top 10 SSAS Design Best Practice (or maybe even more) by Zoltán Horváth

It was a very nice trip down memory lane to hear a talk about OLAP cubes. I spent a large part of my professional career building MOLAP cubes, and this talk made me want to build one again just for fun.

How to win Kaggle competition and get familiar with machine learning? by Marcin Szeliga

Guess who would survive the Titanic with the Key Influencers visual!

Hi, I am Bertalan Ronai from Hungary.

The following tutorial describes my Power BI report, which uses the new Key Influencers visual to enter a machine learning competition on Kaggle using only Power BI.

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.

Titanic: Machine Learning

Berci asked me to upload my version of Kaggle’s Titanic competition. Together at our workshop we reached around 78%, which was a good starting point.

Speaking of the workshop: in January 2019 a Data Science group called Data Revolution formed on Facebook:
Feel free to join.

To solve this task, I first started with a standard Decision Tree, without any tuning. Then I got into GridSearchCV and RandomizedSearchCV to find the best parameters. But even after tuning the model with these searches, I still couldn’t get higher than 79%. RandomForest didn’t help either.
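
For readers who have not used these tools, here is a minimal sketch of that workflow in scikit-learn: an untuned decision tree as a baseline, followed by GridSearchCV and RandomizedSearchCV over the same parameters. This is not the notebook’s code. The data is synthetic (generated with make_classification, not the Titanic dataset), and the parameter grid is a hypothetical one chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the Titanic dataset).
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: baseline decision tree without any tuning.
baseline = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Hypothetical search space, chosen for illustration only.
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

# Step 2: exhaustive search over all 24 combinations with 5-fold CV.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# Step 3: randomized search that tries only 10 sampled combinations.
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X_train, y_train)

print("baseline test accuracy:", baseline_acc)
print("grid search best params:", grid.best_params_)
print("grid search test accuracy:", grid.score(X_test, y_test))
print("randomized search best params:", rand.best_params_)
print("randomized search test accuracy:", rand.score(X_test, y_test))
```

The grid search tries every combination, while the randomized search samples a fixed number of them, which is cheaper when the search space grows. Either way, tuning can only do so much for a single tree, which matches my experience of being stuck below 80%.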

That’s when I found XGBoost, a powerful model that is getting more and more attention in machine learning. With it, I could get over 80%.

If you have any questions, or tips, you can find me on LinkedIn:

You can find the notebook on:

Hit Refresh

I have recently finished Satya Nadella’s book Hit Refresh. @Alex Powers recommended it on LinkedIn, so I ordered myself a paperback copy.

I liked reading about the new CEO’s career path and family. I like reading about successful people’s career paths, which is why I am reading Arnold Schwarzenegger’s Total Recall for the second time.

The author talks about Microsoft’s company culture, emerging technologies such as AI and quantum computers, and the importance of empathy in a company.

I should not have to try hard to convince you to read this book. If you work with data professionally, this book is for you.