Titanic: Machine Learning

Berci asked me to upload my solution to Kaggle’s Titanic competition. Working together at our workshop, we reached around 78% accuracy, which was a good starting point.

Speaking of the workshop: in January 2019, a data science group called Data Revolution was formed on Facebook:
https://www.facebook.com/groups/DatasRev/
Feel free to join.

To solve this task, I started with a standard Decision Tree without any tuning. Then I moved on to GridSearchCV and RandomizedSearchCV to search for the best hyperparameters. But even after tuning the model with these cross-validated searches, I still couldn’t get above 79%. RandomForest didn’t help either.
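
A minimal sketch of that progression with scikit-learn, under a few assumptions: the features are already numeric (the Titanic preprocessing is not shown), a synthetic dataset from make_classification stands in for the Kaggle training data, and the parameter ranges are hypothetical examples, not the values used in the notebook.

```python
# Sketch: untuned decision tree baseline, then GridSearchCV and
# RandomizedSearchCV over its hyperparameters, then a random forest.
# Assumption: synthetic stand-in data, NOT the real Titanic features.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                     cross_val_score)
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in sized like Kaggle's 891-row training set.
X, y = make_classification(n_samples=891, n_features=7, random_state=0)

# 1. Baseline: default decision tree, scored with 5-fold cross-validation.
baseline = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()

# 2. Exhaustive search over every combination in a small (hypothetical) grid.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 4, 5, 6, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
grid.fit(X, y)

# 3. Randomized search: samples n_iter settings from the given distributions.
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": randint(2, 12), "min_samples_leaf": randint(1, 20)},
    n_iter=20,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

# 4. Random forest with default settings, for comparison.
forest = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()

print(f"baseline tree:     {baseline:.3f}")
print(f"grid search:       {grid.best_score_:.3f} {grid.best_params_}")
print(f"randomized search: {rand.best_score_:.3f} {rand.best_params_}")
print(f"random forest:     {forest:.3f}")
```

GridSearchCV tries every combination in the grid (15 here), while RandomizedSearchCV evaluates a fixed number of sampled settings (n_iter), which scales better when the search space is large.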

That’s when I found XGBoost, a gradient boosting library that has been getting more and more attention in machine learning. With it, I managed to score over 80%.

If you have any questions or tips, you can find me on LinkedIn:
https://www.linkedin.com/in/baloghbalazs88/

You can find the notebook at:
https://anaconda.org/bbalazs88/titanic/notebook