Tag Archives: data science

BudapestPy Workshops 107 (2019-11-07)

We started November at One Identity’s place. Thanks to One Identity and Balázs Antal for the venue, the beer, and the pizza. All of our workshops this month will be held at One Identity.

We had three things to talk about this week. The first was you, our attendees. Some of you have been with us for the past two months, and for some of you this event was the first – but hopefully not the last. We wanted to get to know you better, so we asked everyone to introduce themselves and talk about their connection with Python and data, and their future goals.
It was really helpful for us in deciding which direction we should go next year.


BudapestPy Workshops 104 (2019-10-09)

Our fourth workshop took place at CEU. Our hosts for the evening were András Vereckei and Arieda Muço. András showed us their open-source projects facilitating free knowledge transfer across a wide range of collected and curated topics (see the links below).

https://datacarpentry.org/lessons
https://software-carpentry.org/lessons
https://librarycarpentry.org/lessons

This was an exceptional occasion in multiple aspects: our first workshop in English, the largest number of attendees so far (thanks to the large seminar room provided by our generous host) and this time we only had one presenter for the whole evening.


Titanic: Machine Learning

Berci asked me to upload my solution to Kaggle’s Titanic competition. Together at our workshop we achieved around 78% accuracy, which was a good starting point.

Speaking of the workshop: in January 2019 a data science group called Data Revolution formed on Facebook:
https://www.facebook.com/groups/DatasRev/
Feel free to join.

Solving this task, I started with a standard Decision Tree without any tuning. Then I got into GridSearchCV and RandomizedSearchCV to find the best parameters. But even after tweaking the model with these search methods, I still couldn’t get above 79%. RandomForest didn’t help either.
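The workflow above – an untuned tree first, then a cross-validated grid search over its hyperparameters – can be sketched roughly like this. The dataset here is a synthetic stand-in for the Titanic features (the real notebook is linked at the bottom), and the parameter grid is illustrative, not the one I actually used:

```python
# Sketch: baseline DecisionTree, then GridSearchCV tuning.
# Synthetic data stands in for the Titanic features.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: decision tree with default parameters, no tuning.
baseline = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"baseline accuracy: {baseline.score(X_test, y_test):.3f}")

# Grid search over depth and leaf size with 5-fold cross-validation.
param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5
).fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"tuned accuracy: {search.score(X_test, y_test):.3f}")
```

RandomizedSearchCV has the same interface; it samples from the grid instead of trying every combination, which is faster when the grid is large.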

That’s when I found XGBoost, a powerful gradient-boosting library that is getting more and more attention in machine learning. With it, I could get over 80%.

If you have any questions or tips, you can find me on LinkedIn:
https://www.linkedin.com/in/baloghbalazs88/

You can find the notebook at:
https://anaconda.org/bbalazs88/titanic/notebook