BudapestPy Workshops 104 (2019-10-09)

Our fourth workshop took place at CEU. Our hosts for the evening were András Vereckei and Arieda Muço. András showed us their open-source projects, which support free knowledge transfer across a wide range of collected and curated topics (see the links below):

https://datacarpentry.org/lessons
https://software-carpentry.org/lessons
https://librarycarpentry.org/lessons

This was a special occasion in several respects: it was our first workshop in English, it drew the largest number of attendees so far (thanks to the large seminar room provided by our generous hosts), and this time a single presenter led the whole evening.

Dóri walked us through a mini project from DrivenData and guided us all the way from data cleaning to submitting a solution to the competition.
We learned about time-series data, visualization, gradient boosting for regression, training and test sets, and how to evaluate our models’ performance. We mostly used the scikit-learn (sklearn) library for these steps.
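A minimal sketch of this kind of workflow, assuming a synthetic daily series instead of the competition data (the series, the day-of-year and weekday features, and the 600-day cutoff are all made up for illustration). The key point for time series is that the train/test split is chronological rather than shuffled:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical daily series: yearly seasonality plus noise (not the workshop's data).
rng = np.random.default_rng(0)
days = np.arange(730)
y = 10 + 5 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 0.5, days.size)

# Simple calendar-style features derived from the day index: day of year, day of week.
X = np.column_stack([days % 365, days % 7])

# For time series, split chronologically instead of shuffling,
# so the model is only evaluated on "future" data.
split = 600
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out future period and compare with a constant baseline.
pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
baseline_mae = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"MAE model: {mae:.2f}, MAE mean baseline: {baseline_mae:.2f}")
```

Comparing against the training mean is a cheap sanity check: a model that cannot beat a constant prediction has not learned anything useful about the series.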

Thank you for the active participation, for helping each other sort out technical difficulties, and especially for the group thinking. We enjoyed this session a lot!

Our next event is going to be at the Central European University (CEU); check out the event page!

Thanks to everyone who showed up!
As always, the notebooks (the complete version and the one with missing parts) are on our GitHub:
https://github.com/budapestpy-workshops

You can join us on our meetup page:
https://www.meetup.com/budapest-py/

The Team: Balogh Balázs, Rónai Bertalan, Szabó Dóra, Doma Miklós, Hackl Krisztián and Zsarnowszky Lóránt (last name, first name order)

BudapestPy Workshops 103 (2019-10-02)

Our third workshop took place at One Identity. Our host for the evening was Antal Balázs, the company’s employer branding specialist.

The first session was Doma Miklós’s introduction to the bokeh library, which gave us a good template for creating our own interactive visuals. The figures became more complex as the presentation went on, and at the end we got a brief look at how bokeh handles geographic coordinates.
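A small bokeh sketch in the spirit of the session, assuming bokeh 2.4 or newer (the data and titles are placeholders): a line with markers, pan and zoom tools, and a hover tooltip that reads its values from a ColumnDataSource.

```python
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.resources import CDN

# Placeholder data; any two equal-length columns work.
source = ColumnDataSource(data={"x": [1, 2, 3, 4, 5], "y": [6, 7, 2, 4, 5]})

p = figure(title="Simple interactive line", x_axis_label="x", y_axis_label="y",
           width=500, height=300, tools="pan,wheel_zoom,reset")
p.line("x", "y", source=source, line_width=2, legend_label="y vs x")
p.scatter("x", "y", source=source, size=8)

# Hovering over a glyph shows the x and y values from the data source.
p.add_tools(HoverTool(tooltips=[("x", "@x"), ("y", "@y")]))

# Render to a standalone HTML string; bokeh.plotting.show(p) would open it in a browser instead.
html = file_html(p, CDN, "bokeh sketch")
```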

In the second session, Endreffy Zsolt from One Identity walked us through setting up a virtual environment and writing a test in PyCharm. His example looked for the date of the Chernobyl disaster on its Wikipedia page.
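A minimal sketch of such a test, assuming a hard-coded HTML snippet in place of the live Wikipedia page so it runs offline; `find_disaster_date` and `SAMPLE_HTML` are hypothetical names, not the presenter's code. From a terminal, a virtual environment can be created with `python -m venv .venv`, and if the file is saved as `test_chernobyl.py`, `python -m unittest test_chernobyl` (or PyCharm's test runner) runs it.

```python
import re
import unittest
from typing import Optional

# Hypothetical stand-in for the downloaded page; a real test would fetch
# the Chernobyl disaster article from Wikipedia instead.
SAMPLE_HTML = """
<table class="infobox"><tr><th>Date</th><td>26 April 1986</td></tr></table>
"""


def find_disaster_date(html: str) -> Optional[str]:
    """Return the first 'day Month year' date found in the page, or None."""
    match = re.search(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b", html)
    return match.group(1) if match else None


class TestChernobylDate(unittest.TestCase):
    def test_date_is_found(self):
        self.assertEqual(find_disaster_date(SAMPLE_HTML), "26 April 1986")

    def test_missing_date_returns_none(self):
        self.assertIsNone(find_disaster_date("<p>no date here</p>"))
```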

Once the learning part was over, we had the opportunity to join our host’s fabulous beer and hot dog party. Between the hot dogs, donuts, and beers, Antal Balázs also treated us to a behind-the-scenes tour of their facilities and told us about One Identity’s history.

Our next event is going to be at the Central European University (CEU); check out the event page!

BudapestPy Workshops 102 (2019-09-25)

The second workshop picked up where the first one ended, starting with a quick recap of the session three weeks earlier: Dóri had shown us how to handle a CSV dataset in pandas, including how to sort and count values, rename and remove columns, and similar tricks. She used a Pokémon dataset, which was not only fun but also easy to follow.
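The pandas operations from the recap can be sketched like this, assuming a tiny made-up CSV with a few Pokémon (the column names are illustrative and not necessarily those of the workshop dataset):

```python
import io

import pandas as pd

# Illustrative miniature Pokémon table, read from a string as if it were a CSV file.
csv_text = """Name,Type 1,Attack,Defense,Legendary
Bulbasaur,Grass,49,49,False
Charmander,Fire,52,43,False
Squirtle,Water,48,65,False
Vulpix,Fire,41,40,False
Mewtwo,Psychic,110,90,True
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sort by a column, highest first.
strongest_first = df.sort_values("Attack", ascending=False)

# Count how often each value occurs in a column.
type_counts = df["Type 1"].value_counts()

# Rename a column and remove another one.
df = df.rename(columns={"Type 1": "PrimaryType"})
df = df.drop(columns=["Legendary"])
print(strongest_first.head(), type_counts, df.columns.tolist(), sep="\n\n")
```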

Once we were familiar with pandas, it was time to look into the machine learning part. I (Balázs) prepared an unsupervised learning problem for this event.

Our venue was oktatoterem.com again, and our room was designed for 18 people. It was great to see that more than 20 of you came to spend this evening learning Python with us! We were packed, but we managed with extra chairs.

I made a Jupyter notebook with DataCamp’s “Musical Recommender” sample dataset. I left out some parts of the notebook so we could work them out together, and I got some interesting questions and ideas about where to go next and how to evaluate this task. Our goal was to recommend artists to users based on similar listening tastes.

For example, if you listen to a lot of The Beatles, it recommends The Beach Boys or Bob Dylan. To achieve this, we reshaped our data, put it into a csr_matrix, scaled it, and reduced its dimensions from 111×500 to 111×20. After these steps, we could try various artists and discuss whether our model was good enough.
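A minimal sketch of this pipeline, assuming a small synthetic listening log with two taste groups instead of the DataCamp data. The specific scaling and dimension-reduction steps (MaxAbsScaler, NMF, Normalizer) are our assumption of a typical setup, `recommend` is a hypothetical helper, and the toy data only needs 2 latent features instead of 20:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, Normalizer

# Hypothetical long-format listening log: (user, artist, play count), two taste groups.
rng = np.random.default_rng(42)
rows = []
for user in range(50):
    group = "rock" if user < 25 else "jazz"
    for i in range(10):
        if rng.random() < 0.7:  # each user listens to roughly 70% of their group's artists
            rows.append((user, f"{group}_{i}", int(rng.integers(1, 100))))
plays = pd.DataFrame(rows, columns=["user_id", "artist", "plays"])

# Reshape to one row per artist and one column per user (artists x users),
# then store it as a sparse csr_matrix, since many entries are zero.
wide = plays.pivot_table(index="artist", columns="user_id", values="plays", fill_value=0)
artist_matrix = csr_matrix(wide.values)

# Scale each user column, reduce to 2 latent features with NMF,
# then L2-normalize each artist row so dot products become cosine similarities.
pipeline = make_pipeline(
    MaxAbsScaler(),
    NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0),
    Normalizer(),
)
features = pipeline.fit_transform(artist_matrix)
feature_df = pd.DataFrame(features, index=wide.index)


def recommend(artist, n=3):
    """Return the n artists whose feature vectors are most similar to `artist`."""
    similarities = feature_df.dot(feature_df.loc[artist])
    return similarities.drop(artist).nlargest(n).index.tolist()


print(recommend("rock_0"))
```

Because every artist vector has unit length after the Normalizer step, the dot product inside `recommend` is the cosine similarity, so the recommendations are the artists pointing in the most similar direction in the reduced feature space.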

BudapestPy Workshops 101 (2019-09-04)

This was the first workshop since we started cooperating with the budapest.py meetup group, and the 15th we have held since January, when we started studying together.

First, Dóri talked about Anaconda and Jupyter notebooks, then showed us some basic pandas code for EDA (exploratory data analysis) on a Pokémon dataset. She walked us through how to get insights from our data, how to slice it, how to add new entries, and much more. After some basic visualization and aggregation, we found out which Pokémon is the strongest according to our simple analysis.

The second part was Berci’s pandas tutorial notebook, which contains over a hundred different pandas functions and examples, and he gave us a short tour of it. We aim to create more tutorial notebooks like this, so that we can focus on understanding the dataset at hand and spend less time looking up functions. These notebooks will also help newcomers catch up with returning participants.
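The EDA steps from Dóri’s part (slicing, adding a new column, aggregating, and picking the strongest Pokémon) can be sketched as follows, assuming a handful of illustrative rows and defining “strongest” simply as the highest HP + Attack + Defense total:

```python
import pandas as pd

# Illustrative rows in the shape of a typical Pokémon stats table.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Mewtwo"],
    "Type": ["Grass", "Fire", "Water", "Electric", "Psychic"],
    "HP": [45, 39, 44, 35, 106],
    "Attack": [49, 52, 48, 55, 110],
    "Defense": [49, 43, 65, 40, 90],
})

# Slicing: a boolean mask selects rows, a list of labels selects columns.
strong_attackers = df.loc[df["Attack"] > 50, ["Name", "Attack"]]

# New column derived from existing ones.
df["Total"] = df[["HP", "Attack", "Defense"]].sum(axis=1)

# Aggregation per group.
mean_total_by_type = df.groupby("Type")["Total"].mean()

# "Strongest" under this simple definition: the row with the highest total.
strongest = df.loc[df["Total"].idxmax(), "Name"]
print(strong_attackers, mean_total_by_type, strongest, sep="\n\n")
```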

All of our material is available on GitHub: https://github.com/DatasRev/budapest.py_workshops

This workshop was about getting and inspecting the data. Next time, we will show how to build a basic machine learning model using another sample dataset.

Titanic: Machine Learning

Berci asked me to upload my solution to Kaggle’s Titanic competition. Together at our workshop we reached around 78% accuracy, which was a good starting point.

Speaking of the workshop: in January 2019, a data science group called Data Revolution formed on Facebook:
https://www.facebook.com/groups/DatasRev/
Feel free to join.

To solve this task, I first started with a standard decision tree without any tuning. Then I moved on to GridSearchCV and RandomizedSearchCV to find the best parameters. But even after tuning the model with these cross-validated searches, I still couldn’t get higher than 79%. A random forest didn’t help either.
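A minimal sketch of the two search strategies, assuming synthetic data from `make_classification` in place of the preprocessed Titanic features and an illustrative parameter space: GridSearchCV tries every combination in the grid, while RandomizedSearchCV samples a fixed number (`n_iter`) of combinations from distributions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed Titanic features (not the real data).
X, y = make_classification(n_samples=800, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Exhaustive search over a small grid (4 x 3 = 12 combinations), scored by 5-fold CV accuracy.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 7, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# Random search: 20 combinations sampled from integer distributions.
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": randint(2, 12), "min_samples_leaf": randint(1, 20)},
    n_iter=20,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X_train, y_train)

print("grid:", grid.best_params_, round(grid.score(X_test, y_test), 3))
print("random:", rand.best_params_, round(rand.score(X_test, y_test), 3))
```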

That’s when I found XGBoost, a powerful gradient boosting library that is getting more and more attention in machine learning. With it, I could get above 80%.
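To compare model families the same way, a sketch like the following scores each one with 5-fold cross-validation. It assumes synthetic data and uses scikit-learn’s GradientBoostingClassifier as a stand-in for XGBoost so it runs without extra packages; if the xgboost package is installed, `xgboost.XGBClassifier` follows the scikit-learn fit/predict interface and can usually be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, not the Titanic set.
X, y = make_classification(n_samples=800, n_features=8, n_informative=4, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # scikit-learn's own gradient boosting, used here in place of XGBoost.
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```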

If you have any questions or tips, you can find me on LinkedIn:
https://www.linkedin.com/in/baloghbalazs88/

You can find the notebook here:
https://anaconda.org/bbalazs88/titanic/notebook