Our third workshop took place at One Identity. Our host for the evening was Antal Balázs, the company’s employer branding specialist.
The first session was Doma Miklós’s introduction to the bokeh library. It provided us with a good template for creating our own interactive visuals. The figures became more complex as the presentation progressed. Finally, we got a brief look at bokeh’s geo-coordinate support.
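The kind of starting template shown in the session can be sketched with a minimal bokeh figure. This is an illustrative example, not the presenter's actual code: a small scatter plot with a hover tool, the simplest step toward an interactive visual.

```python
from bokeh.plotting import figure
from bokeh.models import HoverTool

# Minimal interactive scatter plot; hovering over a point shows its coordinates.
p = figure(title="Interactive scatter sketch", width=400, height=300)
p.scatter(x=[1, 2, 3, 4], y=[4, 1, 3, 2], size=12)
p.add_tools(HoverTool(tooltips=[("x", "@x"), ("y", "@y")]))

# To view it in a browser you would add:
# from bokeh.plotting import output_file, show
# output_file("scatter.html"); show(p)
```

From a template like this, the figures in the talk grew by layering on more glyphs, widgets, and data sources.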
In the second session, Endreffy Zsolt from One Identity walked us through setting up a virtual environment and a testing example using PyCharm. He prepared an example: checking for the Chernobyl disaster’s date on its Wikipedia page.
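A sketch of the kind of test shown in the session. The live demo checked the actual Wikipedia page; to stay offline, this illustrative version checks a local text snippet instead, and the function name is made up for the example. After creating and activating a virtual environment (`python -m venv venv`), tests like these run with `pytest`.

```python
def contains_disaster_date(text: str) -> bool:
    """Return True if the Chernobyl disaster's date appears in the text."""
    return "26 April 1986" in text

def test_date_found():
    # Stand-in for the fetched Wikipedia page content.
    snippet = "The Chernobyl disaster began on 26 April 1986."
    assert contains_disaster_date(snippet)

def test_date_missing():
    assert not contains_disaster_date("No dates in this text.")
```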
Once the learning part was over, we had the opportunity to join our host’s fabulous beer and hot-dog party.
Between the hot dogs, donuts, and beers, Antal Balázs also treated us to a behind-the-scenes tour, showing us their facilities and telling us about One Identity’s history.
Our next event is going to be at the Central European University (CEU); check out the event page!
The workshop was the first since we started cooperating with the budapest.py meetup group, and the 15th we’ve held since January, when we started studying together.
First Dóri talked about Anaconda and Jupyter notebooks, then showed us some basic pandas code for EDA (Exploratory Data Analysis) on a dataset about Pokémon. She walked us through how to get insights from our data, how to slice it, create new entries, and much more. After some basic visualization and aggregation, we found out which Pokémon is the strongest, based on our simple analysis.

The second part was Berci’s pandas tutorial notebook, which holds over a hundred different pandas functions and examples. He gave a short tour of the notebook. We aim to create tutorial notebooks that help us focus on understanding the current dataset and spend less time looking up functions. These notebooks will also help newcomers catch up with returning participants.
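The EDA steps from the session can be sketched on a tiny made-up table (the values below are invented for illustration, not taken from the workshop dataset): slicing rows by a condition, creating a new column, aggregating, and picking the "strongest" row.

```python
import pandas as pd

# A tiny made-up table standing in for the Pokémon dataset.
df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    "type": ["Grass", "Fire", "Water", "Electric"],
    "attack": [49, 52, 48, 55],
    "defense": [49, 43, 65, 40],
})

# Slice: keep only rows with above-average attack.
strong_attackers = df[df["attack"] > df["attack"].mean()]

# Create a new column as a crude "total strength" score.
df["total"] = df["attack"] + df["defense"]

# Aggregate: mean total strength per type.
print(df.groupby("type")["total"].mean())

# The "strongest" Pokémon by our simple score.
strongest = df.loc[df["total"].idxmax(), "name"]
print(strongest)
```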
To solve this task, I first started with a standard decision tree, without any tuning. Then I turned to GridSearchCV and RandomizedSearchCV to search for the best parameters. But even after tweaking the model with these cross-validated searches, I still couldn’t get above 79%. A random forest didn’t help either.
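A minimal sketch of that tuning step, on synthetic data rather than the actual competition dataset (the parameter grid here is illustrative, not the one I used):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data for the task's dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated grid search over a few decision-tree hyperparameters.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.score(X_test, y_test))  # accuracy of the best model on held-out data
```

RandomizedSearchCV works the same way, but samples a fixed number of parameter combinations instead of trying them all.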
That’s when I found XGBoost, a powerful gradient-boosting library that is getting more and more attention in machine learning. With it, I could get above 80%.