Azure ML Studio, Parsing XML, Survival Analysis
As we don’t know when we can meet again in person, it was time to organize an online event. Therefore, our last workshop of the season before the summer break also happened to be our first online workshop/webinar.
We chose Cisco Webex as our conference platform. We ran into the usual minor issues (e.g. how to disable the notification sound when someone enters or leaves the room), and the screen-sharing options menu at the top of the screen made it a little difficult to switch between browser tabs. It’s challenging to speak for 15-20 minutes when you can’t see the audience’s reactions. Luckily, some of the organizers turned on their cameras so the presenter could see some faces while presenting.
One of the webinars that inspired us to do this event was the PyData meetup group’s latest online meetup. They were nice enough to recommend our event in their newsletter, and we did not forget to thank them in our welcome speech.
First, Balázs Balogh talked about the recent changes in his professional life and how the idea of organizing the webinar was formed and became a reality.
Krisztián Hackl also talked about the importance of the budapestpy community and how its meetups and workshops focus on helping each other learn (Python, data science, data analysis).
After the welcome speeches, Balázs Balogh went on to talk about ElementTree, the XML parsing module (xml.etree.ElementTree) in Python’s standard library. He guided us through a DataCamp tutorial covering the structure of an XML document, how to access its elements, how to search among them, and finally how to modify the document. The example file was a small IMDB dataset containing a few misplaced movie titles, which we moved back to their original place.
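To make the four steps concrete, here is a minimal sketch using only the standard-library xml.etree.ElementTree module. The XML is a made-up, hypothetical snippet, not the IMDB file from the DataCamp tutorial; the collection/genre/movie element names and the misplaced "Top Gun" entry are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (not the tutorial's IMDB file):
# "Top Gun" has been filed under the wrong genre.
XML_DATA = """
<collection>
  <genre category="Action">
    <movie title="Indiana Jones" year="1981"/>
    <movie title="Batman Returns" year="1992"/>
  </genre>
  <genre category="Sci-Fi">
    <movie title="Alien" year="1979"/>
    <movie title="Top Gun" year="1986"/>
  </genre>
</collection>
"""

# 1. Structure: parse the string into a tree and look at the root and its children.
root = ET.fromstring(XML_DATA.strip())
print(root.tag)
for genre in root:
    print(genre.tag, genre.attrib)

# 2. Access: iterate over every <movie> element, however deeply nested.
titles = [movie.get("title") for movie in root.iter("movie")]

# 3. Search: ElementTree supports a subset of XPath, including attribute predicates.
top_gun = root.find(".//movie[@title='Top Gun']")
scifi = root.find("./genre[@category='Sci-Fi']")
action = root.find("./genre[@category='Action']")

# 4. Modify: detach the misplaced movie from its current parent
#    and append it to the genre where it belongs.
scifi.remove(top_gun)
action.append(top_gun)

print(ET.tostring(root, encoding="unicode"))
```

Note that remove() only works on direct children, which is why the sketch looks up the parent genre element first instead of removing the movie from the root.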
The next presentation was from Bertalan Rónai, who talked about Azure Machine Learning Studio (classic) and showed a relatively simple model for the Kaggle Titanic competition.
Azure ML Studio has built-in data visualizations such as histograms, boxplots and scatter plots, which are very useful aids for feature selection. One of the new ideas in this particular model was to clean the missing values in the Age, Passenger class and Embarked columns with the Clean Missing Data module’s Replace using MICE (Multivariate Imputation by Chained Equations) option. The stratification in the Split Data module used the sex/gender column instead of the more commonly used Survived column, to balance the distribution of genders between the train and test datasets. For the sake of speed and simplicity, the experiment used a Two-Class Boosted Decision Tree without a Tune Model Hyperparameters module this time. We achieved an AUC of 0.860, an accuracy of 0.811 and a precision of 0.762, which is pretty good for a simple model like this one.
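In the Studio, the Split Data module does the stratification for you. To show what stratifying on Sex instead of Survived means, here is a plain-Python sketch (not the Studio’s own implementation): rows are grouped by the stratification column and each group is split with the same train fraction, so both sets keep the original gender ratio. The stratified_split helper and the toy passenger rows are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, train_fraction=0.7, seed=42):
    """Split rows so that each value of `key` is divided in the same proportion."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for value in sorted(groups):
        group = groups[value][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical toy data: 60 male and 40 female passengers.
rows = [{"Sex": "male", "Survived": i % 5 == 0} for i in range(60)]
rows += [{"Sex": "female", "Survived": i % 4 != 0} for i in range(40)]

train, test = stratified_split(rows, key="Sex", train_fraction=0.7)

def share_female(subset):
    return sum(r["Sex"] == "female" for r in subset) / len(subset)

print(len(train), len(test))
print(share_female(train), share_female(test))
```

Both subsets end up with the same 40% female share as the full data; stratifying on Survived instead would balance the survival rate, but the gender ratio could then drift between the two sets.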
It’s very easy to compare and evaluate different models to test our ideas for improving the results. The simpler model presented here was based on an experiment in which Bertalan compared four different models.
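The metrics quoted for the Titanic model (AUC, accuracy, precision) are what you look at when comparing models. As a rough illustration of what they measure, the sketch below computes them by hand for two models’ scores on the same labels: AUC as the share of (survivor, non-survivor) pairs that a model ranks correctly, with ties counting half. The labels, scores and the 0.5 decision threshold are invented assumptions for the example.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of all cases whose thresholded prediction matches the label."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def precision(labels, scores, threshold=0.5):
    """Share of predicted positives that are actually positive."""
    preds = [int(s >= threshold) for s in scores]
    true_pos = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    pred_pos = sum(preds)
    return true_pos / pred_pos if pred_pos else 0.0

# Hypothetical labels (1 = survived) and scores from two competing models.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
model_b = [0.6, 0.5, 0.5, 0.5, 0.4, 0.6, 0.3, 0.4]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, auc(labels, scores), accuracy(labels, scores), precision(labels, scores))
```

On this toy data model A wins on all three metrics (AUC 0.9375 against 0.625), which is the kind of side-by-side comparison that makes it quick to decide whether an idea actually improved the model.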
Azure ML Studio (classic) is free, so you can start having fun right after reading this post.
The last presenter of the webinar was Dóra Szabó, who brought a tutorial showcasing the basic functions and visualizations of the lifelines package for survival analysis. She walked us through the basic concepts of this type of data, such as the different types of censoring and some of the most commonly used estimators.
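Two estimators that survival analysis courses usually start with are the Kaplan-Meier estimator of the survival function and the Nelson-Aalen estimator of the cumulative hazard; lifelines implements them as KaplanMeierFitter and NelsonAalenFitter. We assume these are representative of the estimators covered, not a transcript of the tutorial. With t_i the distinct event times, d_i the number of events at t_i and n_i the number of cases still at risk just before t_i:

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
\qquad\qquad
\hat{H}(t) = \sum_{i:\, t_i \le t} \frac{d_i}{n_i}
```

A censored case never adds to any d_i, but it is counted in n_i at every event time up to its censoring time, which is how censored observations still shape the curve.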
She then showed a few examples that helped us grasp how the different censored cases contribute to the survival curves, how to calculate the number of cases at risk at a given time point, and what determines the size of the step corresponding to a single event (and why the step sizes vary). As survival analysis can bring useful insight to various problems and can help you answer some otherwise tricky questions easily, it is worth spending a few hours tinkering with this package!
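The at-risk counts and step sizes can be worked out by hand. The sketch below is a plain-Python Kaplan-Meier estimator (not lifelines’ implementation) that prints, for each event time, how many cases are at risk, how many events occur, how big the drop in the curve is and the resulting survival value. The durations and censoring flags are invented toy data. In lifelines, KaplanMeierFitter().fit(durations, event_observed=observed) should give the same curve, with the at-risk counts listed in its event_table.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return (time, at_risk, events, step, survival) for each distinct event time.

    durations: time until the event, or until the case was censored
    observed:  True if the event happened, False if the case was censored
    """
    event_counts = Counter(t for t, e in zip(durations, observed) if e)
    survival = 1.0
    table = []
    for t in sorted(event_counts):
        # At risk at t: every case whose recorded time is >= t. A censored case
        # stays in the risk set up to (and including) its censoring time.
        at_risk = sum(1 for d in durations if d >= t)
        n_events = event_counts[t]
        # The drop is the current survival times the share of at-risk cases that fail.
        step = survival * n_events / at_risk
        survival -= step  # equivalent to survival *= 1 - n_events / at_risk
        table.append((t, at_risk, n_events, step, survival))
    return table

# Hypothetical toy data: cases censored at times 3, 6 and 9.
durations = [2, 3, 3, 5, 6, 8, 9, 10]
observed = [True, True, False, True, False, True, False, True]

for t, n, d, step, s in kaplan_meier(durations, observed):
    print(f"t={t:>2}  at risk={n}  events={d}  step={step:.3f}  S(t)={s:.3f}")
```

Every event here is a single case, yet the steps are 0.125, 0.125, 0.15, 0.2 and 0.4: each one is 1/n of the current survival value, and the censored cases shrink n without producing a step of their own, so later events cause bigger drops.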
Thanks to everyone for attending!
As always, the notebooks are on our GitHub:
You can join us on our meetup page: