We held our last workshop of 2020, built around a DrivenData competition just like last year. It was an online event held on Microsoft Teams. We chose Flu Shot Learning because it fits the current situation well: https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/
The task was to predict whether people got H1N1 and seasonal flu vaccines using information they shared about their backgrounds, opinions, and health behaviors. It was a new kind of problem for some of us, because it is a multilabel classification task (each respondent has two independent yes/no targets), not a multiclass one like the Pump it Up! challenge from 2019.
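To make the difference concrete, here is a minimal sketch using scikit-learn's `type_of_target` helper. The target values are made up for illustration and are not taken from either competition's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Multiclass: every row gets exactly one class out of several
# (illustrative integer codes for three mutually exclusive classes).
y_multiclass = [0, 2, 1, 0, 2]

# Multilabel: every row gets one 0/1 flag per label, and the flags are
# independent of each other (columns stand in for the two vaccine targets).
y_multilabel = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
])

multiclass_kind = type_of_target(y_multiclass)
multilabel_kind = type_of_target(y_multilabel)
print(multiclass_kind, multilabel_kind)
```

In the multilabel case a person can have both flags set, one, or neither, which is why a plain multiclass setup does not fit.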
The first presenter was Berci, who built a great Power BI dashboard about the data. His approach was to measure, for each group of survey respondents, the share of people in that group who received a vaccine. Berci used Power BI’s AI visuals such as the Key Influencers, Decomposition Tree and Smart Narratives.
You can download the report file from GitHub.
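Berci did this in Power BI, but the same per-group rate can be sketched in pandas for readers who prefer a notebook. This is only an illustration: the column names `age_group` and `seasonal_vaccine` and the tiny dataset are assumptions, not the real survey data.

```python
import pandas as pd

# Toy data: one row per survey respondent (hypothetical values).
df = pd.DataFrame({
    "age_group": ["18-34", "18-34", "35-64", "35-64", "35-64", "65+", "65+", "65+"],
    "seasonal_vaccine": [0, 1, 0, 0, 1, 1, 1, 0],
})

# For each group: number of respondents, number vaccinated,
# and the vaccinated share (the mean of a 0/1 column).
summary = (
    df.groupby("age_group")["seasonal_vaccine"]
      .agg(respondents="count", vaccinated="sum", rate="mean")
)
print(summary)
```

Comparing the `rate` column across groups shows which groups were more likely to get vaccinated.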
The second presenter was Hajnalka Kristóffy. She first talked about exploratory data analysis in Python using standard NumPy and pandas functions along with a counting function written from scratch. Then she explained how useful the correlation table is for finding noteworthy connections and collinearity within the dataset.
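As a rough sketch of the correlation-table idea (not Hajnalka's actual notebook code), pandas can compute the table with `DataFrame.corr()`, and a small loop can list the strongly correlated pairs. The column names, the values and the 0.9 threshold are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy numeric features (hypothetical). "opinion_risk_copy" is exactly
# twice "opinion_risk", so the two columns are perfectly collinear.
df = pd.DataFrame({
    "doctor_recc": [1, 0, 1, 1, 0, 0, 1, 0],
    "opinion_risk": [5, 1, 4, 5, 2, 1, 4, 2],
    "opinion_risk_copy": [10, 2, 8, 10, 4, 2, 8, 4],
    "household_size": [1, 3, 2, 2, 1, 3, 1, 2],
})

corr = df.corr()  # pairwise Pearson correlation table

# List each pair of distinct columns once, keeping only strong correlations.
threshold = 0.9  # arbitrary cut-off for this sketch
cols = list(corr.columns)
high_pairs = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(corr.round(2))
print(high_pairs)
```

Pairs that show up in `high_pairs` are candidates for dropping or merging before modelling, since they carry nearly the same information.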
Then Dóri showed us a new tool, ProfileReport from pandas_profiling (thanks to Tamás Ribárszki for the tip!), which is a huge timesaver during EDA. She also presented a quick and dirty first solution using a neural network model, a Multi-layer Perceptron classifier. We discussed why it is important to use an algorithm that supports multilabel classification and how to prepare a valid submission for the platform.
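Here is a minimal sketch of that kind of first solution, not Dóri's actual notebook. It assumes scikit-learn's `MLPClassifier`, which accepts a two-column 0/1 target matrix and returns one probability per label. The data is synthetic, and the submission column names (`respondent_id`, `h1n1_vaccine`, `seasonal_vaccine`) are an assumption that should be checked against the competition's submission format file.

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the encoded survey features (not the real data).
X_train = rng.normal(size=(200, 5))

# Two binary targets per row: a multilabel indicator matrix of shape (200, 2).
y_train = np.column_stack([
    (X_train[:, 0] > 0).astype(int),                  # stand-in for the H1N1 label
    (X_train[:, 0] + X_train[:, 1] > 0).astype(int),  # stand-in for the seasonal label
])

# A small network; the hyperparameters are arbitrary choices for this sketch.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

# With a multilabel target, predict_proba returns one column per label.
X_test = rng.normal(size=(10, 5))
proba = clf.predict_proba(X_test)

# Build the submission table: an id column plus one probability per label.
submission = pd.DataFrame(proba, columns=["h1n1_vaccine", "seasonal_vaccine"])
submission.insert(0, "respondent_id", np.arange(len(submission)))
print(submission.head())
```

An estimator without multilabel support would need one model per label, or a wrapper, to produce the same two-column output.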
Thanks to everyone for attending!
As always, the notebooks are on our GitHub:
You can join us on our meetup page: