The task is to predict whether a person received the seasonal flu and H1N1 vaccines, using information they shared about their background.
We have data about their education, family background, vaccine knowledge, etc.
Balázs was the presenter of the night, for an audience of about 18 people.
He talked about his solution to the challenge and walked through the whole notebook he made.
He started by reading the data and working out which columns we could use.
A few of them were eliminated right away because of their high NaN ratio.
The remaining columns were mapped and had their NaNs filled, this time with the median.
The same mappings were applied to the test data.
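
The preprocessing steps above could look roughly like this. It is only a minimal sketch, not the code from the notebook: the column names, the toy values, the `NAN_THRESHOLD` cut-off, the `mappings` dictionary and the `preprocess` helper are all made up for illustration, and reusing the training medians on the test data is an assumption as well.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({
    "education": ["College", "High School", np.nan, "College"],
    "health_insurance": [np.nan, np.nan, np.nan, 1.0],
    "h1n1_knowledge": [2.0, np.nan, 1.0, 0.0],
})
test = pd.DataFrame({
    "education": ["High School", np.nan],
    "health_insurance": [1.0, np.nan],
    "h1n1_knowledge": [np.nan, 2.0],
})

NAN_THRESHOLD = 0.5  # assumed cut-off, not the value used in the talk

# 1. Keep only the columns whose share of missing values is acceptable.
nan_ratio = train.isna().mean()
keep_cols = nan_ratio[nan_ratio <= NAN_THRESHOLD].index.tolist()

# 2. Hypothetical mapping from text categories to numbers.
mappings = {"education": {"High School": 0, "College": 1}}


def preprocess(df, medians=None):
    """Select columns, apply the mappings, and fill NaNs with medians."""
    out = df[keep_cols].copy()
    for col, mapping in mappings.items():
        if col in out.columns:
            out[col] = out[col].map(mapping).astype(float)
    if medians is None:
        # Medians are computed on the training data only.
        medians = out.median()
    return out.fillna(medians), medians


train_clean, train_medians = preprocess(train)
# The same columns, mappings and (by assumption) medians are reused for test.
test_clean, _ = preprocess(test, medians=train_medians)
```

Wrapping the steps in one function is a simple way to make sure the train and test data go through exactly the same transformations.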
DrivenData’s baseline model scored 81.85%, so that was the number to beat. Logistic Regression and RandomForest were tried first, with a little parameter tuning; then LightGBM, GradientBoosting and CatBoost came into the picture, and once they were all gathered into a VotingClassifier, the test result was promising.
The predictions were then submitted to DrivenData for an official score,
and it came back as a great 84.08%, so not only did we beat the baseline model, we also made it into the top 15% of all competitors.
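
For readers who have not used a VotingClassifier before, here is a minimal sketch of the idea with scikit-learn. It is not the notebook's code: the data is synthetic, the hyperparameters are placeholders, soft voting and ROC AUC are assumptions, and only a single binary target is modelled. LightGBM and CatBoost are left out to keep the example to one library; their classifiers could be appended to the same `estimators` list.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed survey data (one binary target).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Soft voting averages the predicted probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Probability of the positive class on the held-out validation split.
val_proba = ensemble.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_proba)
print(f"Validation ROC AUC: {val_auc:.4f}")
```

Checking the score on a held-out split like this gives a rough idea of what to expect before sending predictions in for the official score.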
Plenty of possibilities remain unexplored, so there is still a lot to do with the notebook. The competition is not over yet, so you can submit your own solution and try to beat the current best score, 86.58%.
Thanks to Dóri and Berci for helping answer the questions,
of which we had quite a few.
Thanks to everyone for attending!
As always, the notebooks are on our GitHub:
You can join us on our meetup page: