DrivenData Challenge Seasonal Flu and H1N1

We decided to enter a DrivenData Challenge competition at the next event of the BudapestPy workshop series.
We expect the attendees to help us, as we plan to make the event very interactive.

FYI, before you get bored of this long post: I have already made an appointment and had my seasonal flu shot.

Your goal is to predict how likely individuals are to receive their H1N1 and seasonal flu vaccines. Specifically, you’ll be predicting two probabilities: one for h1n1_vaccine and one for seasonal_vaccine.
You are provided a dataset with 36 columns.

Each row in the dataset represents one person who responded to the National 2009 H1N1 Flu Survey.
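
To make the shape of the task concrete, here is a minimal sketch of a naive baseline in plain Python. It is not the approach planned for the workshop, only an illustration: it uses the observed vaccination rate within each health_worker group as the predicted probability for both targets. The toy rows and the helper names fit_group_rates and predict are hypothetical; only the column names come from the challenge description.

```python
from collections import defaultdict

TARGETS = ("h1n1_vaccine", "seasonal_vaccine")

# Hypothetical toy rows; the real survey data has many more columns and respondents.
train = [
    {"health_worker": 1, "h1n1_vaccine": 1, "seasonal_vaccine": 1},
    {"health_worker": 1, "h1n1_vaccine": 0, "seasonal_vaccine": 1},
    {"health_worker": 0, "h1n1_vaccine": 0, "seasonal_vaccine": 1},
    {"health_worker": 0, "h1n1_vaccine": 0, "seasonal_vaccine": 0},
    {"health_worker": 0, "h1n1_vaccine": 1, "seasonal_vaccine": 0},
    {"health_worker": 0, "h1n1_vaccine": 0, "seasonal_vaccine": 0},
]


def fit_group_rates(rows, feature):
    """Return the share of vaccinated respondents per feature value and target."""
    counts = defaultdict(int)
    positives = defaultdict(lambda: {t: 0 for t in TARGETS})
    for row in rows:
        key = row[feature]
        counts[key] += 1
        for target in TARGETS:
            positives[key][target] += row[target]
    return {
        key: {t: positives[key][t] / counts[key] for t in TARGETS}
        for key in counts
    }


def predict(rates, row, feature):
    """Look up the two probabilities for one respondent (raises KeyError for unseen values)."""
    return rates[row[feature]]


rates = fit_group_rates(train, "health_worker")
prediction = predict(rates, {"health_worker": 1}, "health_worker")
print(prediction)
```

The point is only that every respondent gets two numbers between 0 and 1, one per vaccine. A real entry would use many more features than a single flag.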

I created a report in Power BI for the EDA (exploratory data analysis) part of the challenge to help us understand the dataset,
the features and how they contribute to whether a person is going to receive a vaccine.
If you don’t see the PDF below, try reading the post in Chrome.

It seems that the Publish to Web sharing method does not show all visuals, so you would not be able to see the Key influencers visual if I shared the report that way. From a security perspective, the Publish to Web sharing method is only suitable for materials intended for the general public.
I have even named my workspace Public Demos.
You can download the PBIX file from GitHub.

The first feature I examined was the health_worker flag. Blank or null means no answer to this question, 1 is a health worker, 0 is not.

When you analyze a survey, it’s important to know whether you are looking at the whole population or at a sample of it. An example of the former is an internal course at your company where every participant is asked to leave feedback at the end. This survey is an example of the latter: there are 26 thousand respondents to a national survey.

In the latter case, my pattern was to put two things side by side: the absolute numbers, such as the number of respondents or the number of people who received a vaccine in a given group, and the percentage of people who received a vaccine among the respondents in that group.

For example, 11% of the respondents are health workers, and this group makes up 20% of all respondents who received the H1N1 vaccine. If we instead divide the number of health workers who received the H1N1 vaccine by the number of respondents in that group, we get 41%, which shows that being a health worker significantly increases a person’s odds of receiving the H1N1 vaccine.
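
The three percentages above can be reproduced outside Power BI as well. The following is a minimal sketch in plain Python with hypothetical toy data, so the numbers differ from the 11%, 20% and 41% in the report. The function name group_shares and the keys of its result are my own, not part of the dataset or the report.

```python
# Hypothetical toy data: 20 respondents, 2 of them health workers.
rows = (
    [{"health_worker": 1, "h1n1_vaccine": 1}] * 1
    + [{"health_worker": 1, "h1n1_vaccine": 0}] * 1
    + [{"health_worker": 0, "h1n1_vaccine": 1}] * 3
    + [{"health_worker": 0, "h1n1_vaccine": 0}] * 15
)


def group_shares(rows, group_col, group_value, target_col):
    """Compute the three percentages discussed in the text for one group."""
    total_respondents = len(rows)
    total_vaccinated = sum(r[target_col] for r in rows)
    # Rows with a blank answer (None) never match, so they count only in the totals.
    in_group = [r for r in rows if r[group_col] == group_value]
    group_vaccinated = sum(r[target_col] for r in in_group)
    return {
        # Share of the group among all respondents.
        "respondents_share": len(in_group) / total_respondents,
        # Share of the group among all vaccinated respondents.
        "vaccinated_share": group_vaccinated / total_vaccinated,
        # Vaccinated in the group divided by respondents in the group.
        "rate_within_group": group_vaccinated / len(in_group),
    }


shares = group_shares(rows, "health_worker", 1, "h1n1_vaccine")
for name, value in shares.items():
    print(f"{name}: {value:.0%}")
```

In this toy data, health workers are 10% of the respondents and 25% of the vaccinated, but 50% of them received the vaccine. The last number is the one that is comparable across groups of different sizes.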

Tried the new smart narratives feature

I tried the new smart narratives feature that was published in the September 2020 version of Power BI Desktop. The three percentages in the text box are dynamic. I achieved this by referencing measures and combining them with natural language expressions.

So I did not have to create a new measure in which my [respondents by total %] is filtered for health workers; I could do it by adding the words “with health worker 1” after my measure. The value is the same as in the matrix above.

Both “where” and “with” are good for filtering

Next, the Key influencers visual shows how much being a health worker is a factor in receiving each vaccine.

I used the same two pages for the age group, and it seems that people above the age of 54 are more likely to receive vaccines.
By the way, I am going to call my doctor this afternoon to ask for the seasonal vaccine.

The decomposition tree visual is not new; with it you can interactively visualize how factors contribute to a measure’s increase. One important consideration is that you can set the analysis type from absolute to relative, which helps you become less dependent on the sample distribution.

Not surprisingly, the Income poverty pages show that, for the seasonal vaccine, people below the poverty line are less likely to receive a vaccine than the other groups. For the H1N1 vaccine, people in the highest income bracket were more likely to receive a vaccine.

Regions are also contributing factors, but the differences are not large. There are two regions with slightly lower H1N1 received values and two with more noticeably lower seasonal numbers. In one region both the H1N1 and the seasonal numbers are lower.

There are many more features, such as the opinion and behavioral ones. For example, Opinion of H1N1 risk is very high and behavioral face mask is 1.
Instead of creating dozens of report pages for the individual attributes, I created an interactive page, Opinion and Knowledge, where there are two sets of visuals. One set is not affected by the slicers, and the other is filtered according to the user’s interest, so you can see whether the applied filters meaningfully contribute to receiving a vaccine.

Effect of the opinion of H1N1 risk being very high
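
The idea behind the page, comparing an unfiltered rate with the rate under the selected filters, can be sketched in a few lines of plain Python. This is only an illustration with hypothetical toy data, not how the report is built. The column name opinion_h1n1_risk and the coding of 5 as "very high" are assumptions, and vaccination_rate is a helper name of my own.

```python
# Hypothetical toy data; 5 is assumed to encode a "very high" opinion of H1N1 risk.
rows = (
    [{"opinion_h1n1_risk": 5, "h1n1_vaccine": 1}] * 3
    + [{"opinion_h1n1_risk": 5, "h1n1_vaccine": 0}] * 1
    + [{"opinion_h1n1_risk": 2, "h1n1_vaccine": 1}] * 1
    + [{"opinion_h1n1_risk": 2, "h1n1_vaccine": 0}] * 5
)


def vaccination_rate(rows, target_col, filters=None):
    """Share of vaccinated respondents among the rows matching every filter."""
    filters = filters or {}
    selected = [
        r for r in rows
        if all(r.get(col) == value for col, value in filters.items())
    ]
    if not selected:
        return None  # No respondent matches the chosen filters.
    return sum(r[target_col] for r in selected) / len(selected)


# Unfiltered rate, like the visuals that ignore the slicers.
overall = vaccination_rate(rows, "h1n1_vaccine")
# Filtered rate, like the visuals that follow the slicers.
filtered = vaccination_rate(rows, "h1n1_vaccine", {"opinion_h1n1_risk": 5})

print(f"overall: {overall:.0%}, filtered: {filtered:.0%}")
```

A filtered rate that is clearly above or below the overall rate suggests that the selected attribute values are worth a closer look. With few matching respondents the difference can easily be noise.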

I am very interested in our next workshop event, and I hope that you can attend so that we can help each other gain insights from the data.

Join us on 2020-11-02!
Meetup Link: