May 102016
 

Note: Special thanks to Joe Mako who came up with a considerably more elegant and extensible solution than I had first built. I describe Joe’s approach in this post.

Overview

I’m fortunate that when I have to work with survey data that is not well-coded I can take out my trusty copy of Alteryx Designer and get my data “just so.”

But what if you don’t have a copy of Alteryx? Or what if you want to use Qualtrics’ new Web Data Connector and don’t have the ability to insert a data prep tool within your workflow?

In this blog post we’ll see how you can get Tableau to address the problem.  Specifically, we’ll see how to work with check-all-that-apply questions where the data has been coded as 1s and blanks instead of 1s and 0s.

Note: While the example shown focuses on Qualtrics data the approach will work with any tool that leaves blanks for respondents that did not select an item (e.g., Survey Gizmo).

So, What’s Wrong with Blanks?

Consider the data snippet below that shows responses to a group of check-all-that-apply questions. This is what I consider to be badly-coded data.

Figure 1 -- Check all that apply questions as they might appear in a CSV file downloaded from Qualtrics

Figure 1 — Check all that apply questions as they might appear in a CSV file downloaded from Qualtrics

Now here’s how I would prefer the data to be coded.

Figure 2 -- Correctly-coded check all that apply questions

Figure 2 — Correctly-coded check all that apply questions

So, why is this better?

By having 1s and 0s we can see, easily, if some respondents didn’t answer the question group at all. In the figure below we can see that respondents 1002, 1006, 1011, and 1016 did not make any selection for any of the check-all-that-apply questions.  More importantly, with data coded with 1s and 0s, Tableau we can just filter out the Nulls and we’ll get accurate results.

 Figure 3 -- Respondents that didn't answer this question group are highlighted in yellow


Figure 3 — Respondents that didn’t answer this question group are highlighted in yellow

IMPORTANT: Before going any further, make note that respondents 1002, 1006, 1011, and 1016 did not make any selections and that of the 13 people that actually responded to Q1_4,  12 selected that option.  There will be a test.

So, given our constraints, how are we going to come up with a way to distinguish between a blank that means a respondent didn’t select a particular option and a blank that, when combined with three other blanks, means the person didn’t respond to the group of questions at all?

Before we plow into this I want to make sure people realize that we will soon pivot / reshape the data so that it is tall. Specifically, instead of dealing with this…

Figure 4 -- Data before reshaping (as you would get it if you downloaded a .CSV file.)

Figure 4 — Data before reshaping (as you would get it if you downloaded a .CSV file.)

… we’re going to be working with this:

Figure 5 -- Reshaped data using Tableau's built-in Pivoting feature. If you're using the Qualtrics web data connector you don't need to use this.

Figure 5 — Reshaped data using Tableau’s built-in Pivoting feature. If you’re using the Qualtrics web data connector you don’t need to use this.

Once we pivot the data we’ll see the following collection of dimensions and measures in Tableau.

Figure 6 -- Reshaped data as it appears in Tableau

Figure 6 — Reshaped data as it appears in Tableau

Note: For this example I’m not going to the trouble of creating a data set that has the survey responses in both numeric and text format. The technique we explore will work equally well for when you have both as when you just have numeric values.

Grouping the related questions

To be able to come up with an elegant, extensible way to tell Tableau “don’t take into account questions where people didn’t make any selections” we’re going to need a way to group related questions together.

Note that the technique I’m about to propose is different from the ad-hoc grouping approach I describe in Using Qualtrics’ Web Data Connector and Reshaping Survey Data with Tableau 9.0.  The programmatic “group” we create here needs to be referenced in a calculated field, and you can’t do that with an ad-hoc group.

Here we’re going to take advantage of Qualtrics’ Question ID naming convention and how related questions all look like this:

Q(N)_

That is, a group of related check-all-that-apply questions might be named like this:

Q1_1
Q1_2

Q1_8

And a group of related Likert scale questions might be named like this:

Q7_1
Q7_2

Q7_13

Given this standardization, we need a mechanism that will group everything to the left of the underscore. There are several ways we can achieve this, but for this example we’ll create a new field called “Question Prefix” and define it as follows:

Figure 7 -- Calculated field to that will group related questions based on whatever comes before the underscore in the Question ID field.

Figure 7 — Calculated field to that will group related questions based on whatever comes before the underscore in the Question ID field.

Now let’s use this field and the Resp ID field to see who did not respond at all to a particular group of questions.

Figure 8 -- This collection of fields show us who did not respond at all to a particular group of questions.

Figure 8 — This collection of fields show us who did not respond at all to a particular group of questions.

So, have a look at results in the image above where we place MAX(Value) on the text button. What’s happening is that for each respondent we’re looking at the answers to Q1_1, Q1_2, Q1_3, and Q1_4 and taking the maximum value across all those questions.  Notice that there are four blanks / Nulls for 1002, 1006, 1011, and 1016 — these are the four respondents that didn’t answer the question group at all (see, I told you there would be a test.)

We do NOT want to include these people when we calculate the percentage of people that selected a particular choice, so we can place MAX(Value) on the Filters shelf and indicate that we should not include Null values, as shown below.

Figure 9 -- Filter that will exclude Null values.

Figure 9 — Filter that will exclude Null values.

Applying this filter will show us responses from people that did NOT skip the question, as shown below.

Figure 10 -- Respondents that selected at least one option.

Figure 10 — Respondents that selected at least one option.

Okay, this is great, but the visualization we want to create shouldn’t look anything like this. Indeed, we have too many things on the level of detail at this point.

So what are we going to do?

Level-of-Detail Calculations to the Rescue

Anytime you think “I need Tableau to consider some dimensions when making a calculation, but those dimensions can’t be in the visualization” a giant alarm bell should go off in your head — this is precisely the use case for a LoD calculation!

Here’s the LoD calc, that when placed on the Filters shelf, will do the trick:

Figure 11 -- LoD calc that allows us to exclude folks that skipped answering a question group.

Figure 11 — LoD calc that allows us to exclude folks that skipped answering a question group.

A translation into English would be:

Even though [Question Prefix] and [Resp ID] may not be on the viz detail, take that combination of these two fields and find the MAX([Value]). Then find only the Null values from this and exclude them.

Note that we are using FIXED instead of INCLUDE as the former will automatically create a dimension and will work no matter what other filter settings may be in place.

Putting This Altogether

First we need to create a calculation that gives us the percentage of people that selected an option from the group of check-all-that-apply questions. For this instance the following calculation will work well.

Figure 12 -- How to determine the percentage of people that selected an option.

Figure 12 — How to determine the percentage of people that selected an option.

A quick translation into English would be:

Add up everybody that answered “1” (meaning they selected an option) and divide by the number of people that answered the question.

Let’s see if this works. Consider the visualization shown below.

Figure 13 -- First attempt at showing the percentage of respondents that selected an option. This is NOT showing the correct percentage.

Figure 13 — First attempt at showing the percentage of respondents that selected an option. This is NOT showing the correct percentage.

Does this pass the “smell” test? Look at Q1_4. You may recall that we saw that 12 out of 13 people selected this option — but 12 out of 13 is 92% not, 71%.  Indeed, if we place SUM([Number of Records]) as a discrete measure on the rows shelf we see that Tableau believes 17 people responded to the question.

Figure 14 -- Without a filter Tableau believes 17 people responded to this group of questions

Figure 14 — Without a filter Tableau believes 17 people responded to this group of questions

We need to put our LoD calc on the Filters shelf and set it to True.

Figure 15 -- Removing "All of these are Null" from the respondent pool gives us the correct count of 13 and the correct percentage (12 out of 13).

Figure 15 — Removing “All of these are Null” from the respondent pool gives us the correct count of 13 and the correct percentage (12 out of 13).

Conclusion

How check-all-that-apply questions are coded is a problem with many survey tools including Qualtrics and Survey Gizmo. If you have a good data preparation tools like Alteryx you can fix the problem, but if you don’t have such a tool or don’t want to insert something into your workflow then you can get Tableau to address the problem with a Level-of-Detail calculation.