Note: Special thanks to Joe Mako who came up with a considerably more elegant and extensible solution than I had first built. I describe Joe’s approach in this post.
I’m fortunate that when I have to work with survey data that is not well-coded I can take out my trusty copy of Alteryx Designer and get my data “just so.”
But what if you don’t have a copy of Alteryx? Or what if you want to use Qualtrics’ new Web Data Connector and don’t have the ability to insert a data prep tool within your workflow?
In this blog post we’ll see how you can get Tableau to address the problem. Specifically, we’ll see how to work with check-all-that-apply questions where the data has been coded as 1s and blanks instead of 1s and 0s.
Note: While the example shown focuses on Qualtrics data the approach will work with any tool that leaves blanks for respondents that did not select an item (e.g., Survey Gizmo).
So, What’s Wrong with Blanks?
Consider the data snippet below that shows responses to a group of check-all-that-apply questions. This is what I consider to be badly-coded data.
Now here’s how I would prefer the data to be coded.
So, why is this better?
By having 1s and 0s we can see, easily, if some respondents didn’t answer the question group at all. In the figure below we can see that respondents 1002, 1006, 1011, and 1016 did not make any selection for any of the check-all-that-apply questions. More importantly, with data coded with 1s and 0s, Tableau we can just filter out the Nulls and we’ll get accurate results.
IMPORTANT: Before going any further, make note that respondents 1002, 1006, 1011, and 1016 did not make any selections and that of the 13 people that actually responded to Q1_4, 12 selected that option. There will be a test.
So, given our constraints, how are we going to come up with a way to distinguish between a blank that means a respondent didn’t select a particular option and a blank that, when combined with three other blanks, means the person didn’t respond to the group of questions at all?
Before we plow into this I want to make sure people realize that we will soon pivot / reshape the data so that it is tall. Specifically, instead of dealing with this…
… we’re going to be working with this:
Once we pivot the data we’ll see the following collection of dimensions and measures in Tableau.
Note: For this example I’m not going to the trouble of creating a data set that has the survey responses in both numeric and text format. The technique we explore will work equally well for when you have both as when you just have numeric values.
Grouping the related questions
To be able to come up with an elegant, extensible way to tell Tableau “don’t take into account questions where people didn’t make any selections” we’re going to need a way to group related questions together.
Note that the technique I’m about to propose is different from the ad-hoc grouping approach I describe in Using Qualtrics’ Web Data Connector and Reshaping Survey Data with Tableau 9.0. The programmatic “group” we create here needs to be referenced in a calculated field, and you can’t do that with an ad-hoc group.
Here we’re going to take advantage of Qualtrics’ Question ID naming convention and how related questions all look like this:
That is, a group of related check-all-that-apply questions might be named like this:
And a group of related Likert scale questions might be named like this:
Given this standardization, we need a mechanism that will group everything to the left of the underscore. There are several ways we can achieve this, but for this example we’ll create a new field called “Question Prefix” and define it as follows:
Now let’s use this field and the Resp ID field to see who did not respond at all to a particular group of questions.
So, have a look at results in the image above where we place MAX(Value) on the text button. What’s happening is that for each respondent we’re looking at the answers to Q1_1, Q1_2, Q1_3, and Q1_4 and taking the maximum value across all those questions. Notice that there are four blanks / Nulls for 1002, 1006, 1011, and 1016 — these are the four respondents that didn’t answer the question group at all (see, I told you there would be a test.)
We do NOT want to include these people when we calculate the percentage of people that selected a particular choice, so we can place MAX(Value) on the Filters shelf and indicate that we should not include Null values, as shown below.
Applying this filter will show us responses from people that did NOT skip the question, as shown below.
Okay, this is great, but the visualization we want to create shouldn’t look anything like this. Indeed, we have too many things on the level of detail at this point.
So what are we going to do?
Level-of-Detail Calculations to the Rescue
Anytime you think “I need Tableau to consider some dimensions when making a calculation, but those dimensions can’t be in the visualization” a giant alarm bell should go off in your head — this is precisely the use case for a LoD calculation!
Here’s the LoD calc, that when placed on the Filters shelf, will do the trick:
A translation into English would be:
Even though [Question Prefix] and [Resp ID] may not be on the viz detail, take that combination of these two fields and find the MAX([Value]). Then find only the Null values from this and exclude them.
Note that we are using FIXED instead of INCLUDE as the former will automatically create a dimension and will work no matter what other filter settings may be in place.
Putting This Altogether
First we need to create a calculation that gives us the percentage of people that selected an option from the group of check-all-that-apply questions. For this instance the following calculation will work well.
A quick translation into English would be:
Add up everybody that answered “1” (meaning they selected an option) and divide by the number of people that answered the question.
Let’s see if this works. Consider the visualization shown below.
Does this pass the “smell” test? Look at Q1_4. You may recall that we saw that 12 out of 13 people selected this option — but 12 out of 13 is 92% not, 71%. Indeed, if we place SUM([Number of Records]) as a discrete measure on the rows shelf we see that Tableau believes 17 people responded to the question.
We need to put our LoD calc on the Filters shelf and set it to True.
How check-all-that-apply questions are coded is a problem with many survey tools including Qualtrics and Survey Gizmo. If you have a good data preparation tools like Alteryx you can fix the problem, but if you don’t have such a tool or don’t want to insert something into your workflow then you can get Tableau to address the problem with a Level-of-Detail calculation.