Dealing with survey tools that don’t code check-all-that-apply questions correctly

Updated on September 11, 2018.

Note: Special thanks to Joe Mako who suggested this approach.

Overview

I’m fortunate that when I work with survey data that is not well-coded I can take out my trusty copy of Tableau Prep, EasyMoprh, or Alteryx Designer and get my data “just so.”

But what if you don’t have a copy of one of these data prep tools? Or what if you want to use Qualtrics’ or SurveyMonkey’s Web Data Connector and don’t have the ability to insert a data prep tool within your workflow?

In this blog post we’ll see how you can get Tableau to address the problem. Specifically, we’ll see how to work with check-all-that-apply questions where the data has been coded as 1s and blanks instead of 1s and 0s.

So, What’s Wrong with Blanks?

Consider the data snippet below that shows responses to a group of check-all-that-apply questions. This is what I consider to be badly-coded data.

Figure 1 — Check all that apply questions as they might appear in a CSV file downloaded from a commercial survey tool.

Now here’s how I would prefer the data to be coded.

So, why is this better?

By having 1s and 0s we can see, easily, if some respondents didn’t answer the question group at all. In the figure below, we can see that respondents 15 and 29 did not make any selection for any of the check-all-that-apply questions. More importantly, with data coded with 1s and 0s, Tableau can just filter out the Nulls and we’ll get accurate results.

Figure 3 — Respondents that didn’t answer this question group are highlighted in yellow

So, given our constraints, how are we going to come up with a way to distinguish between a blank that means a respondent didn’t select a particular option and a blank that, when combined with the other related blanks, means the person didn’t respond to the group of questions at all?

Before going any further, note that there’s no way we’re going to deal with the data in this wide format. That is, instead of the data looking like it does in the previous example, the data will instead look like this.

Figure 4 — A snippet of the data showing some responses for Resp ID 2. Note that we can see that this respondent selected “Yes” / “1” for Q2_9, Q2_3 and Q2_5.

If this format is foreign to you, stop reading this now and have a look at Getting Survey Data “Just So.”

So, with this tall format in mind, and the ability to easily indicate that you just want to look at a group of questions, let’s see how we can get a handle on who among the respondents selected something.

NOTE: Qualtrics does in fact have a setting that will allow you to download the data in a more Tableau-friendly format. Look for these settings for both the text and numeric downloads:

How Many Things Did Each Respondent Select?

Consider the visualization below which shows the SUM([Value)]) for each Resp ID where [Value] can be 1, 0, or blank (null). Note that we have a filter set so we only see the responses to the group of related check-all-that-apply questions.

Figure 5 — For each Resp ID, show the SUM([Value) for all the check-all-that-apply questions. We can see that respondent 2 selected 3 items, respondent 4 selected 5 items, and so on. If we were to scroll down we would see that many respondents didn’t select anything, meaning they did not answer the question group. Note that 845 people participated in this survey (see that last item highlighted in yellow).

We can filter the results so that we only see respondents that answered a question by dragging [Value] to the Filters shelf, selecting All Values, and indicating that we should show non-null values.

Figure 6 — Adding a filter so we don’t see null responses.

Note that when we do this we see that only 342 responded to this group of questions, as shown below.

Figure 7 — Adding a filter removes the respondents that didn’t select at least one item. We could also have placed SUM([Value)) on the Filters shelf and indicated that the value should be at least 1.

Okay, this is great, but the visualization we want to create shouldn’t look anything like this. Indeed, we have too many things on the level of detail at this point.

So what are we going to do?

Level-of-Detail Expressions to the Rescue

Anytime you think “I need Tableau to consider some dimensions when making a calculation, but those dimensions can’t be in the visualization” a giant alarm bell should go off in your head — this is precisely the use case for a LoD expression!

Here’s the LoD calc, that when placed on the Filters shelf, will do the trick:

Figure 8 — LoD calc that allows us to exclude folks that skipped answering a question group.

A translation into English would be:

Even though [Question Grouping] and [Resp ID] may not be on the viz detail, take that combination of these two fields and find the SUM([Value]). Then find only the Null values from this and exclude them by setting the filter to True.

Note that we are using FIXED instead of INCLUDE as the former will automatically create a dimension and will work no matter what other filter settings may be in place.

Putting this Altogether

First, we need to create a calculation that gives us the percentage of people that selected an option from the group of check-all-that-apply questions. For this instance, the following calculation will work well.

Figure 9 — How to determine the percentage of people that selected an option.

A quick translation into English would be:

Add up everybody that answered “1” (meaning they selected an option) and divide by the number of people that answered the question.

Let’s see if this works. Consider the visualization shown below.

Figure 10 — First attempt at showing the percentage of respondents that selected an option. This is NOT showing the correct percentage.

Does this pass the “smell” test? These percentages seem very low for a check-all-that-apply question that had nine things people could select. Indeed, if we place SUM([Number of Records]) as a discrete measure on the rows shelf we see that Tableau believes 845 people responded to the question, but we know only 342 responded.

Figure 11 — Without a filter Tableau believes 845 people responded to this group of questions.

We need to put our LoD calc on the Filters shelf and set it to True.

Figure 12— Removing “All of these are Null” from the respondent pool gives us the correct count of 342 and the correct percentage (77% instead of 31% for the first item.)

Conclusion

How check-all-that-apply questions are coded is a problem with many survey tools including Qualtrics and Survey Gizmo. If you have a good data preparation tools like Tableau Prep, EasyMorph, and Alteryx you can fix the problem, but if you don’t have such a tool or don’t want to insert something into your workflow then you can get Tableau to address the problem with a Level-of-Detail expression.