Jan 032018
 

January 3, 2018

Note: I first wrote about this five years ago and while the approaches I suggested then do in fact work, the advent of Level of Detail (LoD) expressions in Tableau gives us a much better way to get the job done. A very big “thank you” to my friend and colleague Jonathan Drummey who steered me very quickly to the flexible approach I write about below.

Overview

Those that have followed this blog know that when I setup survey data for analysis in Tableau I separate the so-called “demographic” questions (e.g., gender, ethnicity, education, political leanings, etc.) from the “what you want to know questions” (e.g., “would you recommend this company to a friend or colleague?”, “which of these things do you look for when considering an insurance carrier?”, etc.) The demographic questions remain as separate columns and the other questions get reshaped. So, you may start with 200 columns and 800 rows, with one row for each respondent, and you end up with 20 columns and tens of thousands of rows, with a separate row for each non-demographic question a respondent answered.

This is a solid, proven strategy, but suppose you want to filter / break down a survey question not by a demographic question, but by another “what you want to know” question?  That is, suppose you want to see how folks that selected “Yes” to the question “Do you plan to vote in the upcoming election” responded to a Net Promoter Score question?

Figure 1 -- A reshaped question acting as s "demographic" filter.

Figure 1 — A reshaped question acting as s “demographic” filter.

In this pair of blog posts we’ll show you how any reshaped question can be “promoted” to behave like a so-called “demographic” question. That is, we will come up with a flexible, parameter-based approach that will allow any reshaped question to become a Tableau dimension that acts as if it were in its own column from the get-go.

A few thoughts before we plow ahead.

  • I’ll be using the same data set I’ve used for most of the examples I’ve blogged about and I will have prepped the data as described here.
  • If you know which non-demographic questions warrant this treatment ahead of time you can certainly just copy them and make them separate columns before reshaping / pivoting, thus avoiding the techniques explained below.
  • You can also join a reshaped data source to itself but his will produce an overabundance of rows.
  • You can join the pivoted data to the unpivoted data and have tens of thousands of rows and hundreds of columns (but you and your audience will be miserable, and performance will be very slow).

A look inside Jonathan Drummey’s thought process

As I was working with Jonathan he jotted down his thoughts in the spreadsheet that I show below.

Figure 2 -- How Jonathan Drummey approached the problem.

Figure 2 — How Jonathan Drummey approached the problem.

Pay particular attention to his thoughts expressed in rows 9 through 18. Here’s a flowchart that Jonathan created that goes into more detail (in a slightly different order) about this process of identifying what kind of calculation is necessary:

Figure 3 -- Jonathan's flowchart for determining whether a Level-of-Detail expression or a Table Calculation is warranted.

Figure 3 — Jonathan’s flowchart for determining whether a Level-of-Detail expression or a Table Calculation is warranted.

Starting off easy: understanding our data

Before we come up with an extensible solution let’s start by turning a single question–in this case “Do you plan to vote in the upcoming election” (Question ID = Q0)–and make it behave like a dimension.  Here’s a mapping of all the Question IDs, how they group, and the universe of responses to each question.

Figure 4 -- Question groups, IDs, wording, and universe of possible numeric and text responses

Figure 4 — Question groups, IDs, wording, and universe of possible numeric and text responses

Now that we’ve examined all of our questions, let’s take look at the respondents who completed the survey and get a sense of who they are.

Figure 5 -- The demographics for  each survey respondent.

Figure 5 — The demographics for  each survey respondent.

Notice that we have separate columns for Gender, Generation, and Location.

So, how can we promote the “Do you plan to vote” question (Q0) from being a collection of rows into being its own dimension / column?

Plan to vote as a separate column

The LoD expression that will do what we want is this:

{FIXED [Resp ID]: MAX(IF [Question ID]="Q0" then [Labels] END)}

The way to interpret this is

Starting at the innermost part of the calculation, in the IF statement check each record and return the [Labels] value (i.e. “Yes”, “No”, or “Don’t Know” only if the [Question ID] is Q0, otherwise return a Null. Then, for each [Resp ID] in the data, return the maximum value of the results of the IF statements for that respondent, where MAX() will return “Yes”, “No”, “Don’t Know” or NULL if the respondent had never answered that question.

If you are wondering why you need the MAX() function it’s because you need to have some type of aggregation when using an LoD expression.  Note that MAX() or MIN() will both work as they will accept text as an argument, while SUM() and AVG() will not work.

You can now drag this newly-created field, named [Plan to vote] and use it as you would use any dimension, as shown below.

Figure 6 -- “Plan to vote” as a demographic dimension.

Figure 6 — “Plan to vote” as a demographic dimension.

Do note that there are some Null values as some respondents did not answer that question. You may want to alias these as “Did not respond”.

Making this flexible

This technique will work for elevating any single question to behave as a dimension and if you only have a handful of question you want to treat this way you need not read on.

But suppose you have dozens of not hundreds of questions that you want to explore as dimensions? You certainly won’t want to “hand chisel” hundreds of separate LoD expressions.

Instead, we should create a parameter-based solution that will allow users to select the question they want to “promote” from a drop-down menu.

You can read how to do this in Part 2.

 Posted by on January 3, 2018 2) Visualizing Survey Data, Blog Tagged with: , , ,  3 Responses »
Mar 082017
 

And… what did they choose?

March 8, 2017

Overview

I’ve discussed how to visualize check-all-that-apply questions in Tableau. Assuming your survey is coded as Yes = 1 and No = 0, you can fashion a sorted bar chart like this the one shown in Figure 1 using the following calculation.

SUM([Value]) / SUM(Number of Records)

The field [Value] would be 0 or 1 for each respondent that answered the question.

Figure 1 -- Visualizing a check-all-that-apply question

Figure 1 — Visualizing a check-all-that-apply question

I’ve also discussed how we can see break this down by various demographics (Gender, Location, Generation, etc.)

What I’ve not blogged about (until now) is how to answer the following questions:

  • How many people selected one item?
  • Two items?
  • Five items?
  • Of the people that only selected one item, what did they select?
  • Of the people that selected four items, what did they select?

Prior to the advent of LoD calculations this was doable, but a pretty big pain in the ass.

Fortunately, using examples that are “out in the wild” we can cobble together a compelling way to show the answers to these questions.

 

Visualizing How Many People Selected 1, 2, 3, N Items?

One of the best blog posts on Level-of-Detail expressions is Bethany Lyons’ Top 15 LoD Expressions.

It turns out the very first example discusses how figure out how many customers placed one order, how many placed two orders, etc.  This will give us exactly what we need to figure out how many people selected 1, 2, 3, N items in a check-all-that-apply question.

Here’s the calculation that will do the job.

Figure 2 -- The LoD calculation we'll need.

Figure 2 — The LoD calculation we’ll need.

This translates as “for the questions you are focusing on (and you better have your context filters happening so you are only looking at just the check-all-that-apply stuff), for each Resp ID, add up the values for all the questions people answered.”

Remember, the responses are 0s and 1s, so if somebody selected six things the SUM([Value]) would equal 6.

So, how do we use this?

The beautiful thing about using FIXED as our LoD keyword is that it allows us to turn the results into a dimension.  This means we can put How Many Selected on columns and CNTD(Resp ID) on rows and get a really useful histogram like the one shown in Figure 3.

Figure 3 -- Histogram showing number of respondents that selected 0 items, 1 item, 2 item, etc.

Figure 3 — Histogram showing number of respondents that selected 0 items, 1 item, 2 item, etc.

Notice the filter settings indicating that we only want responses to the check-all-that-apply questions. Further note that this filter has been added to the context which means we want Tableau to filter the results before computing the FIXED LoD calculation.

 

So, what did these People Select?

Okay, now we know how many people selected one item, two items, etc.

Just what did they select?

Because we set [How Many Selected] using the FIXED keyword we can use it like any other dimension.  That is, it will behave just like [Gender], [Location], and so on.

Borrowing from an existing technique (the visual ranking by category that I cited earlier) we can fashion a very useful dashboard that allows us to see some interesting nuances in the data.  For example, while Metabolism is ranked second overall with 70% of people selecting it, it ranked seventh among those that only selected one item (with only 4%), while 84% of people that selected four items selected it (Figure 4.)

Figure 4 -- Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Figure 4 — Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Similarly, check out the breakdown for Blood Pressure which is ranked third with 60% overall but is ranked first among folks that only measure one thing (Figure 5.)

Figure 5 -- Metabolism is ranked third overall with 60%, but was ranked first among those that only selected on item.

Figure 5 — Metabolism is ranked third overall with 60%, but was ranked first among those that only selected on item.

 

Other Useful Features of the Dashboard

The Marginal Histogram

The marginal histogram along the bottom of the chart shows you the breakdown of responses.

Figure 6 -- Marginal histogram shows distribution of responses

Figure 6 — Marginal histogram shows distribution of responses

Tool Tips Help Interpret the Findings

The ordinal numbers can be confusing as sometimes the number 2 means the number of items selected and other times it is the ranking.  Hovering over a bar explains how to interpret the results.

Figure 7 -- Tool tips help you interpret the results.

Figure 7 — Tool tips help you interpret the results.

Swap Among Different Dimensions

While this is first and foremost a blog post about showing how many people selected a certain number of items (and what they selected) it was very easy to add a parameter that allows you to swap among different dimensions.  In Figure 8 we see the break down by Location.

Figure 8 -- Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.

Figure 8 — Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.

Here’s the embedded workbook for you to try out and download.