Jan 032018
 

January 3, 2018

In the first part we saw how to “hard code” the creation of a dimension from row-based survey data using a Level-of-Detail expression. In this second part we’ll see how we can turn any row-based survey data into a dimension “on the fly” using a parameter.

Where we left off

You may recall the calculation that would allow people to cut / filter by the question “do you plan to vote in the next election” was this:

{FIXED [Resp ID]: MAX(IF [Question ID]="Q0" then [Labels] END)}

So, how do we take this “one-off” calculation and make it flexible?

Making a flexible, extensible solution

One approach is to create a parameter called [Question ID Param] and seeding it with values from the Question ID field.  The resulting calculation would look like this:

{FIXED [Resp ID]: MAX(IF [Question ID]=[Question ID Param]
then [Labels] END)}

The problem is that your “friendly” parameter list will look like this:

Figure 1 -- Not-very-friendly question list.

Figure 1 — Not-very-friendly question list.

Quick! What does “Q34-SAT” stand for?

Maybe we should instead populate the parameter list from the human-readable form of the Question ID, in this case the [Wording] field. Here’s what that will look like.

Figure 2 -- Friendlier list, but still many problems.

Figure 2 — Friendlier list, but still many problems.

This is certainly an improvement, but there are two big problems with this.  The first is that there may be Question IDs that in fact have the same wording. Indeed, if your survey delves into comparing how important a list of features is with how satisfied people are with those features, you will need for the wording to be identical. For example, “Price” could refer to “How important is this to you?” or it could refer to “How satisfied are you with this?” (See this blog post for a discussion of visualizing importance vs. satisfaction.)

The second problem is that the questions are in alphabetical order, so you have three importance / satisfactions question, followed by three check-all-that-apply questions, followed by a handful of Likert-scale questions, followed by another importance / satisfaction question that should be with the first group, etc.

Combing Grouping with Wording

Jonathan Drummey came up with a very easy way to both disambiguate the satisfaction from the importance question and group the questions in the parameter list logically. The trick is to create a new field (we’ll call it [Question Parameter List]) that concatenates the question grouping and the question wording. We will define is as follows.

[Question Grouping] + ' / ' + [Wording]

This creates a list that looks like this.

Figure 3 -- Results of concatenating [Question Grouping] with [Wording].

Figure 3 — Results of concatenating [Question Grouping] with [Wording].

We’re almost done, we next need to create a new parameter, we’ll call it [Question to compare], and we’ll populate it with the members of the [Question Parameter List] field, as shown below.

Figure 4 -- Our friendly, disambiguated, logically-grouped list of questions.

Figure 4 — Our friendly, disambiguated, logically-grouped list of questions.

Armed with this concatenated list we can modify our hard-coded LoD expression field so that it looks like this.

{FIXED [Resp ID]: MAX(IF [Wording Parameter List]=
[Question to compare] then [Labels] END)}

Before you explore the downloadable dashboard at the end of this post I want to dissuade you from inflicting this “filter any question by any other question” functionality on your audience as you will simply be hitting them with too much flexibility. While I’m sure there are some insights to be gleaned from some of the question combinations, there are probably dozens, if not hundreds, that won’t yield anything useful. Do you really want to make your audience find where the good stuff is?

Cole Nussbaumer Knaflic presents a wonderful one-day workshop around her book, Storytelling with Data. In the workshop she states that finding good insights buried in a mound of data is like having to shuck a lot of oysters to find a pearl. Don’t show your audience all the oysters you shucked (and certainly don’t make them shuck the oysters!); just show them the pearl.

Yes, you should use this technique to find insights that go beyond cutting the data by traditional demographic questions.  And if / when you find something useful, limit what you show your audience to just those filters / options that provide insight.

One last thought: if you are building a “this-has-to-be-slick” dashboard — perhaps one that is customer-facing — consider ditching the single concatenated parameter and instead building a parent / child pair of parameters using Tableau’s Javascript API. The first parameter would show the question grouping and the second would show the question in human-readable form based on what was selected from the first parameter.

 Posted by on January 3, 2018 1) General Discussions, 2) Visualizing Survey Data, Blog Tagged with: , ,  1 Response »
Jan 032018
 

January 3, 2018

Note: I first wrote about this five years ago and while the approaches I suggested then do in fact work, the advent of Level of Detail (LoD) expressions in Tableau gives us a much better way to get the job done. A very big “thank you” to my friend and colleague Jonathan Drummey who steered me very quickly to the flexible approach I write about below.

Overview

Those that have followed this blog know that when I setup survey data for analysis in Tableau I separate the so-called “demographic” questions (e.g., gender, ethnicity, education, political leanings, etc.) from the “what you want to know questions” (e.g., “would you recommend this company to a friend or colleague?”, “which of these things do you look for when considering an insurance carrier?”, etc.) The demographic questions remain as separate columns and the other questions get reshaped. So, you may start with 200 columns and 800 rows, with one row for each respondent, and you end up with 20 columns and tens of thousands of rows, with a separate row for each non-demographic question a respondent answered.

This is a solid, proven strategy, but suppose you want to filter / break down a survey question not by a demographic question, but by another “what you want to know” question?  That is, suppose you want to see how folks that selected “Yes” to the question “Do you plan to vote in the upcoming election” responded to a Net Promoter Score question?

Figure 1 -- A reshaped question acting as s "demographic" filter.

Figure 1 — A reshaped question acting as s “demographic” filter.

In this pair of blog posts we’ll show you how any reshaped question can be “promoted” to behave like a so-called “demographic” question. That is, we will come up with a flexible, parameter-based approach that will allow any reshaped question to become a Tableau dimension that acts as if it were in its own column from the get-go.

A few thoughts before we plow ahead.

  • I’ll be using the same data set I’ve used for most of the examples I’ve blogged about and I will have prepped the data as described here.
  • If you know which non-demographic questions warrant this treatment ahead of time you can certainly just copy them and make them separate columns before reshaping / pivoting, thus avoiding the techniques explained below.
  • You can also join a reshaped data source to itself but his will produce an overabundance of rows.
  • You can join the pivoted data to the unpivoted data and have tens of thousands of rows and hundreds of columns (but you and your audience will be miserable, and performance will be very slow).

A look inside Jonathan Drummey’s thought process

As I was working with Jonathan he jotted down his thoughts in the spreadsheet that I show below.

Figure 2 -- How Jonathan Drummey approached the problem.

Figure 2 — How Jonathan Drummey approached the problem.

Pay particular attention to his thoughts expressed in rows 9 through 18. Here’s a flowchart that Jonathan created that goes into more detail (in a slightly different order) about this process of identifying what kind of calculation is necessary:

Figure 3 -- Jonathan's flowchart for determining whether a Level-of-Detail expression or a Table Calculation is warranted.

Figure 3 — Jonathan’s flowchart for determining whether a Level-of-Detail expression or a Table Calculation is warranted.

Starting off easy: understanding our data

Before we come up with an extensible solution let’s start by turning a single question–in this case “Do you plan to vote in the upcoming election” (Question ID = Q0)–and make it behave like a dimension.  Here’s a mapping of all the Question IDs, how they group, and the universe of responses to each question.

Figure 4 -- Question groups, IDs, wording, and universe of possible numeric and text responses

Figure 4 — Question groups, IDs, wording, and universe of possible numeric and text responses

Now that we’ve examined all of our questions, let’s take look at the respondents who completed the survey and get a sense of who they are.

Figure 5 -- The demographics for  each survey respondent.

Figure 5 — The demographics for  each survey respondent.

Notice that we have separate columns for Gender, Generation, and Location.

So, how can we promote the “Do you plan to vote” question (Q0) from being a collection of rows into being its own dimension / column?

Plan to vote as a separate column

The LoD expression that will do what we want is this:

{FIXED [Resp ID]: MAX(IF [Question ID]="Q0" then [Labels] END)}

The way to interpret this is

Starting at the innermost part of the calculation, in the IF statement check each record and return the [Labels] value (i.e. “Yes”, “No”, or “Don’t Know” only if the [Question ID] is Q0, otherwise return a Null. Then, for each [Resp ID] in the data, return the maximum value of the results of the IF statements for that respondent, where MAX() will return “Yes”, “No”, “Don’t Know” or NULL if the respondent had never answered that question.

If you are wondering why you need the MAX() function it’s because you need to have some type of aggregation when using an LoD expression.  Note that MAX() or MIN() will both work as they will accept text as an argument, while SUM() and AVG() will not work.

You can now drag this newly-created field, named [Plan to vote] and use it as you would use any dimension, as shown below.

Figure 6 -- “Plan to vote” as a demographic dimension.

Figure 6 — “Plan to vote” as a demographic dimension.

Do note that there are some Null values as some respondents did not answer that question. You may want to alias these as “Did not respond”.

Making this flexible

This technique will work for elevating any single question to behave as a dimension and if you only have a handful of question you want to treat this way you need not read on.

But suppose you have dozens of not hundreds of questions that you want to explore as dimensions? You certainly won’t want to “hand chisel” hundreds of separate LoD expressions.

Instead, we should create a parameter-based solution that will allow users to select the question they want to “promote” from a drop-down menu.

You can read how to do this in Part 2.

 Posted by on January 3, 2018 2) Visualizing Survey Data, Blog Tagged with: , , ,  3 Responses »
Dec 182017
 

December 18, 2017

Special thanks to Shaelyn McCole at Hootology who suggested the topic, Ryan Gensel at CPI who came up with a wonderful enhancement to the connected dot plot, and Jeffrey Shaffer at Data Plus Science who suggested some cosmetic improvements.

Overview

If you have filters in your dashboards you’ve probably had a thought similar to this: yes, it’s great that I can filter the results, but what did the dashboard look like before I applied the filter?  I can’t remember if the bars were smaller or larger, let alone by how much.

Consider the example below that shows the results to a multi-select survey question were we see results from all respondents that answered a set of questions.

Figure 1 -- Results from all survey respondents.

Figure 1 — Results from all survey respondents.

Now compare this with results with women from North America.

Figure 2 -- Results from female respondents who live in North America.

Figure 2 — Results from female respondents who live in North America.

Unless you process information and memories differently than the vast majority of people it’s very hard to compare the two populations and harder still if you can only see one result at a time (here you can at least scroll up and down to compare).

So, is there an easy way in Tableau to display both filtered and unfiltered results in the same visualization?

The answer is a resounding yes, with a refreshingly straightforward Level-of-Detail (LoD) calculation.

Determining the calculations

Let’s first look at the calculation we need to determine the percentage of people that checked “Yes” to a question. The “pill arrangement” in Tableau is shown below.

Figure 3 -- Pill arrangement for a check-all-that-apply survey question.

Figure 3 — Pill arrangement for a check-all-that-apply survey question.

Notice that we have Wording on the rows shelf. The Question Grouping filter is limiting the results to only those rows that have responses for the survey questions that interest us. Notice also the filters for Gender, Generation, and Location.

Here’s the calculation that determine the percentage of people that selected an item.

SUM([Vales]) / SUM([Number of Records])

As the survey has been coded so that folks that selected an item have a Value of 1 and those that did not select an item have a Value of 0, this calculation add up all the ones and divide by the number of people that answered the question.

Okay, so we have the calculation that will adjust the bars as we change the settings for the demographic filters.

What is the calculation that will keep the bars the same length even if we change the demographic filters?

All we need to do is lock the results to the Wording field using the following LoD calculation:

{FIXED [Wording]: SUM([Value])/SUM([Number of Records])}

This tells Tableau even though there may be filters, ignore them fix this to all the rows associated with the [Wording] field.  For those of you wondering, you could also fix this to the [Question ID] field and you would get the same results.

So, does this work?

Consider the pill configuration below where we can compare, one below the other, the results for the Filtered population and the Entire population. When all options in the demographic filers are checked, the bars are the same length.

Figure 4 -- Filtered vs. Unfiltered results.  Right now, with everything selected, the bars are the same length.

Figure 4 — Filtered vs. Unfiltered results.  Right now, with everything selected, the bars are the same length.

Now let’s see what happens when we apply some filters (keep you eyes on Adrenaline Production for the Entire Population, currently at 76%).

Figure 5 -- One set of bars changes but the other remains fixed

Figure 5 — One set of bars changes but the other remains fixed

Notice that one set of bars change, but the bars for the Entire Population (the ones using the LoD calculation) don’t change.

A gap chart… with a twist!

I experimented with several ways to visualize the differences and settled on a gap chart (also called a dumbbell chart, also called a connected dot plot). Let’s see how to build the chart (and how to improve it, using color coding that Ryan Gensel suggests).

Let’s start with the dots.

Figure 6 -- Dot plot using Measure Names and Measure Values

Figure 6 — Dot plot using Measure Names and Measure Values

This is very similar to the bar chart we had earlier but instead of bars we are using circles and we’ve placed Measure Names on Color so the circles are different colors.

We now need to create a line that connect the two dots on each row.

First, we will duplicate the Measure Values pill that is on columns and have two identical charts, side-by-side, as shown here.

Figure 7 -- Duplicating Measure Values

Figure 7 — Duplicating Measure Values

Next, for the second instance of Measure Values we will change the Mark type to Line and move Measure Names from Color to Path, as shown below.

Figure 8 -- Making the second instance of Measure Values a Line chart, connecting the two measures by placing Measure Names on Path

Figure 8 — Making the second instance of Measure Values a Line chart, connecting the two measures by placing Measure Names on Path

Now we’ll right-click the second Measure Values pill and select Dual Axis, then we’ll right-click the secondary axis and select Synchronize Axis, then we’ll right-click the secondary axis again and de-select Show Header, giving us a chart that looks like this.

09_NiceDumbbell

Making the difference stand out

I was chatting with CPI’s Ryan Gensel, one of he authors of the Agency Utilization Dashboard that appears in The Big Book of Dashboards.  Ryan told me about a technique he was using that colors the line based on which dot has the larger value.

Here I apply the technique by creating a field called Line Color which is defined as follows.

IF [Filtered -- % Check all that apply]>
SUM([Entire Population -- % Check all that apply]) THEN
"Filtered"
ELSE
"Entire Population"
END

We can then place this field on Color for the Line chart instance of Measure Values.  We will also need to make the size of the line a little thicker by clicking the Size button and moving the slider to the right. This yields a chart that looks like this.

Figure 10 -- A better gap chart.

Figure 10 — A better gap chart.

I’ve embedded the working dashboard below.  Please feel free download and explore the dashboard as well as some alternative approaches saved in the workbook.

 

 

Mar 082017
 

And… what did they choose?

March 8, 2017

Overview

I’ve discussed how to visualize check-all-that-apply questions in Tableau. Assuming your survey is coded as Yes = 1 and No = 0, you can fashion a sorted bar chart like this the one shown in Figure 1 using the following calculation.

SUM([Value]) / SUM(Number of Records)

The field [Value] would be 0 or 1 for each respondent that answered the question.

Figure 1 -- Visualizing a check-all-that-apply question

Figure 1 — Visualizing a check-all-that-apply question

I’ve also discussed how we can see break this down by various demographics (Gender, Location, Generation, etc.)

What I’ve not blogged about (until now) is how to answer the following questions:

  • How many people selected one item?
  • Two items?
  • Five items?
  • Of the people that only selected one item, what did they select?
  • Of the people that selected four items, what did they select?

Prior to the advent of LoD calculations this was doable, but a pretty big pain in the ass.

Fortunately, using examples that are “out in the wild” we can cobble together a compelling way to show the answers to these questions.

 

Visualizing How Many People Selected 1, 2, 3, N Items?

One of the best blog posts on Level-of-Detail expressions is Bethany Lyons’ Top 15 LoD Expressions.

It turns out the very first example discusses how figure out how many customers placed one order, how many placed two orders, etc.  This will give us exactly what we need to figure out how many people selected 1, 2, 3, N items in a check-all-that-apply question.

Here’s the calculation that will do the job.

Figure 2 -- The LoD calculation we'll need.

Figure 2 — The LoD calculation we’ll need.

This translates as “for the questions you are focusing on (and you better have your context filters happening so you are only looking at just the check-all-that-apply stuff), for each Resp ID, add up the values for all the questions people answered.”

Remember, the responses are 0s and 1s, so if somebody selected six things the SUM([Value]) would equal 6.

So, how do we use this?

The beautiful thing about using FIXED as our LoD keyword is that it allows us to turn the results into a dimension.  This means we can put How Many Selected on columns and CNTD(Resp ID) on rows and get a really useful histogram like the one shown in Figure 3.

Figure 3 -- Histogram showing number of respondents that selected 0 items, 1 item, 2 item, etc.

Figure 3 — Histogram showing number of respondents that selected 0 items, 1 item, 2 item, etc.

Notice the filter settings indicating that we only want responses to the check-all-that-apply questions. Further note that this filter has been added to the context which means we want Tableau to filter the results before computing the FIXED LoD calculation.

 

So, what did these People Select?

Okay, now we know how many people selected one item, two items, etc.

Just what did they select?

Because we set [How Many Selected] using the FIXED keyword we can use it like any other dimension.  That is, it will behave just like [Gender], [Location], and so on.

Borrowing from an existing technique (the visual ranking by category that I cited earlier) we can fashion a very useful dashboard that allows us to see some interesting nuances in the data.  For example, while Metabolism is ranked second overall with 70% of people selecting it, it ranked seventh among those that only selected one item (with only 4%), while 84% of people that selected four items selected it (Figure 4.)

Figure 4 -- Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Figure 4 — Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Similarly, check out the breakdown for Blood Pressure which is ranked third with 60% overall but is ranked first among folks that only measure one thing (Figure 5.)

Figure 5 -- Metabolism is ranked third overall with 60%, but was ranked first among those that only selected on item.

Figure 5 — Metabolism is ranked third overall with 60%, but was ranked first among those that only selected on item.

 

Other Useful Features of the Dashboard

The Marginal Histogram

The marginal histogram along the bottom of the chart shows you the breakdown of responses.

Figure 6 -- Marginal histogram shows distribution of responses

Figure 6 — Marginal histogram shows distribution of responses

Tool Tips Help Interpret the Findings

The ordinal numbers can be confusing as sometimes the number 2 means the number of items selected and other times it is the ranking.  Hovering over a bar explains how to interpret the results.

Figure 7 -- Tool tips help you interpret the results.

Figure 7 — Tool tips help you interpret the results.

Swap Among Different Dimensions

While this is first and foremost a blog post about showing how many people selected a certain number of items (and what they selected) it was very easy to add a parameter that allows you to swap among different dimensions.  In Figure 8 we see the break down by Location.

Figure 8 -- Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.

Figure 8 — Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.

Here’s the embedded workbook for you to try out and download.