Jan 032018
 

January 3, 2018

In the first part we saw how to “hard code” the creation of a dimension from row-based survey data using a Level-of-Detail expression. In this second part we’ll see how we can turn any row-based survey data into a dimension “on the fly” using a parameter.

Where we left off

You may recall the calculation that would allow people to cut / filter by the question “do you plan to vote in the next election” was this:

{FIXED [Resp ID]: MAX(IF [Question ID]="Q0" then [Labels] END)}

So, how do we take this “one-off” calculation and make it flexible?

Making a flexible, extensible solution

One approach is to create a parameter called [Question ID Param] and seeding it with values from the Question ID field.  The resulting calculation would look like this:

{FIXED [Resp ID]: MAX(IF [Question ID]=[Question ID Param]
then [Labels] END)}

The problem is that your “friendly” parameter list will look like this:

Figure 1 -- Not-very-friendly question list.

Figure 1 — Not-very-friendly question list.

Quick! What does “Q34-SAT” stand for?

Maybe we should instead populate the parameter list from the human-readable form of the Question ID, in this case the [Wording] field. Here’s what that will look like.

Figure 2 -- Friendlier list, but still many problems.

Figure 2 — Friendlier list, but still many problems.

This is certainly an improvement, but there are two big problems with this.  The first is that there may be Question IDs that in fact have the same wording. Indeed, if your survey delves into comparing how important a list of features is with how satisfied people are with those features, you will need for the wording to be identical. For example, “Price” could refer to “How important is this to you?” or it could refer to “How satisfied are you with this?” (See this blog post for a discussion of visualizing importance vs. satisfaction.)

The second problem is that the questions are in alphabetical order, so you have three importance / satisfactions question, followed by three check-all-that-apply questions, followed by a handful of Likert-scale questions, followed by another importance / satisfaction question that should be with the first group, etc.

Combing Grouping with Wording

Jonathan Drummey came up with a very easy way to both disambiguate the satisfaction from the importance question and group the questions in the parameter list logically. The trick is to create a new field (we’ll call it [Question Parameter List]) that concatenates the question grouping and the question wording. We will define is as follows.

[Question Grouping] + ' / ' + [Wording]

This creates a list that looks like this.

Figure 3 -- Results of concatenating [Question Grouping] with [Wording].

Figure 3 — Results of concatenating [Question Grouping] with [Wording].

We’re almost done, we next need to create a new parameter, we’ll call it [Question to compare], and we’ll populate it with the members of the [Question Parameter List] field, as shown below.

Figure 4 -- Our friendly, disambiguated, logically-grouped list of questions.

Figure 4 — Our friendly, disambiguated, logically-grouped list of questions.

Armed with this concatenated list we can modify our hard-coded LoD expression field so that it looks like this.

{FIXED [Resp ID]: MAX(IF [Wording Parameter List]=
[Question to compare] then [Labels] END)}

Before you explore the downloadable dashboard at the end of this post I want to dissuade you from inflicting this “filter any question by any other question” functionality on your audience as you will simply be hitting them with too much flexibility. While I’m sure there are some insights to be gleaned from some of the question combinations, there are probably dozens, if not hundreds, that won’t yield anything useful. Do you really want to make your audience find where the good stuff is?

Cole Nussbaumer Knaflic presents a wonderful one-day workshop around her book, Storytelling with Data. In the workshop she states that finding good insights buried in a mound of data is like having to shuck a lot of oysters to find a pearl. Don’t show your audience all the oysters you shucked (and certainly don’t make them shuck the oysters!); just show them the pearl.

Yes, you should use this technique to find insights that go beyond cutting the data by traditional demographic questions.  And if / when you find something useful, limit what you show your audience to just those filters / options that provide insight.

One last thought: if you are building a “this-has-to-be-slick” dashboard — perhaps one that is customer-facing — consider ditching the single concatenated parameter and instead building a parent / child pair of parameters using Tableau’s Javascript API. The first parameter would show the question grouping and the second would show the question in human-readable form based on what was selected from the first parameter.

 Posted by on January 3, 2018 1) General Discussions, 2) Visualizing Survey Data, Blog Tagged with: , ,  1 Response »
Jan 032018
 

January 3, 2018

Note: I first wrote about this five years ago and while the approaches I suggested then do in fact work, the advent of Level of Detail (LoD) expressions in Tableau gives us a much better way to get the job done. A very big “thank you” to my friend and colleague Jonathan Drummey who steered me very quickly to the flexible approach I write about below.

Overview

Those that have followed this blog know that when I setup survey data for analysis in Tableau I separate the so-called “demographic” questions (e.g., gender, ethnicity, education, political leanings, etc.) from the “what you want to know questions” (e.g., “would you recommend this company to a friend or colleague?”, “which of these things do you look for when considering an insurance carrier?”, etc.) The demographic questions remain as separate columns and the other questions get reshaped. So, you may start with 200 columns and 800 rows, with one row for each respondent, and you end up with 20 columns and tens of thousands of rows, with a separate row for each non-demographic question a respondent answered.

This is a solid, proven strategy, but suppose you want to filter / break down a survey question not by a demographic question, but by another “what you want to know” question?  That is, suppose you want to see how folks that selected “Yes” to the question “Do you plan to vote in the upcoming election” responded to a Net Promoter Score question?

Figure 1 -- A reshaped question acting as s "demographic" filter.

Figure 1 — A reshaped question acting as s “demographic” filter.

In this pair of blog posts we’ll show you how any reshaped question can be “promoted” to behave like a so-called “demographic” question. That is, we will come up with a flexible, parameter-based approach that will allow any reshaped question to become a Tableau dimension that acts as if it were in its own column from the get-go.

A few thoughts before we plow ahead.

  • I’ll be using the same data set I’ve used for most of the examples I’ve blogged about and I will have prepped the data as described here.
  • If you know which non-demographic questions warrant this treatment ahead of time you can certainly just copy them and make them separate columns before reshaping / pivoting, thus avoiding the techniques explained below.
  • You can also join a reshaped data source to itself but his will produce an overabundance of rows.
  • You can join the pivoted data to the unpivoted data and have tens of thousands of rows and hundreds of columns (but you and your audience will be miserable, and performance will be very slow).

A look inside Jonathan Drummey’s thought process

As I was working with Jonathan he jotted down his thoughts in the spreadsheet that I show below.

Figure 2 -- How Jonathan Drummey approached the problem.

Figure 2 — How Jonathan Drummey approached the problem.

Pay particular attention to his thoughts expressed in rows 9 through 18. Here’s a flowchart that Jonathan created that goes into more detail (in a slightly different order) about this process of identifying what kind of calculation is necessary:

Figure 3 -- Jonathan's flowchart for determining whether a Level-of-Detail expression or a Table Calculation is warranted.

Figure 3 — Jonathan’s flowchart for determining whether a Level-of-Detail expression or a Table Calculation is warranted.

Starting off easy: understanding our data

Before we come up with an extensible solution let’s start by turning a single question–in this case “Do you plan to vote in the upcoming election” (Question ID = Q0)–and make it behave like a dimension.  Here’s a mapping of all the Question IDs, how they group, and the universe of responses to each question.

Figure 4 -- Question groups, IDs, wording, and universe of possible numeric and text responses

Figure 4 — Question groups, IDs, wording, and universe of possible numeric and text responses

Now that we’ve examined all of our questions, let’s take look at the respondents who completed the survey and get a sense of who they are.

Figure 5 -- The demographics for  each survey respondent.

Figure 5 — The demographics for  each survey respondent.

Notice that we have separate columns for Gender, Generation, and Location.

So, how can we promote the “Do you plan to vote” question (Q0) from being a collection of rows into being its own dimension / column?

Plan to vote as a separate column

The LoD expression that will do what we want is this:

{FIXED [Resp ID]: MAX(IF [Question ID]="Q0" then [Labels] END)}

The way to interpret this is

Starting at the innermost part of the calculation, in the IF statement check each record and return the [Labels] value (i.e. “Yes”, “No”, or “Don’t Know” only if the [Question ID] is Q0, otherwise return a Null. Then, for each [Resp ID] in the data, return the maximum value of the results of the IF statements for that respondent, where MAX() will return “Yes”, “No”, “Don’t Know” or NULL if the respondent had never answered that question.

If you are wondering why you need the MAX() function it’s because you need to have some type of aggregation when using an LoD expression.  Note that MAX() or MIN() will both work as they will accept text as an argument, while SUM() and AVG() will not work.

You can now drag this newly-created field, named [Plan to vote] and use it as you would use any dimension, as shown below.

Figure 6 -- “Plan to vote” as a demographic dimension.

Figure 6 — “Plan to vote” as a demographic dimension.

Do note that there are some Null values as some respondents did not answer that question. You may want to alias these as “Did not respond”.

Making this flexible

This technique will work for elevating any single question to behave as a dimension and if you only have a handful of question you want to treat this way you need not read on.

But suppose you have dozens of not hundreds of questions that you want to explore as dimensions? You certainly won’t want to “hand chisel” hundreds of separate LoD expressions.

Instead, we should create a parameter-based solution that will allow users to select the question they want to “promote” from a drop-down menu.

You can read how to do this in Part 2.

 Posted by on January 3, 2018 2) Visualizing Survey Data, Blog Tagged with: , , ,  3 Responses »
Dec 182017
 

December 18, 2017

Special thanks to Shaelyn McCole at Hootology who suggested the topic, Ryan Gensel at CPI who came up with a wonderful enhancement to the connected dot plot, and Jeffrey Shaffer at Data Plus Science who suggested some cosmetic improvements.

Overview

If you have filters in your dashboards you’ve probably had a thought similar to this: yes, it’s great that I can filter the results, but what did the dashboard look like before I applied the filter?  I can’t remember if the bars were smaller or larger, let alone by how much.

Consider the example below that shows the results to a multi-select survey question were we see results from all respondents that answered a set of questions.

Figure 1 -- Results from all survey respondents.

Figure 1 — Results from all survey respondents.

Now compare this with results with women from North America.

Figure 2 -- Results from female respondents who live in North America.

Figure 2 — Results from female respondents who live in North America.

Unless you process information and memories differently than the vast majority of people it’s very hard to compare the two populations and harder still if you can only see one result at a time (here you can at least scroll up and down to compare).

So, is there an easy way in Tableau to display both filtered and unfiltered results in the same visualization?

The answer is a resounding yes, with a refreshingly straightforward Level-of-Detail (LoD) calculation.

Determining the calculations

Let’s first look at the calculation we need to determine the percentage of people that checked “Yes” to a question. The “pill arrangement” in Tableau is shown below.

Figure 3 -- Pill arrangement for a check-all-that-apply survey question.

Figure 3 — Pill arrangement for a check-all-that-apply survey question.

Notice that we have Wording on the rows shelf. The Question Grouping filter is limiting the results to only those rows that have responses for the survey questions that interest us. Notice also the filters for Gender, Generation, and Location.

Here’s the calculation that determine the percentage of people that selected an item.

SUM([Vales]) / SUM([Number of Records])

As the survey has been coded so that folks that selected an item have a Value of 1 and those that did not select an item have a Value of 0, this calculation add up all the ones and divide by the number of people that answered the question.

Okay, so we have the calculation that will adjust the bars as we change the settings for the demographic filters.

What is the calculation that will keep the bars the same length even if we change the demographic filters?

All we need to do is lock the results to the Wording field using the following LoD calculation:

{FIXED [Wording]: SUM([Value])/SUM([Number of Records])}

This tells Tableau even though there may be filters, ignore them fix this to all the rows associated with the [Wording] field.  For those of you wondering, you could also fix this to the [Question ID] field and you would get the same results.

So, does this work?

Consider the pill configuration below where we can compare, one below the other, the results for the Filtered population and the Entire population. When all options in the demographic filers are checked, the bars are the same length.

Figure 4 -- Filtered vs. Unfiltered results.  Right now, with everything selected, the bars are the same length.

Figure 4 — Filtered vs. Unfiltered results.  Right now, with everything selected, the bars are the same length.

Now let’s see what happens when we apply some filters (keep you eyes on Adrenaline Production for the Entire Population, currently at 76%).

Figure 5 -- One set of bars changes but the other remains fixed

Figure 5 — One set of bars changes but the other remains fixed

Notice that one set of bars change, but the bars for the Entire Population (the ones using the LoD calculation) don’t change.

A gap chart… with a twist!

I experimented with several ways to visualize the differences and settled on a gap chart (also called a dumbbell chart, also called a connected dot plot). Let’s see how to build the chart (and how to improve it, using color coding that Ryan Gensel suggests).

Let’s start with the dots.

Figure 6 -- Dot plot using Measure Names and Measure Values

Figure 6 — Dot plot using Measure Names and Measure Values

This is very similar to the bar chart we had earlier but instead of bars we are using circles and we’ve placed Measure Names on Color so the circles are different colors.

We now need to create a line that connect the two dots on each row.

First, we will duplicate the Measure Values pill that is on columns and have two identical charts, side-by-side, as shown here.

Figure 7 -- Duplicating Measure Values

Figure 7 — Duplicating Measure Values

Next, for the second instance of Measure Values we will change the Mark type to Line and move Measure Names from Color to Path, as shown below.

Figure 8 -- Making the second instance of Measure Values a Line chart, connecting the two measures by placing Measure Names on Path

Figure 8 — Making the second instance of Measure Values a Line chart, connecting the two measures by placing Measure Names on Path

Now we’ll right-click the second Measure Values pill and select Dual Axis, then we’ll right-click the secondary axis and select Synchronize Axis, then we’ll right-click the secondary axis again and de-select Show Header, giving us a chart that looks like this.

09_NiceDumbbell

Making the difference stand out

I was chatting with CPI’s Ryan Gensel, one of he authors of the Agency Utilization Dashboard that appears in The Big Book of Dashboards.  Ryan told me about a technique he was using that colors the line based on which dot has the larger value.

Here I apply the technique by creating a field called Line Color which is defined as follows.

IF [Filtered -- % Check all that apply]>
SUM([Entire Population -- % Check all that apply]) THEN
"Filtered"
ELSE
"Entire Population"
END

We can then place this field on Color for the Line chart instance of Measure Values.  We will also need to make the size of the line a little thicker by clicking the Size button and moving the slider to the right. This yields a chart that looks like this.

Figure 10 -- A better gap chart.

Figure 10 — A better gap chart.

I’ve embedded the working dashboard below.  Please feel free download and explore the dashboard as well as some alternative approaches saved in the workbook.

 

 

Oct 252017
 

Overview

Tableau’s automatic axis feature can often present problems with survey data.

In this blog post we’ll look at how the problem crops up and what you can do to both fix the problem and make your visualizations more insightful.

The problem with auto-adjusting bars

Tableau’s automatic axis does a great job extending and contracting the length of bars so that a bar chart fills the view. In most circumstances this is a good thing but there are cases where it can inhibit your audience’s ability to correctly interpret your data.

Consider the chart below where we see the percentage of people that selected that top 2 boxes for a Likert scale question (in this case we see the percentage of people who selected Agree or Strongly Agree).  Pay attention to the length of the top bar.

Figure 1 -- Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes).  Notice the length of the top bar.

Figure 1 — Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes).  Notice the length of the top bar.

Now let’s see what happens when we change the view to show the percentage of people that only selected Strongly Agree (Top Box).  Again, pay attention to the length of the top bar.

Figure 2 -- Percentage of respondents that selected Strongly Agree (Top Box).  Again, notice the length of the top bar.

Figure 2 — Percentage of respondents that selected Strongly Agree (Top Box).  Again, notice the length of the top bar.

There’s a HUGE difference here, but unless you look at the numbers (and the labels, as the sorting has changed) you’re going to miss it.

Make the differences pop by comparing with 100%

Here’s the same data but this time we’re showing the gap between each bar and 100%.

First, let’s see what the Top 2 Box view looks like.

Figure 3 -- Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes) compared with 100%.

Figure 3 — Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes) compared with 100%.

Now let’s see what happens when we change this to show only the Top Box.

Figure 4 -- Percentage of respondents that selected Strongly Agree (Top Box) compared with 100%.

Figure 4 — Percentage of respondents that selected Strongly Agree (Top Box) compared with 100%.

As chef Emeril Lagasse might say “BAM!”

I’ll walk through how to create the 100% bars in a moment but I first want to show another instance where the auto-sizing bars can present a problem.

Showing demographics: comparing two similar charts

A similar issues comes into play with demographics dashboards.  Consider the very simple dashboard below that shows the percentage of respondents broken down by location and gender.

Figure 5 -- Respondent demographics dashboard with auto-sizing bars.

Figure 5 — Respondent demographics dashboard with auto-sizing bars.

Do you see the problem?  No?  Check out what happens when you hover over the North America bar and the Male bar.

Figure 6 -- Two independent charts with slightly different levels of magnitude.

Figure 6 — Two independent charts with slightly different levels of magnitude.

The top bar in both cases is the same length, but in the top chart is represents 341 respondents (40%) and in the bottom chart it represents 436 people (52%).

The solution is to normalize the chart so that both have a common axis, as shown below.

Figure 7 -- Demographics dashboard with common (normalized) axes.

Figure 7 — Demographics dashboard with common (normalized) axes.

We will consider a couple of ways to produce this normalized axis later but first let’s tackle the 100% reference bars.

Building the reference bars visualization

Let’s start with the view shown below which shows the percentage of folks that selected the Top N Boxes.  Note that the completed dashboards may be found embedded at the end of this blog post.

Figure 8 — Initial view

  1. Create a calculated field called 100 Percent and define it as follows.
    09_100
    Yes, you will need to define at as “1.0” so that Tableau treats it as a float and not an integer. This will come up later when we need to create a dual axis and synchronize the charts.
  2. Right-drag the newly-created field to Columns and select MAX(100 Percent) from the Drop Field dialog Box. Also make sure the mark is set to Bar and not automatic.
    10_RightDrag
  3. Right-click MAX(100 Percent) on columns and select Dual Axis. Your screen should look like the one shown below.
    11_DualEearly
  4. Right-click the axis at the top and select Move marks to back, then right-click the axis again and select Synchronize Axis. Your screen should look like this.
    12_AlmostThere
  5. Right-click the top axis and deselect Show Header, then modify the colors so that the 100% bars is a muted gray.

Normalizing the axis in the demographics dashboard

I’m going to recommend a different approach for the demographics dashboard as we really don’t need to show 100%; it’s very unlikely that any one segment will ever reach 100%.  Instead I propose adding a hidden reference line placed at the maximum value that any demographic element is likely to reach, say 60%

Consider the Location demographic visualization below.

13_Location

There are several ways to hard-code a reference line at 60%. Here’s one way to go.

  1. Right-click the axis along the bottom and select Add Reference Line.
  2. When the Add Reference Line, Band, or Box dialog box appears, select Entire Table and indicate you want a constant set for .6. Make sure to set Label to None and Line to None as shown below.
    14_RefLine
  3. Repeat this for all the other worksheets that comprise your demographics dashboard.

“Wait!”, you may exclaim. “Why not just fix the axis at 60%?”

That will work… most of the time.  But what happens if you fix the axis at 60% and you in fact have a bar that exceeds that value?  If you fix the axis you are out of luck, but a hidden reference line will not constrain the axis; that is, it will make sure the axis extends to at least 60%, but the axis will go beyond that if needed.

Conclusion

Tableau’s tendency to auto-size an axis is great but can sometimes foster confusion. In this post we explored two different techniques that will help your audience make more accurate comparisons.

Sep 202017
 

September 20, 2017

Overview

As anyone who has read anything on my blog that relates to survey data knows, the number one impediment to success with Tableau is getting your data “just so.”

Until recently I recommended two different approaches. You can either use Alteryx and have a rock solid, robust, fully-featured, (and expensive) solution or you could have a somewhat rickety, cumbersome, free solution using cross-database joins in Tableau.

Thanks to the advice of fellow Zen Master Rob Radburn, I now know about a third alternative, EasyMorph. Based on my brief time working with the product, EasyMorph appears to be solid, robust, full-featured, and costs somewhere between free and not-at-all expensive.

In this blog, I’ll show you how to take the same data set I use in all my classes and get it setup so that it works perfectly with Tableau.

Working with EasyMorph

Using the same data set I discuss in the general “here’s how your data needs to be setup” blog post, let’s see how we’re going to get the data-as-numbers, the data-as-labels, and the helper file to all play nicely in EasyMorph.

Importing and Reshaping the Label Responses

Here’s the EasyMorph Start page. Let’s begin by importing the Label responses from Excel.

Figure 1 -- EasyMorph Start page. Yes, we're importing Excel files but we can also connect to databases.

Figure 1 — EasyMorph Start page. Yes, we’re importing Excel files but we can also connect to databases.

Here I indicate that I want to open the Excel workbook and then tell EasyMorph which tab in the workbook I want to use.

Figure 2 -- Specifying which sheet in the Excel workbook I want to use, in this case Data Labels.

Figure 2 — Specifying which sheet in the Excel workbook I want to use, in this case Data Labels.

Clicking the Apply button will apply this “transformation”, where a transformation is just an action I want to have happen in my workflow. EasyMorph will then display the data in its main workspace window, as shown below.

Figure 3 -- The Excel Data Labels tab imported into EasyMorph.

Figure 3 — The Excel Data Labels tab imported into EasyMorph.

We now need to specify which columns we want to leave intact and which we want to reshape.  We can do this by adding an Unpivot transformation.

I’ll start by clicking Add New Transformation

Figure 4 -- Adding a new transformation

Figure 4 — Adding a new transformation

… then I’ll click the Advanced menu and select Unpivot from the list, as shown here.

Figure 5 -- Not the most obvious place to find this feature, but this is how you reshape the data (Tableau calls this a pivot; EasyMorph calls it an unpivot.  Tomato Tomahto...)

Figure 5 — Not the most obvious place to find this feature, but this is how you reshape the data (Tableau calls this a pivot; EasyMorph calls it an unpivot.  Tomato Tomahto…)

We next need to indicate what we want to call the reshaped fields and which fields we want to leave intact.  Here we indicate that we want to leave Gender Generation, and Location intact, but we also need to include RespID and Weight (we need to scroll down to see them.)

Figure 6 -- Specify which fields to keep and which to reshape. The two other fields are at the bottom of the list.

Figure 6 — Specify which fields to keep and which to reshape. Note that the two other fields we need to select are at the bottom of the list.

Next, we need only click the Apply button and we’ve got our demographics and Data Labels set up, “just so.”

Figure 7 -- Our reshaped data.  We've applied two transformations.  Notice at the bottom of the screen we see that we went from 45 columns and 845 rows to 7 columns and 33,000 rows.

Figure 7 — Our reshaped data.  We’ve applied two transformations.  Notice at the bottom of the screen we see that we went from 45 columns and 845 rows to 7 columns and 33,000 rows.

Importing and Reshaping the Numeric Responses

We’re now ready to import the Numeric version of the data. We start by clicking Import/create table, indicate we want to import from a file, and select the same Excel file.

Figure 8 -- Inserting a second set of data.

Figure 8 — Inserting a second set of data.

We’ll see the same options as before, but this time we’ll specify that we want to import the Data Numbers tab, as shown below.

Figure 9 -- Importing the numeric version of the data.

Figure 9 — Importing the numeric version of the data.

Clicking Apply will import the data.

Before reshaping the numeric version of the data, we can remove redundant columns, in this case the “demographic” variables (Gender, Location, Generation, and Weight) as we already have them in the Label version of the data.

We can do this by selecting the columns and selecting Remove 4 columns from the context menu, as shown here.  Note that this context menu approach is just a different way to create and apply a transformation.

Figure 10 -- We don't need to have these columns as they are already present in the other data set.

Figure 10 — We don’t need to have these columns as they are already present in the other data set.

We’ll now Unpivot the data, leaving RespID and renaming the columns, as shown below.

Figure 11 -- Reshaping the numeric version of the data.

Figure 11 — Reshaping the numeric version of the data.

Joining the Two Reshaped Data Sources

We’re now ready to join the two reshaped data sources. We’ll need to make sure that every Quesiton ID for each RespID lines up so we will join the tables on both of these fields.

Start by clicking the first table (Imported Table 1), then select Add new transformation, then select Merge another table from the list of transformations, as shown below.

Figure 12 -- This is how you do a Join in EasyMorph.

Figure 12 — This is how you do a Join in EasyMorph.

By default, EasyMorph suggests matching the data based on RespID and Question ID. Note that I only need to bring in the Values field from the second table, as shown here.

Figure 13 -- Selecting which fields to join and which columns to combine from the second data source.

Figure 13 — Selecting which fields to join and which columns to combine from the second data source.

Clicking Apply will append the Values column from the second data source to the first data source.

Adding the Helper data to the mix

We’re now ready to import and join the “Helper” data. This will map each Question ID to its human readable form and groups related questions together.

Start by clicking Import/create table, then indicate that you want to import from a file, then select the same Excel file.

We’ll see the same options as before, but this time we’ll specify that we want to import the Question Helper sheet, as shown below.

Figure 14 -- Importing the Helper data.

Figure 14 — Importing the Helper data.

We’re now ready to merge the Helper data with the combined numeric and label data.

We start first by selecting the main table (Imported Table 1) and selecting Add new transformation.

We then select Merge another table and indicate we want to add columns from Imported table 3, as shown below.

Figure 15 -- Selecting which secondary table to merge.

Figure 15 — Selecting which secondary table to merge.

We now need to specify which fields to use for the join and which columns to combine. Note that the field names are different so EasyMorph didn’t automatically join on Question ID.

Figure 16 -- Specifying the Join field and which columns to combine. Here we join “Question ID” with “QuestionID.”  Hey, who says I don’t show “real world” examples?

Figure 16 — Specifying the Join field and which columns to combine. Here we join “Question ID” with “QuestionID.”  Hey, who says I don’t show “real world” examples?

Clicking Apply will combine all the relevant data into the main table, as shown below.

Figure 17 -- All the data, combined correctly.

Figure 17 — All the data, combined correctly.

Notice the icons along the top of the window indicating that we’ve performed four transformations on this table. Selecting an icon will show the settings you specified for the selected transformation.

Not convinced the data is “just so”? We can sort on RespID and you’ll see all the survey responses for a particular respondent (Just right-click the RespID column and select Sort.)

Exporting the Data to a Tableau Data Extract file (.TDE file)

We’re now ready to export the data so that we can visualize it using Tableau.

We’ll start by making sure the main table is selected and then click Add new transformation.

We then select Export and select Export to Tableau from the list of options, as shown below.

Figure 18 -- Exporting the data to a .TDE file.

Figure 18 — Exporting the data to a .TDE file.

Clicking Export to Tableau will bring up the Export options, as shown below.

Figure 19 -- Exporting to a .TDE file. Note that you can write the file directly to Tableau Server.

Figure 19 — Exporting to a .TDE file. Note that you can write the file directly to Tableau Server.

Before clicking Apply, indicate that RespID and Labels should be Text fields and not Number fields.

Okay, we’re just about finished; we just need to run the project by clicking the Run project button at the top of the screen.

Want to load the exported .TDE into Tableau? Just click the last transformation icon at the top of the screen, then click the Open file icon, as shown below.

Figure 20 -- You can load the exported data into Tableau by clicking the Open file icon.

Figure 20 — You can load the exported data into Tableau by clicking the Open file icon.

Below we see what complete data flow looks like. Note that I added a transformation to filter out null values (It’s the second to last icon in the top table). Further note that you can rename each of the tables as you see fit.

Figure 21 -- The complete data flow.

Figure 21 — The complete data flow.

But wait, there’s more!

Assuming your data is coded correctly you’ll probably want to remove null values and labels. No problem, you can just Add a transformation and indicate you want to remove <empty> values.

And what if your data isn’t coded correctly (e.g., check all apply questions are coded using 1s and blanks instead of 1s and 0s)?  EasyMorph can handle that, too.

Conclusion

Easy morph was very easy to learn and you can’t beat the price. If you have 30 or fewer transformations in a project you can use the free download (our sample only has ten transformations). Need more than 30 transformations as well as more sophisticated features (e.g., automation and scripting) the price will range from a one-time license of $195 to $55 per month.

EasyMorph also appears to handle large survey data sources without a problem. I was able to convert a monster NPS survey with 3,500 columns and 2,200 rows into a trim ten columns and about 7M rows in just a matter of seconds.

So far, the only downside I’ve seen is that it won’t read SPSS files directly; you’ll need to export to two CSV files, one for numeric-coded data and one for label-coded data.

Other than that minor issue the tool has been terrific and I plan to incorporate it into my classes on visualizing survey data using Tableau.

If you need geospatial analysis and predictive analysis in addition to ETL then EasyMorph won’t meet your needs (this is where Alteryx shines). But if you just need solid ETL capabilities, this is a great tool, at a great price.

 Posted by on September 20, 2017 2) Visualizing Survey Data, Blog Tagged with: , ,  7 Responses »
Aug 232017
 

Why you may be missing important insights if you only look at Percent Top 2 Boxes.

August 23, 2017

Overview

Anyone who has looked to this blog for insights into visualizing survey data knows that my “go to” visualization for Likert scale sentiment data is a divergent stacked bar chart (Figure 1).

Figure 1 -- Divergent stacked bar chart for a collection of 5-point Likert scale questions

Figure 1 — Divergent stacked bar chart for a collection of 5-point Likert scale questions

You might prefer grouping all the positives and negatives together, showing only a three-point scale.  Or perhaps you question having the neutrals “straddle the fence” as it were, with half being positive and half being negative.  These are fair points that I’m happy to debate at another time as right now I want to focus on what happens when we need to compare survey results between two periods.

Showing responses for more than one period

As much as I love the divergent stacked bar chart, it can become a little difficult to parse when you show more than one period for more than one question. Consider the chart below where we compare results for 2017 vs. 2016 (Figure 2).

Figure 2 -- Showing responses for two different periods

Figure 2 — Showing responses for two different periods

As comfortable as I am with seeing how sentiment skews positive or negative with a divergent stacked bar chart, I’m at a loss to compare the results across two different years. The only thing that really stands out is that there appears to be a pretty big difference between 2017 vs. 2016 for “Really important issue 7” at the bottom of the chart.

The allure of Percent Top 2 Boxes

It’s times like these when focusing on the percentage of respondents that selected Strongly agree or Generally agree (Percent Top 2 Boxes) is very tempting.  Consider the connected dot plot in Figure 3.

Figure 3 -- Connected dot plot showing difference between Percent Top 2 Boxes in 2017 and 2016.

Figure 3 — Connected dot plot showing difference between Percent Top 2 Boxes in 2017 and 2016.

Hey, that’s clear and easy to read. Indeed, this is one of my recommended approaches for comparing Importance vs. Satisfaction and it works great for comparing results across two time periods.

So, we’re all done, right?

Not so fast. While this approach will work in many cases, you should never stop exploring as there may be something important that remains hidden when you only show Percent Top 2 Boxes.

It’s not the economy, it’s the neutrals (stupid)

I was recently working with a client who had surveyed a large group about several contentious topics. The client believed that, much like the population of the United States, the surveyed population had become more polarized over the past year, at least with respect to these survey topics.

In reviewing the results for three questions, if we just focus on the positives (Percentage Top 2 Boxes) things look like they have improved (Figure 4.)

Figure 4 -- Connected dot plot showing change in positives (Percentage Top 2 Boxes) between 2017 and 2016.

Figure 4 — Connected dot plot showing change in positives (Percentage Top 2 Boxes) between 2017 and 2016.

See? We have more positives (green) now than we did a year ago (gray.)

This may be true, but it only tells part of the story.

Consider the divergent stacked bar chart shown in Figure 5.

Figure 5 -- 5-Point Divergent stacked bar char comparing results from 2017 and 2016

Figure 5 — 5-Point Divergent stacked bar char comparing results from 2017 and 2016

Woah… there’s something very interesting going on here, but it’s very hard to see.  Maybe if we combine all the positives and negatives the “ah-ha” will be easier to decipher (Figure 6).

Figure 6 -- 3-Point Divergent stacked bar char comparing results from 2017 and 2016. There are big differences between the two time periods, but they are hard to see.

Figure 6 — 3-Point Divergent stacked bar char comparing results from 2017 and 2016. There are big differences between the two time periods, but they are hard to see.

Well, that’s a little better, but the story — and it’s a really big story — is still hidden. Let’s see what happens if we abandon both the connected dot plot and divergent stacked bar chart and instead try a slopegraph (actually, a distributed slopegraph, Figure 7).

Figure 7 -- Distributed slopegraph showing change in positives, neutrals, and negatives.

Figure 7 — Distributed slopegraph showing change in positives, neutrals, and negatives.

Now we can see it!  Just look at the gray lines showing the dramatic change in neutrals.  My client’s hunch was correct — the population has become much more polarized as the percentage of neutrals have plummeted while the percentage of people expressing both positive and negative sentiment has increased. You cannot see this at all with the connected dot plot and it’s hard to glean from the divergent stacked bar chart.

There is no, one best chart for every situation

I had the good fortune to attend one of Cole Nussbaumer Knaflic’s Storytelling with Data workshops. She uses a wonderful metaphor in describing how much work it can take to present just one, really good finding. I paraphrase:

“You have to shuck a lot of oysters to find a single pearl. In your presentations, don’t show all the shells you shucked; just show the pearl.”

For this last example, if I only had 30 seconds of the chief stakeholder’s time I would just show the distributed slopegraph as it is the “pearl.”  It clearly and concisely imparts the biggest finding for the data set: the population has become considerably more polarized for all three issues.

But…

What happens if the chief stakeholder wants to know more? I would be armed with an interactive dashboard to answer questions like these:

“The people that disagree… how many of them strongly disagree?”

“The people that agree… how many of them strongly agree?”

“Are these findings consistent across the entire organization, or only in some areas?”

Conclusion

So, when showing changes in sentiment over time, which chart is best? The connected dot plot? The divergent stacked bar chart? The distributed slopegraph?

To quote my fellow author of the Big Book of Dashboards, Andy Cotgreave, “it depends.”

You should be prepared to apply all three approaches and choose the one that imparts the greatest understanding with the least amount of effort.

Note

I’ve had a number of debates with people about how I prefer to handle neutrals (half on the negative side and half on the positive side). If you find that troubling you can place the neutrals to one side, as shown in Figure 8.

Figure 8 -- Neutrals placed to one side providing a common baseline for comparison.

Figure 8 — Neutrals placed to one side providing a common baseline for comparison.

May 102017
 

May 10, 2017

Overview

Most organizations want to wildly exceed customer expectations for all facets of all their products and services, but if your organization is like most, you’re not going to be able to do this. Therefore, how should you allocate money and resources?

First, make sure you are not putting time and attention into things that aren’t important to your customers and make sure you satisfy customers with the things that are important.

One way to do this is to create a survey that contains two parallel sets of questions that ask customers to indicate the importance of certain features / services with how satisfied they are with those products and services.  A snippet of what this might look like to a survey taker is shown in Figure 1.

Figure 1 -- How the importance vs. satisfaction questions might appear to the person taking the survey.

Figure 1 — How the importance vs. satisfaction questions might appear to the person taking the survey.

How to Visualize the Results

I’ve come up with a half dozen ways to show the results and will share three approaches in this blog post.  All three approaches use the concept of “Top 2 Boxes” where we compare the percentage of people who indicated Important or Very Important (the top two possible choices out of five for importance) and Satisfied or Very Satisfied (again, the top two choices for Satisfaction).

Bar-In-Bar Chart

Figure 2 shows a bar-in-bar chart, sorted by the items that are most important.

Figure 2 -- Bar-in-bar chart

Figure 2 — Bar-in-bar chart

This works fine, as would having a bar and a vertical reference line.

It’s easy to see that we are disappointing our customers in everything except the least important category and that the gap between importance and satisfaction is particular pronounced in Ability to Customer UI (we’re not doing so well in Response Time, 24-7 Support, and East of Use, either.)

Scatterplot with 45-degree line

Figure 3 shows a scatterplot that compares the percent top 2 boxes for Importance plotted against the percent top 2 boxes for Satisfaction where each mark is a different attribute in our study.

Figure 3 -- Scatterplot with 45-degree reference line

Figure 3 — Scatterplot with 45-degree reference line

The goal is to be as close to the 45-degree line as possible in that you want to match satisfaction with importance. That is, you don’t want to underserve customers (have marks below the line) but you probably don’t want to overserve, either, as marks above the line suggest you may be putting to many resources into things that are not that important to your customers.

As with the previous example it’s easy to see the one place where we are exceeding expectations and the three places where we’re quite a bit behind.

Dot Plot with Line

Of the half dozen or so approaches the one I like most is the connected dot plot, shown in Figure 4.

Figure 4 -- Connected dot plot. This is the viz I like the most.

Figure 4 — Connected dot plot. This is the viz I like the most.

(I placed “I like most” in italics because all the visualizations I’ve shown “work” and one of them might resonate more with your audience than this one.  Just because I like it doesn’t mean it will be the best for your organization so get feedback before deploying.)

In the connected dot plot the dots show the top 2 boxes for importance compared to the top 2 boxes for satisfaction.  The line between them underscores the gap.

I like this viz because it is sortable and easy to see where the gaps are most pronounced.

But what about a Divergent Stacked Bar Chart?

Yes, this is my “go to” viz for Likert-scale things and I do in fact incorporate such a view in the drill-down dashboard found at the end of this blog post. I did in fact experiment with the view but found that while it worked for comparing one feature at a time it was difficult to understand when comparing all 10 features (See Figure 5.)

Figure 5 -- Divergent stacked bar overload (too much of a good thing).

Figure 5 — Divergent stacked bar overload (too much of a good thing).

How to Build This — Make Sure the Data is Set Up Correctly

As with everything survey related, it’s critical that the data be set up properly. In this case for each Question ID we have something that maps that ID to a human readable question / feature and groups related questions together, as shown in Figure 6.

Figure 6 -- Mapping the question IDs to human readable form and grouping related questions

Figure 6 — Mapping the question IDs to human readable form and grouping related questions

Having the data set up “just so” allows us to quickly build a useful, albeit hard to parse, comparison of Importance vs. Satisfaction, as shown in Figure 7.

Figure 7 -- Quick and dirty comparison of importance vs. satisfaction.

Figure 7 — Quick and dirty comparison of importance vs. satisfaction.

Here we are just showing the questions that pertain to Importance and Satisfaction (1). Note that measure [Percentage Top 2 Boxes] that is on Columns (2) is defined as follows.

Figure 8 -- Calculated field for determining the percentage of people that selected the top 2 boxes.

Figure 8 — Calculated field for determining the percentage of people that selected the top 2 boxes.

Why >=3?  It turns out that the Likert scale for this data went from 0 to 4, so here we just want to add up everyone who selected a 3 or a 4.

Not Quite Ready to Rock and Roll

This calculated field will work for many of the visualizations we might want to create, but it won’t work for the scatterplot and it will give us some headaches when we attempt to add some discrete measures to the header that surrounds our chart (the % Diff text that appears to the left of the dot plot in Figure 4.) So, instead of having a single calculation I created two separate calculations to compute % top 2 boxes Importance and % top 2 boxes Satisfaction. The calculation for Importance is shown in Figure 9.

Figure 9 -- Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Figure 9 — Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Notice that we have all the rows associated with both the Importance questions and Satisfaction “in play”, as it were, but we’re only tabulating results for the Importance questions so we’re dividing by half of the total number of records.

We’ll need to create a similar calculated field for the Satisfaction questions.

Ready to Rock and Roll

Understanding the Dot Plot

Figure 10 shows what drives the Dot Plot (we’ll add the connecting line in a moment.)

Figure 10 -- Dissecting the Dot Plot.

Figure 10 — Dissecting the Dot Plot.

Here we see that we have a Shape chart (1) that will display two different Measure Values (2) and that Measure Names (3) is controlling Shape and Color.

Creating the Connecting Line Chart

Figure 11 shows how the Line chart that connects the shapes are built.

Figure 11 -- Dissecting the Line chart

Figure 11 — Dissecting the Line chart.

Notice that Measure Values is on Rows a second time (1) but the second instance the mark type is a Line (2) and that the end points are connected using the Measure Names on the Path (3).  Also notice that there is no longer anything controlling the Color as we want a line that is only one color.

Combining the Two Charts

The only thing we need to do now is combine the two charts into one by making a dual axis chart, to synchronize the secondary axis, and hide the secondary header (Figure 12.)

Figure 12 -- the Completed connected Dot Plot.

Figure 12 — the Completed connected Dot Plot.

What to Look for in the Dashboard

Any chart that answers a question usually fosters more questions. Consider the really big gap in Ability to Customize UI. Did all respondents indicate this, or only some?

And if one group was considerably more pronounced than others, what were the actual responses across the board (vs. just looking at the percent top 2 boxes)?

Figure 13 -- Getting the details on how one group responded

Figure 13 — Getting the details on how one group responded

The dashboard embedded below shows how you can answer these questions.

Got another approach that you think works better?  Let me know.

Mar 082017
 

And… what did they choose?

March 8, 2017

Overview

I’ve discussed how to visualize check-all-that-apply questions in Tableau. Assuming your survey is coded as Yes = 1 and No = 0, you can fashion a sorted bar chart like this the one shown in Figure 1 using the following calculation.

SUM([Value]) / SUM(Number of Records)

The field [Value] would be 0 or 1 for each respondent that answered the question.

Figure 1 -- Visualizing a check-all-that-apply question

Figure 1 — Visualizing a check-all-that-apply question

I’ve also discussed how we can see break this down by various demographics (Gender, Location, Generation, etc.)

What I’ve not blogged about (until now) is how to answer the following questions:

  • How many people selected one item?
  • Two items?
  • Five items?
  • Of the people that only selected one item, what did they select?
  • Of the people that selected four items, what did they select?

Prior to the advent of LoD calculations this was doable, but a pretty big pain in the ass.

Fortunately, using examples that are “out in the wild” we can cobble together a compelling way to show the answers to these questions.

 

Visualizing How Many People Selected 1, 2, 3, N Items?

One of the best blog posts on Level-of-Detail expressions is Bethany Lyons’ Top 15 LoD Expressions.

It turns out the very first example discusses how figure out how many customers placed one order, how many placed two orders, etc.  This will give us exactly what we need to figure out how many people selected 1, 2, 3, N items in a check-all-that-apply question.

Here’s the calculation that will do the job.

Figure 2 -- The LoD calculation we'll need.

Figure 2 — The LoD calculation we’ll need.

This translates as “for the questions you are focusing on (and you better have your context filters happening so you are only looking at just the check-all-that-apply stuff), for each Resp ID, add up the values for all the questions people answered.”

Remember, the responses are 0s and 1s, so if somebody selected six things the SUM([Value]) would equal 6.

So, how do we use this?

The beautiful thing about using FIXED as our LoD keyword is that it allows us to turn the results into a dimension.  This means we can put How Many Selected on columns and CNTD(Resp ID) on rows and get a really useful histogram like the one shown in Figure 3.

Figure 3 -- Histogram showing number of respondents that selected 0 items, 1 item, 2 item, etc.

Figure 3 — Histogram showing number of respondents that selected 0 items, 1 item, 2 item, etc.

Notice the filter settings indicating that we only want responses to the check-all-that-apply questions. Further note that this filter has been added to the context which means we want Tableau to filter the results before computing the FIXED LoD calculation.

 

So, what did these People Select?

Okay, now we know how many people selected one item, two items, etc.

Just what did they select?

Because we set [How Many Selected] using the FIXED keyword we can use it like any other dimension.  That is, it will behave just like [Gender], [Location], and so on.

Borrowing from an existing technique (the visual ranking by category that I cited earlier) we can fashion a very useful dashboard that allows us to see some interesting nuances in the data.  For example, while Metabolism is ranked second overall with 70% of people selecting it, it ranked seventh among those that only selected one item (with only 4%), while 84% of people that selected four items selected it (Figure 4.)

Figure 4 -- Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Figure 4 — Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Similarly, check out the breakdown for Blood Pressure which is ranked third with 60% overall but is ranked first among folks that only measure one thing (Figure 5.)

Figure 5 -- Metabolism is ranked third overall with 60%, but was ranked first among those that only selected on item.

Figure 5 — Metabolism is ranked third overall with 60%, but was ranked first among those that only selected on item.

 

Other Useful Features of the Dashboard

The Marginal Histogram

The marginal histogram along the bottom of the chart shows you the breakdown of responses.

Figure 6 -- Marginal histogram shows distribution of responses

Figure 6 — Marginal histogram shows distribution of responses

Tool Tips Help Interpret the Findings

The ordinal numbers can be confusing as sometimes the number 2 means the number of items selected and other times it is the ranking.  Hovering over a bar explains how to interpret the results.

Figure 7 -- Tool tips help you interpret the results.

Figure 7 — Tool tips help you interpret the results.

Swap Among Different Dimensions

While this is first and foremost a blog post about showing how many people selected a certain number of items (and what they selected) it was very easy to add a parameter that allows you to swap among different dimensions.  In Figure 8 we see the break down by Location.

Figure 8 -- Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.

Figure 8 — Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.

Here’s the embedded workbook for you to try out and download.

Jan 152017
 

By Steve Wexler, January 15, 2017

Overview

Before going any further this post assumes you’ve gotten your data “just so”; that is, you’ve reshaped your data and have the responses in both text and numeric form.

If you’re not sure what this means, please review this post.

Taking inventory by finding the universe of all questions and all responses

Anyone who has attended one of my survey data classes or has watched my 2014 Tableau Conference presentation knows the first thing I like to do is assemble a demographics dashboard so I know who took the survey.

The second thing I do arose a few months ago when I had a “why didn’t I do this before” moment with respect to getting a good handle on questions, responses, and seeing if there was anything that was poorly coded.

Here’s how it works.

Note that I’m using the same sample data set that I use for my classes.

Figure 1 -- Screen shot after just having connected to the data.

Figure 1 — Screen shot after just having connected to the data.

  1. Drag Question Grouping onto rows, followed by Wording, and then Question ID, as shown below.

    02_PartialInventory

    Figure 2 — About half-way through taking inventory.

  2. Right-click the measure called Value and select Duplicate.
  3. Rename the newly-created field Value (discrete).
  4. Drag the measure into the Dimensions area. This will make Tableau treat the field as something that is by default discrete.

    Figure 3 -- Turning the measure into a discrete dimension

    Figure 3 — Turning the measure into a discrete dimension

  5. Drag new newly-created dimension to Rows.
  6. Drag Labels to Rows. Your screen should look like the one shown below.
Figure 4 -- All the questions and all the responses, all on one sheet.

Figure 4 — All the questions and all the responses, all on one sheet.

So, just what do we have here?

You can see from the portion of the screen that you have a bunch of questions about “Importance” and can also see that the possible values go from 0 to 4 where 0 maps to “Not at all important”, 1 maps to “Of Little Importance”, etc.

At this point you should be looking for any stray values, say a value of 5.

If you scroll down a little bit (Figure 5) you’ll see a question grouping called “Indicate the degree to which you agree” where you again have values of 0 through 4 but this time 0 maps to “Not at all”, 1 maps to “Small degree”, etc.

We should be pleased as it appears that our Likert questions consistently go from 0 through 4. This means we won’t have to craft multiple sets of calculated fields to deal with different numeric scales (not that having to do that would be a big deal).

05_Further Inventory

Figure 5 — Next set of questions.

At this point it might be useful to add a filter so you can focus on only certain question groups. You can do this by filtering by Question Grouping as shown below.

Figure 6 -- Adding a filter makes it easier to focus on specific question groupings

Figure 6 — Adding a filter makes it easier to focus on specific question groupings

Spotting questions that have coding errors

In case you’re wondering what a coding error looks like, see what happens if we just focus on the “What do you measure” questions, as shown below.

Figure 7 -- An example of a mis-coded check-all-that-apply question

Figure 7 — An example of a mis-coded check-all-that-apply question

So, for all of the check-all-that-apply questions the universe of possible values is 0 and 1. And with the exception of Question Q6 (Breathing), 0 maps to “No” and 1 maps to “Yes.”

The mis-coding of “Ni” instead of “No” will only present a problem if our calculated field for determining the percentage of people that checked an item were to use Labels instead of Values. My preferred formula for this type of calculation is this:

SUM([Value]) / SUM([Number of Records)])

Because we’re using [Value] instead of [Label], the miscoding for this example won’t cause a problem.

Conclusion

Creating a giant text table that maps all Question Groupings, Question IDs, Labels, and Values on a single sheet allows us to quickly take an inventory of all questions and possible responses.  This in turn allows us to see if questions are coded consistently (e.g., do all the Likert Scale questions use the same scale) and to see if there are any coding errors.

I just wish I had started doing this years ago.

 

 Posted by on January 15, 2017 2) Visualizing Survey Data, Blog Tagged with: ,  No Responses »
Nov 282016
 

Note: Major thanks to Nazirah Garrison and Christie Clark at Tableau for suggesting this approach.

Overview

With Tableau 10.x it is in fact possible to get your survey data, “just so” without having to invest in new tools and / or a engage in a time-consuming, error-prone procedure every time you need receive updated survey data.

There’s a lot of upside to this approach — everything is built into Tableau and you’ll just need to refresh the extracts (there will be several of them) as you get new data.  The downside is the setup is a little bit cumbersome and Tableau manifests some “hmm, that doesn’t seem right” behavior along the way.

Before embarking on this I encourage you read this post so you understand what I mean when I refer to the data being “just so.”

Also, if you would like to follow along you can download the source data here.

Three files and three extracts

Anyone that has read my posts or attended my classes know that I want survey responses in both text and numeric formats. Sure, you can in fact manage with one format or the other, but you’re just creating a LOT more work for yourself if you don’t have the data in both formats.

You will also need what I call a “Helper” file — this is just a separate file that maps each question ID into human readable form and groups related questions together. Again, you can certainly get by without it but you’ll be working much harder than you need to, especially if you ever compare “Importance” with “Satisfaction” Likert-scale questions.

For this example we will create a separate extract for each of these three files, then use Tableau 10.x’s ability to join three different data sources.

Let’s start by creating the extracts.

To pivot the data labels and create an extract

  1. Start Tableau 10.x and indicate you want to connect to Excel.
  2. Connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx and drag the sheet Data Labels into the Drag sheets here area as shown below.
    01_draglabels
  3. Leave the first five columns intact (the Resp ID, demographic stuff, and Weight) and select all the other columns.
  4. Click any of the selected columns and select Pivot from the context menu.
    02_firstpivot
  5. Rename the first pivoted column Question ID and the second column Label as shown below.
    03_rename
  6. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  7. When asked to save, name the file DR_JustSo_Labels.tde (make sure to note where you are saving the file).

One down, two to go.

To pivot the data numbers and create an extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Data Numbers to the Drag sheet here area.
  3. Hide the columns labeled Gender, Location, Generation, and Weight — we already have them in the other data source and don’t need them twice.
  4. Leaving Resp ID in place, select the second through the last columns.
  5. Click the down arrow on any of the selected columns and select Pivot from the context menu.
  6. Rename the first pivoted column Question ID and the second column Value, as shown below.
    04_secondpivot
  7. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  8. When asked to save, name the file DR_JustSo_Numbers.tde.

Two down, one to go.

To create the question helper extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Question Helper to the Drag sheet here area.
  3. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  4. When asked to save, name the file DR_QuestionHelper.tde.

All three data sources are now ready.

Joining the three data sources together

We now have our three data sources as separate Tableau extract files. We’ll combine these three files (and create an extract from the joined files) using Tableau 10.x’s ability to join files from different data sources.

Note: This is where we’ll encounter Tableau’s “head-scratching” behavior.

To join the three data sources

Note: In first trying this Tableau presented a lot of warning messages about not being able to materialize a temporary table. While I could ignore these warnings and muddle through, you may not be so lucky. It turns out the culprit was my anti-virus software. I temporarily disabled it and everything worked without a hitch.  See http://kb.tableau.com/articles/issue/error-unable-to-materialize-temporary-table-joining-data-sources.

  1. Click the New Data Source icon and indicate you want to connect to More, as shown below.
    05_more
  2. Select DR_JustSo_Labels.tde and click Open.
  3. Click Add, as shown below.
    06_add
  4. Click More and then select DR_JustSo_Numbers.tde. Do not be fooled, the correct fields have NOT yet been joined.
  5. Click the overlapping Venn diagram to display the Join dialog box, as shown below.
    07_join1
  6. Click Number of Records and then click the X that appears in the row to indicate you do NOT want to join these two data sources using this field.
  7. From the left data source indicate you want to join using Resp ID, as shown below.
    08_join2
  8. From the right data source indicate that you want to join using RespID (Extract1). I have no idea why the field is named this way. More on this in a moment.
  9. From the left data source indicate that you want to join using Question ID. Yes, you need to join on more than one field.
  10. From the right data source indicate you want to join using Pivot Field Names (Extract1). Okay… THIS is the thing that has me scratching my head and as of this writing (November 27, 2016) I have no idea why the field isn’t also called Question ID.

    January 4, 2017 — Now I know why this is happening. When using data extracts (.TDE files) Tableau is only able to keep track of the alias names for fields in the first .TDE file.  All of that info gets stripped out from the second .TDE file and Tableau just sees the original field name (Pivot Field Names). A little off-putting, yes, easy to address and it won’t matter a lick once we’re building our visualizations.

  11. Click Add again, select More, and select DR_QuestionHelper.tde.
  12. Click the second Venn overlapping circle and remove any of the joins that may be in there (most likely again using Number of Records).
  13. From the left data source select Question ID and from the right data source select QuestionID as shown below.
    09_join3
  14. Hide the fields RespID (Extract1), both Number of Records fields, the second Question ID, and Pivot Field Names, and rename Pivot Field Values to Value. Your screen should look like this.
    10_joinall
  15. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  16. When asked to save, name the file DR_JustSoAll.tde. After creating the extract Tableau will show all the field names grouped by data source, as shown below.
    11_datasource
  17. Click the down arrow and select Group by Folder.
    Okay, you don’t have to do this but I see no reason to group the fields by data source.

Congratulations, you now have your data “just so” and you did it all in Tableau.

So, how do you get the extracts to refresh?

Good question.  If you are using either Tableau Server or Online you can create shared data sources and program the extract to refresh on a regular or as-needed basis.

If you are using desktop — and you have all four data sources in in one workbook — you can just click the  Data menu and select Refresh All Extracts.

Conclusion

While I find the process outlined here both cumbersome and confusing (what is up with those renamed fields NOT staying renamed?) this approach does appear to work and you only need to set it up once. The same cannot be said of The Tableau add-in for Excel which requires a lot of manual intervention every time you want to update the data.

Will this replace Alteryx as my tool of choice? No, but it does work and you can’t beat the price.

 Posted by on November 28, 2016 2) Visualizing Survey Data, Blog Tagged with: , , , ,  52 Responses »