Jan 112016
 

Overview

I spend a lot of time with survey data and much of this data revolves around gauging people’s sentiments and tendencies using either a Likert Scale or a Net Promoter Score (NPS) type of thing.

Examples

Here’s an example of gauging sentiment using a 5-point Likert scale.

Indicate how satisfied you are with the following:

00_Grid1

Here’s an example of measuring tendencies, using a 4-point Likert scale.

How often do you use the following learning modalities?

00_Grid2

So, what’s a good way to visualize responses to these types of questions?

Over the past ten years I’ve spent thousands of hours working on the best ways to show how opinion and tendencies skew one way or another.  I have found that in most cases a divergent stacked bar chart helps me (and more importantly, my clients) best see what’s going on with the survey responses.

In this blog posts we’ll

  • See an example of a divergent stacked bar chart (also called a staggered stacked bar chart)
  • Work through a data visualization improvement process
  • Show how to visualize different scales (e.g., NPS, Top 3/Bottom 3, 5-point Likert, etc.)
  • Show sentiment and tendencies over time
  • Present a dashboard that will allow you to experiment with different visualization approaches

Note: for step-by-step instructions on how to build a Likert-scale divergent stacked bar chart in Tableau, click here.

Divergent Stacked Bar vs. 100% Stacked Bar

Readers of my newsletter and folks visiting the web site may have seen my redesign of a New York Times infographic that showed the tendencies of politicians to lie or tell the truth.  Here’s the 100% Stacked Bar chart that appeared in the New York Times.

Figure 1 -- 100% stacked bar chart.

Figure 1 — 100% stacked bar chart.

Here’s the redesign using a divergent stacked bar chart.

Figure 2 -- Divergent stacked bar chart.

Figure 2 — Divergent stacked bar chart.

With both the 100% stacked bar chart and the divergent stacked bar charts the overall length of the bars is the same, but with the divergent approach the bars are shifted left or right to show which way a candidate leans. I, and others I’ve polled, find that shifting the bars makes the chart easier to understand.

How We Got Here — Likert Scale Improvement Process

Consider the table below that shows the results from a fictitious poll on the use of various learning modalities.

Figure 3 -- Table with survey results.

Figure 3 — Survey results in a table.

I can’t glean anything meaningful from this.

What about a bar chart?

Figure 4 -- Likert scale questions using a bar chart. Yikes.

Figure 4 — Likert scale questions using a bar chart. Yikes.

Wow, that’s really bad.

What about a 100% stacked bar chart?

Figure 5 -- 100% stacked bar chart using default colors.

Figure 5 — 100% stacked bar chart using default colors.

Okay, that’s better, but it’s still pretty bad as Tableau’s default colors do nothing to help us see tendencies that are adjacent. That is, “Often” and “Sometimes” should have similar colors, as should “Rarely” and “Never.”

So, let’s try using better colors…

(…and don’t even think about using red and green.)

Figure 6 -- 100% stacked bar chart using a more appropriate color scheme.

Figure 6 — 100% stacked bar chart using a more appropriate color scheme.

This is certainly an improvement, but the modalities are listed alphabetically and not by how often they’re used. Let’s see what happens when we sort the bars.

Figure 7 -- Sorted 100% stacked bar chart with good colors.

Figure 7 — Sorted 100% stacked bar chart with good colors.

It’s taken us several tries, but it’s now easier to see which modalities are more popular.

But we can do better.

Here’s the same data rendered as a divergent stacked bar chart.

Figure 8 -- Sorted divergent stacked bar chart with good colors.

Figure 8 — Sorted divergent stacked bar chart with good colors.

Of course, we can also look take a coarser view and just compare Sometimes/Often with Rarely/Never, as shown here.

Figure 9 – Divergent stacked bar chart with only two levels of sentiment.

Figure 9 – Divergent stacked bar chart with only two levels of sentiment.

I find that the divergent approach “speaks” to me and it resonates with my colleagues and clients.

Experiments using Different Scales

A while back Helen Lindsey was kind enough to send me some data that contained responses to some Net Promoter Score questions.  Specifically, folks were asked to rate companies/products on a 0 to 10 or 1 to 10 scale.

Figure 10 -- The classic Net Promoter Score (NPS) question

Figure 10 — The classic Net Promoter Score (NPS) question

We compute NPS by subtracting the percentage of folks that are promoters (i.e., people who responded with a 9 or a 10), subtracting the percentage of folks that are detractors (i.e., people who responded with a 0 through 6) and multiplying by 100.

But sometimes my clients have questions that are on a 10 or 11-point scale but instead want to compute the percentage of folks that responded with one of the top three boxes minus the percentage of folks that responded with the bottom three boxes.

I realized that the Lindsey data set could provide a type of “sandbox” where we could experiment with different sentiment scales including NPS, Top 3 minus Bottom 3, 5-point Likert, 3-point Likert, and 2-point Likert.

Let’s look at the results of some of these experiments.

NPS

Here are two ways we can visualize NPS data.  The first shows the percentages of people that fall into the three categories.

Figure 11 -- NPS showing percentages

Figure 11 — NPS showing percentages

Here’s the same view, but with the NPS score superimposed over the divergent stacked bars.

Figure 12 -- NPS with score superimposed

Figure 12 — NPS with score superimposed

NPS over Time

It turns out that divergent stacked bars are great at showing NPS trends over time.  Here’s a view using percentages.

Figure 13 -- Divergent stacked bar showing NPS over time with percentages

Figure 13 — Divergent stacked bar showing NPS over time with percentages

Here’s the same view but with the score superimposed.

Figure 14 -- Divergent stacked bar showing NPS over time with scores

Figure 14 — Divergent stacked bar showing NPS over time with scores

Note – for some other interesting treatments of showing sentiment over time, see Joe Mako’s visualization on banker honesty.

Net = Top 3 minus Bottom 3

Let’s take the same data but divide it into the following buckets:

  • Positive = Top 3 Boxes
  • Neutral = Middle 4 Boxes
  • Negative = Bottom 3 Boxes

Here are the associated visualizations.

Figure 15 -- Top 3 / Bottom 3 showing with percentages

Figure 15 — Top 3/Bottom 3 showing with percentages

Figure 16 -- Top 3 / Bottom 3 with scores

Figure 16 — Top 3/Bottom 3 with scores

Five, Three, and Two-Point Likert Scale Renderings

Let’s suppose that instead of asking a questions on a 1 through 10 scale we instead asked folks to select one of the following five responses:

  • Strongly disagree
  • Disagree
  • Neutral
  • Agree
  • Strongly agree

Here’s the same NPS data but rendered using a five-point Likert scale.

Figure 17 -- Divergent stacked bar chart showing all responses

Figure 17 — Divergent stacked bar chart showing all responses

And here’s the same data, but divided into positive, neutral, and negative sentiments (3-point Likert).

Figure 18 -- Divergent stacked bar showing positive, neutral, and negative

Figure 18 — Divergent stacked bar showing positive, neutral, and negative

Finally, here’s the same data, but only showing positive and negative sentiments (2-point Likert).

Figure 19 -- Divergent stacked bar showing just positive and negative

Figure 19 — Divergent stacked bar showing just positive and negative

Try it yourself

Below you will find a dashboard that allows you to explore different combinations of the 1 to 10 scale.

I strongly recommend you do NOT give your audience all these scaling options;  these are here for you to experiment and see how the visualizations and ranking change based on what scales you use.  The only option I would present to your audience is the ability to toggle back and forth between percentages and scores.

Nov 222015
 

Overview

I received an e-mail inquiry about weighted data recently and realized that while I cover this in my survey data class I had not yet posted anything about it here.  Time to remedy that.

The good news is that it is not at all difficult to work with weighted survey data in Tableau.  And just what do I mean by weighted data? We use weighting to adjust the results of a study so that the results better reflect what is known about the population. For example, if the subscribers to your magazine are 60% female but the people that take your survey are only 45% female you should weigh the responses from females more heavily than males.

To do this each survey respondent should have a weighting amount associated with their respondent ID, as shown here.

Figure 1 – A snippet of survey data showing a separate column for Weight.

Figure 1 – A snippet of survey data showing a separate column for Weight.

When pivoting / reshaping the data make sure that [Weight] does not get reshaped.  It should remain in its own column like the other demographic data.

Once this is in place we’ll need to modify the formulas for the following questions types:

  • Yes / No / Maybe (single punch)
  • Check-all-that-apply (multi-punch)
  • Sentiment / Likert Scale (simple stacked bar)
  • Sentiment / Likert Scale (divergent stacked bar)

Yes / No / Maybe (single punch)

With this type of question you usually want to determine the percentage of the total.

02_YesNoMaybe

Figure 2 — Visualization of a single-punch question

Unweighted calculation

The table calculation to determine the percentage of total using unweighted data is

   SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

Weighted calculation

The table calculation to determine the percentage of total using weighted data is

   SUM([Weight]) / TOTAL(SUM([Weight]))

Check-all-that-apply (multi punch)

With this type of question you usually want to determine the percentage of people that selected an item.  The total will almost always add up to more than 100% as you are allowing people to select multiple items.

Figure 3 -- Visualization of a multi-punch question

Figure 3 — Visualization of a multi-punch question

Most surveys will code the items that are checked with a “1” and those that are not checked with a “0”.

Unweighted calculation

The calculation to determine the percentage of people selecting an item using unweighted data is

   SUM([Value]) / SUM([Number of Records])

where [Value] is the name of the measure that contains the survey responses.  If the survey responses are coded as labels instead of numbers you can use this formula instead.

   SUM(IF [Label]="Yes" then 1 ELSE 0 END) / SUM([Number of Records])

Weighted calculation

The calculation to determine the percentage of people selecting an item using weighted data is

   SUM(IF [Value]=1 then [Weight] ELSE 0 END) / SUM([Weight])

Sentiment / Likert Scale (simple stacked bar)

This is very similar to the single-punch question but instead we have several questions and compare them using a stacked bar chart.  I am not a big fan of this approach but it can be useful when you superimpose some type of score (e.g., average Likert value, percent top 2 boxes, etc.).

Figure 4 -- Simple Likert Scale visualization

Figure 4 — Simple Likert Scale visualization

Figure 5 -- Simple Likert Scale visualization with Percent Top 2 Boxes combo chart

Figure 5 — Simple Likert Scale visualization with Percent Top 2 Boxes combo chart

Unweighted calculation – Stacked Bar

The table calculation to determine the percentage of total using unweighted data is

   SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

Weighted calculation – Stacked Bar

The table calculation to determine the percentage of total using weighted data is

   SUM([Weight]) / TOTAL(SUM([Weight]))

Unweighted calculation – Percent Top 2 Boxes

Assuming a 1 through 5 Likert scale, the calculation to determine the percentage of people selecting either Very high degree or High Degree (top 2 boxes) using unweighted data is

   SUM(IF [Value]>=4 then 1 ELSE 0) / SUM([Number of Records])

Weighted calculation – Percent Top 2 Boxes

Assuming a 1 through 5 Likert scale, The calculation to determine the percentage of people selecting either Very high degree or High Degree (top 2 boxes) using weighted data is

   SUM(IF [Value]>=4 then [Weight] ELSE 0) / SUM([Weight])

Sentiment / Likert Scale (divergent stacked bar)

Here is what I believe is a preferable way to show how sentiment skews across different questions.

Figure 6 -- A divergent stacked bar chart

Figure 6 — A divergent stacked bar chart

I’ve covered how to build this type of chart using unweighted values here.

There are six fields we need to fashion the visualization, three of which need to be modified to make the visualization work with weighted data.

  • Count Negative
  • Gantt Percent
  • Gantt Start
  • Percentage
  • Total Count
  • Total Count Negative

Count Negative – Unweighted

Assuming a 1 – 5 Likert scale, the calculation to determine the number of negative sentiment responses using unweighted data is

   IF [Score]<3 THEN 1
   ELSEIF [Score]=3 THEN .5
   ELSE 0 END

Count Negative – Weighted

Assuming a 1 – 5 Likert scale, the calculation to determine the number of negative sentiment responses using weighted data is

   IF [Score]<3 THEN [Weight]
   ELSEIF [Score]=3 THEN .5 * [Weight]
   ELSE 0 END

Percentage – Unweighted

The calculation that determines both the size of the Gantt bar and the label for the bar using unweighted data is

   SUM([Number of Records])/[Total Count]

Percentage – Weighted

The calculation that determines both the size of the Gantt bar and the label for the bar using weighted data is

   SUM([Weight])/[Total Count]

Total Count – Unweighted

The calculation that determines the total number of responses for a particular question for unweighted data is

   TOTAL(SUM([Number of Records]))

Total Count – Weighted

The calculation that determines the total number of responses for a particular question for weighted data is

   TOTAL(SUM([Weight]))

Summary

Here’s a summary of all the unweighted calculations and their weighted equivalents

Unweighted Weighted
SUM([Number of Records]) / TOTAL(SUM([Number of Records])) SUM([Weight]) / TOTAL(SUM([Weight]))
SUM([Value]) / SUM([Number of Records]) SUM(IF [Value]=1 then [Weight] ELSE 0 END) / SUM([Weight])
SUM(IF [Value]>=4 then 1 ELSE 0) / SUM([Number of Records]) SUM(IF [Value]>=4 then [Weight] ELSE 0) / SUM([Weight])
IF [Score]<3 THEN 1 ELSEIF [Score]=3 THEN .5 ELSE 0 END IF [Score]<3 THEN [Weight] ELSEIF [Score]=3 THEN .5 * [Weight] ELSE 0 END
SUM([Number of Records])/[Total Count] SUM([Weight])/[Total Count]
TOTAL(SUM([Number of Records])) TOTAL(SUM([Weight]))

 

Oct 132015
 

Overview

You can download a trial version of Alteryx here.

You can find the Excel source data here.

You can download the completed Alteryx module from here.

Understanding the workflow

Here’s the Alteryx workflow that will get the data in the format we want.

Figure 9 -- Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

Figure 9 — Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

If this is the first time you’ve seen an Alteryx workflow it may look a little complicated (shades of Rube Goldberg) but think about how complicated the flow and decision tree is for what you do when you get up in the morning and get ready for work – it’s way more complicated.

Let’s start by focusing on the three data sources that we need (labeled 1, 2, and 3 in the diagram above).

  1. Data with numeric responses (we’ll use this for our demographic data, too)
  2. Data with text responses
  3. The Question Helper file (contains our meta data).

A note about SPSS files

For this example we’re using Excel data but the process would be identical for survey data saved in an SPSS format (.SAV file).  SPSS encodes both the text and numeric data in a single file, but you need to input values from that same data source twice – once for text and once for numeric values.

When you specify an SPSS file as a data source you’ll see the following check box option.

Figure 10 -- Options that appear when you bring in data from an SPSS file.

Figure 10 — Options that appear when you bring in data from an SPSS file.

Selecting “Expand Factors” will bring in the data as labels.  Unchecking this box will bring in the data as numbers.

Demographics

Figure 11 -- Selecting the demographic components from the data source

Figure 11 — Selecting the demographic components from the data source

Clicking the Select tool reveals the following settings.

Figure 12 -- Specifying which fields we want to keep

Figure 12 — Specifying which fields we want to keep

A translation into English would be “open this Excel file and select these fields”.  Here we keep RespID, Gender, Location, Generation, and Weight.

Numeric values

Figure 13 -- Numeric values section of the Alteryx workflow with the Reshape tool selected

Figure 13 — Numeric values section of the Alteryx workflow with the Transpose tool selected

Using the same data source, selecting the Transpose tool reveals the following settings.

Figure 14 – Transpose tool settings

Figure 14 – Transpose tool settings

Here we indicate that we want to keep RespID and pivot all the other checked fields.  Notice that the demographic components are not checked.

Looking back to the workflow diagram, we next follow this pivot with a Select tool which we use to rename the pivoted fields as shown here.

Figure 15 -- The Select tool allows us to name the pivoted fields

Figure 15 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second pivoted field Numeric value.

Dealing with nulls

Figure 16 -- The Filter tool allows us to eliminate null values before they hit Tableau.

Figure 16 — The Filter tool allows us to eliminate null values before they hit Tableau.

While not critical, Alteryx allows us to eliminate null values before they come into Tableau.  Here’s the setting for this tool.

Figure 17 -- Setting to eliminate nulls from the Numeric value field

Figure 17 — Setting to eliminate nulls from the Numeric value field

Joining the demographic with the transposed (pivoted) numeric data

Figure 18 -- Joining the demographic and transposed numeric data

Figure 18 — Joining the demographic and transposed numeric data

We’re now ready to join the demographic data with the pivoted numeric survey data. We do this with Alteryx’s Join tool.  Here are the settings.

Figure 19 -- Joining demographics and numeric data using the common field, RespID. Notice that we don't include the redundant version of RespID that is coming from the right table.

Figure 19 — Joining demographics and numeric data using the common field, RespID. Notice that we don’t include the redundant version of RespID that is coming from the right table.

Checking our progress

If we add a Browse tool after the Join we’ll see the following results.

Figure 20 -- Results of the initial Join

Figure 20 — Results of the initial Join

Note that we could, in fact, get by with using just this in Tableau, but the whole point is to make things faster and easier, so we’re going to add text responses and the “Helper” data.

Text Values

Figure 21 -- Workflow for setting the text values

Figure 21 — Workflow for setting the text values

This is identical to the numeric values except that we specify a different data source (one with label responses instead of numeric) and we rename the second pivoted field differently, as shown here.

Figure 22 -- The Select tool allows us to name the pivoted fields

Figure 22 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second Text value.

Tying Demographic, Numeric and Text Values Together

We need to combine the demographic / numeric values with the text values.  We will again use Alteryx’s Join tool, but we’ll need to join on both RespID and Question ID.

Figure 23 -- Joining demographic, numeric, and text data together

Figure 23 — Joining demographic, numeric, and text data together

The setting for the Join is shown below.

Figure 24 -- Join settings for combining demographics / numeric with text data

Figure 24 — Join settings for combining demographics / numeric with text data

Adding the question meta data and outputting the results

We’re almost done; we just need to add the information that tells us about each Question ID so we can map the ID to its human-readable form, group related IDs together, and so on.

Figure 25 -- Joining the "Helper" file data

Figure 25 — Joining the “Helper” file data

The setting for this Join is shown below.

Figure 26 -- Marrying the question meta data to the demographic and survey data

Figure 26 — Marrying the question meta data to the demographic and survey data

The only thing left to do is output this to a new data source.  In the sample workflow I output to an Excel file but you can just as easily write directly to a Tableau Data Extract file.

But Wait!  There’s More!

I intentionally used data that was well behaved, but what happens if the data is “messy”?  For example, suppose there’s an extra row below the field names?  Or suppose that for a check-all-that-apply question, instead of 1s and 0s we instead have 1s and blanks (which would mess things up in Tableau)?  Alteryx handles both of these scenarios (and dozens more) easily.

Is there a Downside?

If there’s any downside it would be the cost as Alteryx starts at around $4,000 per year.  If all you are doing is some occasional survey data reshaping then this fee would probably be prohibitively expensive. If, however, you need to corral a lot of data – or if you need advanced data blending, geospatial analysis, or predictive modeling – then Alteryx is a huge bargain.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , , ,  8 Responses »
Oct 132015
 

Overview

In writing about visualizing survey data using Tableau I’ve found that the number one impediment to success is getting the data in the right format. In accompanying posts I’ll explain how to get this done using Alteryx, Tableau 10.x, the Tableau Excel add-in, and Tableau 9.0 pivot feature (you can come close with 9.x, but can’t get it perfect).

What do I mean by “just so”?

When I deal with survey data there are usually four different elements that need to fit together:

  1. The demographic information (e.g., age of respondents, gender, etc.)
  2. Survey responses in text format
  3. Survey responses in numeric format
  4. Meta data that describes the survey data.

Let’s see what the four elements look like using an Excel sample data set (click here to download).

Demographic data

Here’s what the demographic data looks like.

Figure 1 -- Demographic data

Figure 1 — Demographic data

Survey responses in text format

Here are several columns of survey responses in text format.  Column F contains data for a Yes / No / Don’t know question.  Column G contains responses for a question about salary.  Columns H through P are responses for check-all-that apply questions and columns Q and R contain Likert scale responses.

Figure 2 -- Survey responses in text format

Figure 2 — Survey responses in text format

Survey responses in numeric format

Here are the same responses but in numeric format.

Figure 3 -- Survey responses in numeric format

Figure 3 — Survey responses in numeric format

I’ll explain why it’s so useful to have the survey responses in both text and numeric format in a bit.

Meta Data (the “helper” file)

Here’s some data that I usually prepare by hand as most survey tools won’t produce it for me automatically.  Having this helps me understand the data and will  greatly streamline my work in Tableau.

Figure 4 -- Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

Figure 4 — Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

What does “just so” look like?

Our goal is to combine and reshape the various elements so that they look like this.

Figure 5 -- Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

Figure 5 — Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

As I’ve written previously, the key thing is that I no longer have a separate column for each survey response.  Indeed, I’ve reduced the number of columns from 45 to just 11, but I’ve also increased the number of rows from 845 to over 25,000. That is a good thing.

Why this works so well with Tableau

Our goal is to see how to get Alteryx to get the data in this format, not to actually use the data, but if you need convincing on why the meta data is so helpful, consider the following example.

Let’s say that in your survey you ask people to indicate the importance and satisfaction about certain services, as shown here.

Figure 6 -- Question comparing importance with satisfaction

Figure 6 — Question comparing importance with satisfaction

With the data set up “just so” conducting this comparison in Tableau becomes easy.  First we can drag Question Grouping into Filters and indicate that we just want to look at Importance and Satisfaction questions.

Figure 7 -- Using the Question Grouping field to just focus on Importance and Satisfaction questions

Figure 7 — Using the Question Grouping field to just focus on Importance and Satisfaction questions

Then we can drag Wording and Question Grouping onto the Rows shelf which gives us the framework for comparing importance and satisfaction across ten different questions.  No more having to “look up” which questions we want to explore and no more having to alias question IDs.  I love this!

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Why do we need both text and numeric results?

We don’t really need them, but I know I certainly want them.

Consider all of the Likert scale question results.  The universe of possible values are

1
2
3
4
5

Suppose we want to know just what each of the values (1, 2, 3, 4 and 5) stand for?  The problem is that it depends on the question being asked as sometimes a 5 means “Strongly agree”, for other questions it  means “Critical” and for others it means “Extremely satisfied”.

Without having both numeric and text results we will have to write A LOT of IF / CASE statements and I, for one, do not want to do that.

So, now that we understand how and why we want the data “just so” we’ll see how to get it that way using Alteryx, Tableau 10.x, the Tableau-add-in for Excel, and Tableau 9.x.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , ,  5 Responses »
Jun 042015
 

Showing Differences between Periods and Statistical Significance in Tableau

Overview

Addressing this scenario has been the most popular request I’ve received over the past year. Here’s a summary what my clients and students have asked:

  • How do I show the change in Sales, Percentage of Promoters, Number of Visits, etc., between this month / quarter / year, and the previous month / quarter / year?
  • How do I make it easy to see which areas of the organization had an increase this period and which had a decrease?
  • How do I make it easy to see how much greater / less this period’s numbers are than the previous period?
  • How do I determine and show if this change is statistically significant? That is, how do I apply the stat test we like to use in our organization?
  • If the change is statistically significant, is it a one-time thing or should I start hyperventilating?

This is a LOT to take on and we won’t be able to fit all of it into a single visualization.

But we can fit it into a compact dashboard.

Important Ground Rules

In the example that follows I look at the percentage of people that responded with a “9” or “10” to a survey question. That is, I am only looking at the percentage of people that selected one of the top two boxes.  I am NOT trying to see if there is statistical significance or calculate the margin of error in the change in Net Promoter Score over time.

The concepts I explore are not just for survey data; I just happen to have some good longitudinal survey data that is well-suited for seeing how to build a stat test formula in Tableau.

I hope you will indulge me and accept that “the company stat guru” has a fine reason for applying a particular statistical test to the data we’ll be analyzing. That said, you should push back on “business-as-usual” assumptions to determine if what you are visualizing and testing really is important (this is the focus of the work Stacy Barr is doing with her Measure Up blog and is the foundation for Stephen Few’s most recent book Signal.)

So, with the assumption that the particular stat test we want to apply – or any stat test, for that matter – is warranted, how do you show it and how do you build it?

Let’s first explore the working dashboard then see how to build it with Tableau.

Note: A very heartfelt thanks to Kelly Martin,, Joe Mako, Vicki Reinhard, Susan Ferarri, and Tiffany Spaulding who helped vet the dashboard.  I went through many different approaches before settling on the one shown below.

A very special thanks to Jeffrey Shaffer who reviewed the blog post and asked some very good questions, and also to Helen Lindsay who provided sample data.

The data and what we want to show

The data below contains the first few rows of Net Promoter Score survey data with fields for date and role.

Figure 1 -- Net Promoter Score survey data with dates and roles

Figure 1 — Net Promoter Score survey data with dates and roles

For the dashboard I built I only focused on the percentage of people that were Promoters; that is, people who responded with a 9 or 10 when asked if they would recommend a product or service.

I decided to look at the data broken down by quarters as this particular data set didn’t lend itself to month over month comparison.  Note that the techniques we’ll see will work for any time period.

Here’s the top portion of the interactive dashboard.

1_SSDashboardTop

Figure 2 — Top portion of dashboard.  Notice that you can change the selected period, the confidence percentage, and filter by company.

Understanding the chart

Figure 3 -- The key features of the chart

Figure 3 — The key features of the chart

Let’s review what we can glean from the chart.  We can see

  • The percentage of promoters for a particular period and sort them by role, using a bar chart.
  • Which roles have a percentage of promoters that is greater than the previous period and which have less, using color to distinguish (blue for greater, brown for less).
  • Just how much more or less the percentage for this period is compared to the previous using a reference line (the bar is the current period; the vertical line is the previous period).
  • Which roles showed a significantly significant increase or decrease (the red dot).

Note that that the chart uses “Cotgreavian” tooltips that allow you to glean more detail for a particular role when you hover over a bar:

Figure 4 -- Hover over a bar for in-depth information about the role for the current period and the previous period

Figure 4 — Hover over a bar for in-depth information about the role for the current period and the previous period

So, we can see from the red dot that something is up with Lawyers, Doctors and Nurses; that is, the percent increase from the previous period for Doctors and Lawyers is statistically significant and the percent decrease for Nurses is also significantly significant.  Is this a one-time thing or a trend?

Looking at changes over time

Clicking a role or roles will display trends for that role / roles.  For example, if we select Nurse in the top chart a second chart showing percentage of promoters over time will appear, as shown here.

Figure 5 -- Percentage of nurses that are promoters, over time.

Figure 5 — Percentage of nurses that are promoters, over time.

The big takeaway for me is that up until the first quarter of 2013 there were very few responses and after that there was both a consistent number of responses along with a consist decline in the percentage of nurses that were promoters.

Should you be hyperventilating because of the four-month downward trend?  That discussion is beyond this blog post but I again encourage you to check out the work Stacy Barr is doing at her Measure Up blog as well as Stephen Few’s most recent book Signal.

How the This Quarter vs. That Quarter Chart is Built

Let’s dig into how to build this in Tableau, starting with the top viz in the dashboard.

Figure 6 -- What's under the hood.

Figure 6 — What’s under the hood.

  1. Promoters – Current Quarter. This is the measure that drives the bars.  It’s also driving what appears on the labels.
  2. Promoters – Previous Quarter. This measure is on the Level of Detail and drives the reference lines.
  3. Greater / Less. This is a discrete measure that determines the color of the bar.

Promoters – Current Quarter

What we want is the percentage of people that were promoters for the selected quarter, the “selected” quarter being determined by a parameter that the user can control.

Specially, we want to add up everybody that responded with a 9 or 10 for the selected quarter and divide by the total number of people that responded.  Here’s the calculation that handles this.

SUM(

IF [Value]>=9 and DATETRUNC(‘quarter’, [Select Period])==DATETRUNC(‘quarter’,[Date])
then 1 else 0
END)

/

SUM(

IF DATETRUNC(‘quarter’, [Select Period])==DATETRUNC(‘quarter’,[Date])
then 1 else 0
END)

The translation into English is

Take the sum of

If the value from a respondent is greater than or equal to 9 and the date value, truncated to the nearest quarter from the parameter drop down [Select Period] is the same as the date value, truncated to the nearest quarter for [Date], then 1, else 0.

Divide this by the sum of

If the date value, truncated to the nearest quarter for the selected period is the same as the date value, truncated for the nearest quarter for [Date], then 1, else 0.

Not sure about the [DATETRUNC] function vs. the [DATEPART] function?  Have a look at Joshua Milligan’s excellent post explaining date values vs. date parts.

Promoters – Previous Quarter

This calculation is very similar to the calculation for the Current Quarter, except we want to find results for the quarter that occurred just prior to the selected quarter.  Here’s the calculation.

SUM(

IF [Value]>=9 and DATETRUNC(‘quarter’, [Select Period])=DATETRUNC(‘quarter’,DATEADD(‘quarter’,1,[Date]))
then 1 else 0
END)

/

SUM(

IF DATETRUNC(‘quarter’, [Select Period])==DATETRUNC(‘quarter’,DATEADD(‘quarter’,1,[Date]))
then 1 else 0
END)

The formula is the same except we use the DATEADD function to add an additional quarter; that is, we’re saying that we only want to find results where, when we add an additional quarter, we get a value equal to the current quarter; i.e., the previous quarter, plus one quarter, gives us the current quarter.

Greater / Less

The color of the bars is determined by this discrete measure:

IF [Promoters — Current Quarter] > [Promoters — Previous Quarter] then “Greater than previous”
else “Less than previous”
END

Yes, I suppose we should have a contingency for when the percentage of promoters for the current period is the same as the previous period; I leave it as an exercise for the reader to add this functionality.

So, we’ve explained everything except … The Red Dot.

The Red Dot – Computing Statistical Significance on the Fly

Most of my clients and students are surprised to find out that you can fashion a test for statistical significance inside Tableau and it can test for statistical significance “on the fly”; e.g., you can apply filters and Tableau will recalculate based on the filter settings.

The first step is determining just how the client wants to test for statistical significance. This usually entails sending an inquiry to “the stats person” who responds with something that looks like this:

Figure 7 -- Z-test formula for statistical significance

Figure 7 — Z-test formula for statistical significance

I hope your eyes aren’t glassing over as this really isn’t very complicated; it just might look complicated if you’re not used to seeing stat formulas with square root symbols.  Here are the critical things you need to know:

p1            Percentage of promoters for the current period

p2            Percentage of promoters for the previous period

n1            Number of respondents for the current period

n2            Number of respondents for the previous period

If z1 is greater than or equal to 1.96 then there is a 95% degree of confidence that the difference between the two periods is statistically significant.

So, how do we build this formula?

Slowly, and in easy-to-digest pieces.

The Dot Itself

Figure 8 -- The discrete measure Z-Test Significance Dot is responsible for displaying the dot

Figure 8 — The discrete measure Z-Test Significance Dot is responsible for displaying the dot

The calculation that produces the dot is called Z-Test Significance Dot and it is defined as follows.

IF ABS([Promoters — Z-Score Quarter])>=[Confidence] THEN “•”
ELSE “”
END

This translates as

If the absolute value of [Promoters – Z-Score Quarter] is greater than or equal to the confidence parameter (currently set to 1.96, or 95%) then display a dot; otherwise, display a null string.

And just how is [Promoters – Z-Score Quarter] defined?  Let’s explore the next layer of the onion.

Promoters – Z-Score Quarter

This is defined as follows:

[Promoters — Z-Score Quarter Numerator] /

SQRT(

([Promoters — Z-Score Quarter Denom – Current] +
[Promoters — Z-Score Quarter Denom – Previous])
)

Here’s how it maps to the stat formula we saw earlier:

Figure 9 -- Mapping the components of the formula to different calculated field

Figure 9 — Mapping the components of the formula to different calculated field

So now we just need to understand the three different pieces that go into the stat function.

Promoters – Z-Score Quarter Numerator

This is very simple and refers to calculations we’ve already used.

[Promoters — Current Quarter] –
[Promoters — Previous Quarter]

Promoters — Z-Score Quarter Denom – Current

This is fairly straightforward given what we’ve already explored.

([Promoters — Current Quarter]*(1-[Promoters — Current Quarter]))
/SUM([Promoters — Current Quarter Count])

Where [Promoters – Current Quarter Count] is defined as follows.

IF DATETRUNC(‘quarter’, [Select Period])==DATETRUNC(‘quarter’,[Date])
THEN 1 END

So SUM(Promoters — Current Quarter Count]) is just adding up all the people that responded during the selected quarter.

Promoters — Z-Score Quarter Denom – Previous

([Promoters — Previous Quarter]*(1-[Promoters — Previous Quarter]))/
SUM([Promoters — Previous Quarter Count])

This uses the same logic as [Promoters – Z-Score Quarter Denom – Current] but instead aggregates results from the previous quarter.

Putting it all together

In addition to building the components in a piecemeal fashion I will often build a crosstab of all these components to see if they are working as I would expect.  Consider the crosstab shown here.

Figure 10 -- Crosstab showing all the pieces that contribute to the red dot

Figure 10 — Crosstab showing all the pieces that contribute to the red dot

The cross tab allows us to examine all the intermediate calculations to see how the contribute to the determining calculation in the last column.

What about the secondary chart?

So we’ve now seen how to build the top chart that shows current and previous quarters broken down by role.  How does the secondary chart – the chart that appears when you click a role or roles in the first chart – work?

Figure 11 -- Percentage of promoters for Nurses over time

Figure 11 — Percentage of promoters for Nurses over time

Here we have a dual axis chart so that we can have both a line (gray) and a circle (colored based on whether the change for the previous period is statistically significant).

In this case we have to construct all of the pieces using a table calculation, but the process of putting together the different components is identical to what we saw earlier.  For example, the calculation that determined the color of the circle, [LONG_Z-Test Significance], is defined as follows.

IF ABS([LONG_Z-Score])>=[Confidence] then “Significant”
else “Not significant”
end

And [LONG_Z-Score] is defined this way:

[LONG_Z-ScoreNumerator] /

SQRT(

([LONG_Z-Score Denom Current] +
[LONG_Z-Score Denom Previous])

)

I also built a crosstab to see how all the pieces fit together, as shown below.

Figure 12 -- Crosstab to help put together a z-test calculation for values shown over time

Figure 12 — Crosstab to help put together a z-test calculation for values shown over time

Conclusion

The dashboard in this blog post shows the percentage of promoters, sorted by role, for a particular quarter, compared with the percentage of promoters for the previous quarter.  Roles where the percentage difference is statistically significant are marked with a red dot. You can drill down on a particular role (or role) and see how scores have changed over time.

While the critical visual component was showing bars and reference lines, most of the “heavy lifting” went into determining if a change was statistically significant.  The key here was to not be intimidated by a statistical formula and to build the calculations in small pieces, using crosstabs to check the work.

 

May 112015
 

Much thanks to Susan Ferrari for exposing me to the concept of Net Promoter Score, Susan Baier for encouraging me to blog about it, and Helen Lindsey for providing anonymized NPS data.

Overview

My wife and I recently went out to a restaurant to celebrate our anniversary.  Accompanying the check was a survey card with three questions, one of which looked like this.

Figure 1 -- The classic Net Promoter Score question

Figure 1 — The classic Net Promoter Score question

We both agreed that the restaurant was very good, if not excellent, and that we would indeed recommend it to friends.  My wife suggested we circle the “8”.

I told her that if we were enthusiastic about recommending the restaurant we should give it a “9” as a 7 or 8 would be tabulated as a “neutral” or “passive” response.

She looked at me quizzically and asked why an “8” would be considered neutral.

I then explained how the Net Promoter Score works.

Understanding the Score

Respondents are presented with the question “Using a scale from 0 to 10, would you recommend this product / service to a friend or colleague?”

  • Anyone that responds with a 0 through 6 is considered a Detractor.
  • Anyone that responds with a 7 or 8 is considered a Passive (or Neutral).
  • Anyone that responds with a 9 or 10 is considered a Promoter.

The Net Promoter Score (NPS) is computed by taking the percentage of people that are Promoters, subtracting the percentage of people that at Detractors, and multiplying that number by 100.

How to compute NPS, courtesy B2B International.

Figure 2 — How to compute NPS, courtesy B2B International.

If you are like me (and my wife) you’re probably thinking that a “6” is a pretty good score and that it shouldn’t be bunched among the detractors.

I’m not going to get into a debate about NPS methodology and its usefulness, but I do want to show you some good ways to visualize NPS data.

The Problem with the Traditional Presentation

Consider this snippet of NPS survey data with responses about different companies from people in different roles.

Figure 3 -- Raw NPS data about different companies from people with different occupations.

Figure 3 — Raw NPS data about different companies from people with different occupations.

If we just focus on the NPS and not the components that comprise the NPS we can produce an easy-to-sort bar chart like the one shown here.

Figure 4 -- Traditional way to show NPS

Figure 4 — Traditional way to show NPS

Yes, it’s easy to see the company D has a much higher NPS than company H, but by not showing the individual components – and in particular the Neutrals / Passives –  we’re missing an important part of the story as the Neutrals / Passives are right on the cusp of becoming promoters.

For example, a Net Promoter Score of 40 can come from

  • 70% Promoters and 30% Detractors
  • 45% Promoters, 50% Passives, 5% Detractors

Same score, big difference in makeup.

An Alternative Approach to Displaying NPS Results

Consider the dashboard below which presents the data as a divergent stacked bar chart.

Figure 5 -- NPS dashboard with toggle to show percentages and score.

Figure 5 — NPS dashboard with toggle to show percentages and score.

The chart is easy to sort and you can also see that Company B and Company F have a relatively large group of Neutrals.

That said, being able to see the NPS score is very useful so the dashboard (see working version at the end of this post) has a toggle that switches between percentages and the score, as shown below.

Figure 6 -- Divergent stacked bar chart with NPS overlay.

Figure 6 — Divergent stacked bar chart with NPS overlay.

Note that the NPS divergent stacked bar chart is just a variation on a Likert scale divergent stacked bar chart.  You can find an explanation of how to build this type of visualization here.

What’s Next?

We now have what I think is a more insightful way to visualize Net Promoter Score data.

But clients and readers of my blog have asked me to address some of these questions as well:

  • How do you show the difference in NPS, or just the difference in percentage of promoters, between this quarter and the previous quarter?
  • If there is a difference, is the difference statistically significant?
  • What’s a good way to visualize and analyze NPS over time?

I will be addressing these issues in an upcoming post.  Stay tuned.

Mar 302015
 

Overview

Tableau 9.0 includes a built-in data prepping tool that makes reshaping survey data so it plays nicely with Tableau a much smoother experience than using the Tableau Excel add-in.  While this new feature won’t replace by trusty copy of Alteryx (for reasons that I explain later in this post) there are many occasions where Tableau’s new pivot feature will be more than adequate.

In this post I will walk through using the new pivot feature along with “temporary” blending to create a solid framework for using survey data with Tableau.

Special thanks to Susan Baier for bringing this to my attention and Jonathan Drummey for showing me Tableau’s Create Primary Group feature.

So, what do we have here?

Note: if you want to follow along you can download the Excel file here.

Consider an Excel workbook that contains two sheets.  The first sheet has the survey results, a snippet of which is shown here.

Figure 1 -- Some raw survey data

Figure 1 — Some raw survey data

Notice the format: one row for each survey respondent and a separate column for each question in the survey where each question is identified with a Question ID (e.g.,  Q0, Q1, Q2, Q134a, etc.).

Column A contains a unique ID for each survey taker, Columns B through D contain demographic information, and Column E contains a weight for each survey respondent.

The second sheet maps each Question ID to a human-readable version of the question and groups related questions into logical buckets.

Figure 2 -- Helper file that maps each Question ID to the wording of the question from the survey

Figure 2 — Helper file that maps each Question ID to the wording of the question from the survey

Note that when I first blogged about survey stuff I didn’t use a helper file but now I won’t take on a project without creating one as I don’t want to spend time aliasing hundreds of question IDs.  The Grouping column also makes is much easier to select related questions and visualize them together.

The data wants to be tall and thin

Anyone who has read up on the subject know that life with survey data and Tableau is a lot easier when the data is reshaped so let’s see how to do this with Tableau 9.0.

  1. In Tableau, connect to the data source and the sheet that contains the data you want to reshape and visualize. This is what it looks like on my screen.

    Figure 3 -- Survey data prior to pivoting (reshaping)

    Figure 3 — Survey data prior to pivoting (reshaping)

  2. Select the fields you want to merge / pivot / reshape, in this case everything except the Resp ID, demographic fields, and Weight field.
  3. Click in any of the highlighted fields and select Pivot. Tableau will combine the 20+ fields into two fields, as shown here.

    Figure 4 -- Data after it has been pivoted

    Figure 4 — Data after it has been pivoted

  4. Rename the first field Question ID and the second field Value.

    Figure 5 – Pivoted fields renamed

    Figure 5 – Pivoted fields renamed

  5. Indicate whether you want an extract (a good idea when Excel is the data source) and go to the Tableau worksheet.
  6. Drag Question ID onto the rows shelf. Your screen should look like this.

    Figure 6 -- Reshaped data in Tableau.  Instead of 20 measures for each question we have only one measure.

    Figure 6 — Reshaped data in Tableau. Instead of 20 measures for each question we have only one measure.

Creating the temporary blend

Now we need to connect and relate the Helper file to our pivoted survey data.  We will do this with a blend, but then use a very slick feature of data blending that will allow us to ditch the secondary data source. Here are the steps.

  1. Click the Add New Data source tool.8_dataSource
  2. Connect to the Helper File sheet from the same Excel workbook and indicate whether or not you want to create an Extract (of course you do!)

    Figure 7 -- The secondary data source

    Figure 7 — The secondary data source

  3. Return to the Tableau worksheet.
  4. Drag the Grouping field to the left of Question ID on the rows shelf, and Wording to the right, as shown below.  Note that you don’t *have* to do this but it’s always useful to see if the hierarchy is working correctly.

    Figure 8 -- Blended Data

    Figure 8 — Blended Data

  5. Right-click the Grouping pill on the Rows shelf and select Create Primary Group.
  6. Rename the group Grouping as shown below.

    Figure 9 -- Leveraging the blend to create an ad-hoc group based on Question ID fields.

    Figure 9 — Leveraging the blend to create an ad-hoc group based on Question ID fields.

  7. Click OK.
  8. Right-click the Wording pill on the Rows shelf and select Create Primary Group.
  9. Rename the group Wording and click OK.
  10. Click the primary data source (the one from which we initially selected Question ID). Notice the two groups that Tableau generated for us.

    Figure 10 -- Tableau-generated groups

    Figure 10 — Tableau-generated groups

At this point we no longer need the secondary data source as the primary source now has groups that map and alias the Question IDs.  Very slick.

Seeing this in action

Now that we have the groups it’s easy for us to do some very quick analysis.  For example, let’s suppose we want to see the average Likert scale score for the collection of Likert scale questions.

  1. Create a new worksheet.
  2. Drag Grouping into the Filters shelf and select the collection of questions you want to view, in this case Likert Set 1.

    Figure 11 -- The Grouping group makes is easy to indicate which sets of related questions you want to examine.

    Figure 11 — The Grouping group makes is easy to indicate which sets of related questions you want to examine.

  3. Drag Wording to Rows.
  4. Right-Drag Value to Columns and select AVG(Value).
  5. Sort in descending order.

Isn’t this great?  We didn’t have to go groping around for the right Question IDs and we didn’t have to alias anything.

So, are there any shortcomings?  Is this blend approach as good as being able to join the pivoted data with the helper file?

Yes, there are shortcomings

There are several things that a join will give us that we can’t get with a blend.

You cannot refer to the group in a calculated field

You can’t refer to a group in a calculated field, so something like this won’t be available:

IF [Grouping] =”Things you Measure” then [Value] END

You need to update the group members if you add new questions

Tableau’s generation of the primary group is much like populating the members of a parameter with the members of a field.  Tableau will do it when you click a button, as it were, but it won’t update the list automatically.

If you end up adding new questions to your survey or reorganizing how questions are categorized in your helper file will either need to regenerate the primary data source groups or manually edit them.

You cannot combine text results with numeric results

This is one of the major “gotchas” for me, at least for larger surveys.  With most commercial survey systems you can download the data in a label format or a numeric format.  For example, when downloaded as labels survey responses might look like this:

Strongly disagree
Disagree
Neutral
Agree
Strongly agree

When downloaded as numbers the same responses would look like this:

1
2
3
4
5

I find I like to have both label and numeric responses, so I pivot / reshape both sets of data and then join them together using Question ID and Response ID.  Using Alteryx I can perform the join but I cannot do it with Tableau 9 and pivoted data.

Conclusion

For complex surveys where I need to do a fair amount of data cleanup and need both next and numeric values I’ll continue to use Alteryx.  For shorter surveys where I don’t need to do a lot of prep work and where either labels or numeric values will suffice, Tableau 9.0’s new pivot feature suits me just fine.  It’s a great addition to a great product.

 Posted by on March 30, 2015 2) Visualizing Survey Data, Blog Tagged with: , , ,  29 Responses »
Mar 112015
 

Overview

Note: I based my Tableau Conference 2015 presentation on this blog post. You can download a PDF of the presentation and the Tableau packaged workbook.  Click here to see a video of the presentation.

Earlier this year one of my clients was updating a collection of survey data dashboards and they wanted to revisit the way they presented demographic data.  They thought that the collection of bar charts comprising the demographics dashboard was boring and wanted to replace them with something that was a bit more visually arresting.  In particular they wanted to take something that looked like this this…

Figure 1 -- a "boring" collection of bar charts.

Figure 1 — a “boring” collection of bar charts.

… and replace it with something that looks like this:

Figure 2 -- A "flashy" demographics dashboard

Figure 2 — A “flashy” demographics dashboard

When asked why they wanted something “flashier” they indicated a desire to draw the viewer into the dashboard and they thought a dashboard with more than just bar charts would do the trick.

I wondered “why stop there?”  Why not add pictures of kittens and puppies?

Figure 2a -- the Too Cute dashboard.

Figure 2a — the Too Cute dashboard.

The real issue here is that the underlying data just isn’t interesting and adding sexy visual elements will do nothing to make the data more interesting.  There’s only one way I know to make this kind of data “interesting”.

Make it personal.

Tapestry and Chad Skelton

I recently attended the 2015 Tapestry Conference where Chad Skelton of the Vancouver Sun presented a great session making the case that people are ravenous for data about themselves.

I was particularly taken with an interactive dashboard Chad created that allows Canadians to see how much older / younger they were than other Canadians.

I decided I would look at United States census data and build a similar dashboard.

US Census Data without Personalization

Here’s a histogram showing the relationship between age and US population.

Figure 3 -- A histogram showing the relationship between age and US population.

Figure 3 — A histogram showing the relationship between age and US population.

I have to admit this doesn’t do much for me although I do find the long downward slope from around the age of 50 somewhat interesting (but I am a bit of a data geek).

Contrast this general purpose graphic with the personalized dashboard shown below.

Did you try it?  Are you over 38 years old?  If “yes,” were you a bit depressed?

I certainly was.

While I don’t mean to depress anyone I do want to underscore how much more interesting the data is when the data is about YOU.

Make the Demographics Dashboard Interesting – Make it Personal

With the goal of personalization in mind let’s see how we can make the dashboard in Figure 1 more interesting.

Let’s start by gathering some information about the person viewing the dashboard; that is, let’s present some parameters from which the viewer can apply personalized settings:

Figure 4 -- Get your user to tell you something about himself / herself.

Figure 4 — Get your user to tell you something about himself / herself.

Now we can take these parameter settings and highlight them in the dashboard.

Figure 5 -- A "personalized" demographic dashboard.

Figure 5 — A “personalized” demographic dashboard.

We can then go one step further and invite the viewer to select the colored bars to see exactly how many people that took the survey have the same demographic background as the person interacting with the dashboard.

6_boring

Figure 6 — There are 65 people who fall into the same demographic pool as the person viewing the dashboard.

Conclusion

I’ve become a big advocate for adding personalization to dashboards and a number of my clients have started to adopt the approach.  I’ve seen some very good results at Bersin by Deloitte where Bersin is leveraging their proprietary survey data by allowing individual organizations to benchmark their numbers against similar organizations.

Note: A few months ago Joe Mako sent me a link to a Stephen Few blog post.  In researching this topic I revisited the post and see that Chad Skelton was in fact featured in Few’s essay . It seems that Skelton did not just “happen” upon the idea of personalization but was grappling, like so many of us, with ways to entice people to engage with visualizations.

For the record, I think personalized bar charts beat packed bubbles any day of the week.

Sep 162014
 

Finally, a good use for packed bubbles!

The Problem

I recently received a query from a client on how to compare responses to one question with responses to another question when both questions have possible LIkert values of 1, 2, 3, 4, and 5.  That is, if you have a collection of questions like this:

01_LIkert

How would you show response clusters when you compare “Good Job Skills” against “Likes the Beatles”?

This question is particularly applicable if you are a provider of goods and services and you want to see if there is alignment or misalignment between “how important is this feature” and “how satisfied are you with this feature”.

Note: There’s a Tableau forum thread that has been looking into this issue as well.  Please see http://community.tableausoftware.com/thread/137719.

So, how can we fashion something that helps us understand the data?

Before we get into the nitty gritty here’s a screen shot of one of the approaches I favor.  Have a look to determine if reading the rest of the blog post is worth the effort.

02_PreviewResults

Still reading?  Well, I guess it’s worth the effort.

The Traditional Scatterplot Approach

Consider the set up below where we see how Tableau would present the Likert vs. Likert results in a standard scatterplot.

03_TradScatterPlot

So, what is going on here?

There are a total of nine Likert questions available from the X-Question and Y-Question parameter drop down list boxes.  Our desire here is to allow us to compare any two of the nine at any time.

The “meat” of the visualization comes from the SUM(X-Value) on the columns shelf and SUM(Y-Values) on the rows shelf where X-Value and Y-Value are both defined as

IF [Wording]=[X-Question] then [Value]+1 END

This translates into “if the selected item from the list is the same as one of the questions you want to analyze, use the [Value] for that question”. Note that [Wording] is the same as [QuestionID] but with human readable values (e.g., “Likes the Beatles” instead of “Q52”)

We use [Value]+1 is because the Likert values are set to go from 0 to 4 instead of 1 to 5, and most people expect 1 to 5.

We can use SUM(X-Value) and SUM(Y-Value) because we have Resp ID on the Details shelf.  This forces Tableau to draw a circle for every respondent.  The problem is that we have overlapping circles and even with transparency you don’t get a sense of where responses cluster. Yes, it is possible with a table calculation to change the size of the circle based on count but we’ll I’ll provide what I think is a better approach below.

A note about the filters: The Question filter is there to constrain our view so that we only concern ourselves with Likert Scale questions.  It isn’t necessary but is useful should we be experimenting with different approaches.  The SUM(X-Value) and SUM(Y-Value) filters remove nulls from the view.

Packed Bubbles to the Rescue

I’m not a big fan of packed bubbles (see this post) but for this situation we can use them and get some great results, as shown below.

04_BubbleScatterPlot

I’ve made a couple of changes to the traditional scatterplot visualization the most important being SUM(X-Value) and SUM(Y-Value) are now discrete and we get a trellised visualization instead of a continuous axis.  Note that I had to change the sort order of the Y-axis elements so that they appear in reverse order (5 down to 1).

I got the packed bubbles by placing CNTD(Resp ID) on the size button. This assures that each bubble is the same size and triggers Tableau’s packing algorithm.

Note that I also added an on-demand “Drill down” so that you can color the circles by different demographic dimensions.

I’ve experimented with this with some large data sets and Tableau does a great job with packing the bubbles intelligently.

What About Trend Lines?

Since we are using discrete measures on the rows and columns shelves we cannot produce trend lines.  When I first started this project I experimented with more traditional jittering and was able, with a fair amount of fuss and bother, to produce this.

05_TrendLineExample

A special thanks to Jeffrey Shaffer who provided a link on how to create pseudo-random numbers in Tableau (thank you, Josh Milligan).

I prefer the example that doesn’t require the jittering, but if you need to trend lines or if you prefer the jittered look I’ve included the example in the downloaded packaged workbook (see below).

It also occurred to me that the trend line would be based on the jittered values and not the actual values.  The same workbook contains a “home grown” trend line based on the actual values (courtesy of Joe Mako). It turns out the jittered trend line is almost identical to the non-jittered trend line so I suspect you won’t need to take the “home grown” approach.


Update

I received a number of comments here and on LinkedIn about the “drill down / break down” capability and that it is hard to see the percentage of dots by category.  For example, if you break down by generation do the dots for one generation cluster more in one part of the trellis than in others?

I thought that in this case having a different-colored bubble per category where the size of the bubble was proportionate to the percentage of responses within that category made sense.

Size by Category

I thought building this would be easy, but I needed to call in the heavy artillery (Joe Mako).

I’ll blog about the solution later. In the meantime the packaged workbook below contains this additional approach.

Sep 032014
 

Overview

In my experience the number one impediment to success with Tableau is getting data in a format that plays nicely with Tableau. Alteryx is a combination ETL (extract, transform, load), geospatial, and statistical modeling solution that just may solve this “getting-the-data-right” problem.

And it plays very nicely with Tableau.

In this blog I will recount some experiences I’ve had with Alteryx and some thoughts on what the future might hold for the two companies.

Client One

In April of 2013 I was working with one of my favorite clients and we ran into a roadblock in that they needed to blend a lot of data from disparate sources and this confluence of data was an absolute monstrosity. I was on the precipice of recommending that they hire a data warehousing consultant when I happened to attend a Tableau road show in New York City. Alteryx sponsored the lunch that day and I paid attention to their presentation.

Fast-forwarding a bit, I received a very compelling demo and product roadmap from Dean Stoecker, Alteryx’s CEO.

I visited my client the next day and told them to hold off on hiring the consultant and look into Alteryx as a better short, mid, and long term solution.  It proved to be a good recommendation as the client is now blending data from a lot of sources and gleaning insights that would have taken much longer had we gone the consultant /consolidated warehouse route.

Client Two

About six months ago I was working with another client that was swimming in data, but that data was missing some key elements.  The client was tracking hundreds of service calls throughout the New York metropolitan area and although they had street addresses for every incident they were only able to produce a map that showed results at the county level, and this didn’t reveal very much.

I asked them to look into seeing if the incidents clustered in particular neighborhoods.  For this we would need latitude and longitude coordinates for each street address.

Six weeks later the client triumphantly called and told me their IT department had finally added zip code information to the database query they used to drive the visualization.  I sighed and politely told them that while having zip code-level data was better than county-level data, zip codes would not give us the granularity we needed.

At this point I asked them to send me a copy of the data and, armed with Alteryx and the Tom Tom US maps, I generated latitude and longitude for 99% of the addresses.

And I did this in about 15 minutes.

The next day I presented the client with a symbol map that contained a different color-coded circle for every incident in the database.  I wish I could tell you that we discovered something truly amazing once we had this (we didn’t) but the critical point is that tools like Alteryx and Tableau allowed us to pursue a hunch in a matter of minutes. The next hunch might yield an incredible insight and with Alteryx and Tableau we can investigate these notions without having to tax an already over-burdened IT department.

Client Three – Me

Any followers of this blog know that I do a lot of work with survey data and to get Tableau to do what it does so well survey data as downloaded from a survey tool needs to be parsed, pummeled, and browbeaten into submission.

For years I’ve been using either Tableau’s free Excel add-in (when the data is in that format) or relying on the kindness of DBAs to render the data in the format I need.

The problem with the Excel approach is that it requires a lot of hand manipulation and if the client decides they want to include responses after they have sworn the survey is closed, well, I end up having to go through the whole error-prone process a second or third time.

Enter Alteryx, which allows me to set up the process once, automate it, and then run it whenever I need.    The icing on the cake is that it generates a ready-to-use Tableau .TDE file. In addition to the process being faster and safer, I can start visualizing survey data way before the survey is closed. This has been a huge time saver for me and I will never go back to hand-massaging the data. Plus, if the data source is a database (vs. downloaded files) I can apply the same tool and the same process without needing the DBA to fashion anything special for me.

Will Tableau Acquire Alteryx?

Given that Alteryx fills a gap that currently exists with Tableau and that the two products play so nicely together, in September 2013 I predicted that within a year Tableau would acquire Alteryx.

So I was wrong.  But will it happen down the road?  I do like how the two tools work together but there are some things about Alteryx that Tableau users may find off-putting, including:

Alteryx is a “heavy lunch”

Alteryx is an ETL, geospatial, predictive modeling, breath mint, candy mint, floor wax, all-in-one tool.  This cornucopia of options can be intimidating.

Alteryx assumes more knowledge

Alteryx assumes a greater level of programming sophistication than does Tableau. For example, Alteryx makes a distinction between equivalence and an assignment. In Alteryx you would write

X=7

To assign the value 7 to the variable X.  But if you were performing a comparison in an IF statement you would write

IF X==7 Then [what to do] ENDIF

Tableau does not make the distinction between the single and double equal sign.  Granted, if you attempt to use one equal sign in Alteryx you’ll get an error message with the suggestion that you should use the double equal sign, but there appears to be an assumption that the user comes into Alteryx with an appreciation for standard programming syntax.

It’s Easier to Break Things in Alteryx

If you change the name of a field in Tableau everything that refers to that field also changes.  This is not the case with Alteryx and modifying your Alteryx module to address this field name change can be a pain.

Indeed, I think this example epitomizes the difference in refinement between the two tools.  Don’t get me wrong, Alteryx is a terrific tool and I am very happy to have it in my quiver, but there is a higher degree of user affordance in Tableau, and users accustomed to Tableau’s luxury car feel may find Alteryx a bit of a bumpy ride.

So, while I don’t see Tableau acquiring the Alteryx and supporting the tool in its current form, who knows what will happen as both tools and companies mature?

So, Should You Buy Alteryx?

I can’t answer this question but you should at least download an evaluation copy and try it out.  I will tell you that for my survey data practice the product has been a godsend.