May 102017
 

May 10, 2017

Overview

Most organizations want to wildly exceed customer expectations for all facets of all their products and services, but if your organization is like most, you’re not going to be able to do this. Therefore, how should you allocate money and resources?

First, make sure you are not putting time and attention into things that aren’t important to your customers and make sure you satisfy customers with the things that are important.

One way to do this is to create a survey that contains two parallel sets of questions that ask customers to indicate the importance of certain features / services with how satisfied they are with those products and services.  A snippet of what this might look like to a survey taker is shown in Figure 1.

Figure 1 -- How the importance vs. satisfaction questions might appear to the person taking the survey.

Figure 1 — How the importance vs. satisfaction questions might appear to the person taking the survey.

How to Visualize the Results

I’ve come up with a half dozen ways to show the results and will share three approaches in this blog post.  All three approaches use the concept of “Top 2 Boxes” where we compare the percentage of people who indicated Important or Very Important (the top two possible choices out of five for importance) and Satisfied or Very Satisfied (again, the top two choices for Satisfaction).

Bar-In-Bar Chart

Figure 2 shows a bar-in-bar chart, sorted by the items that are most important.

Figure 2 -- Bar-in-bar chart

Figure 2 — Bar-in-bar chart

This works fine, as would having a bar and a vertical reference line.

It’s easy to see that we are disappointing our customers in everything except the least important category and that the gap between importance and satisfaction is particular pronounced in Ability to Customer UI (we’re not doing so well in Response Time, 24-7 Support, and East of Use, either.)

Scatterplot with 45-degree line

Figure 3 shows a scatterplot that compares the percent top 2 boxes for Importance plotted against the percent top 2 boxes for Satisfaction where each mark is a different attribute in our study.

Figure 3 -- Scatterplot with 45-degree reference line

Figure 3 — Scatterplot with 45-degree reference line

The goal is to be as close to the 45-degree line as possible in that you want to match satisfaction with importance. That is, you don’t want to underserve customers (have marks below the line) but you probably don’t want to overserve, either, as marks above the line suggest you may be putting to many resources into things that are not that important to your customers.

As with the previous example it’s easy to see the one place where we are exceeding expectations and the three places where we’re quite a bit behind.

Dot Plot with Line

Of the half dozen or so approaches the one I like most is the connected dot plot, shown in Figure 4.

Figure 4 -- Connected dot plot. This is the viz I like the most.

Figure 4 — Connected dot plot. This is the viz I like the most.

(I placed “I like most” in italics because all the visualizations I’ve shown “work” and one of them might resonate more with your audience than this one.  Just because I like it doesn’t mean it will be the best for your organization so get feedback before deploying.)

In the connected dot plot the dots show the top 2 boxes for importance compared to the top 2 boxes for satisfaction.  The line between them underscores the gap.

I like this viz because it is sortable and easy to see where the gaps are most pronounced.

But what about a Divergent Stacked Bar Chart?

Yes, this is my “go to” viz for Likert-scale things and I do in fact incorporate such a view in the drill-down dashboard found at the end of this blog post. I did in fact experiment with the view but found that while it worked for comparing one feature at a time it was difficult to understand when comparing all 10 features (See Figure 5.)

Figure 5 -- Divergent stacked bar overload (too much of a good thing).

Figure 5 — Divergent stacked bar overload (too much of a good thing).

How to Build This — Make Sure the Data is Set Up Correctly

As with everything survey related, it’s critical that the data be set up properly. In this case for each Question ID we have something that maps that ID to a human readable question / feature and groups related questions together, as shown in Figure 6.

Figure 6 -- Mapping the question IDs to human readable form and grouping related questions

Figure 6 — Mapping the question IDs to human readable form and grouping related questions

Having the data set up “just so” allows us to quickly build a useful, albeit hard to parse, comparison of Importance vs. Satisfaction, as shown in Figure 7.

Figure 7 -- Quick and dirty comparison of importance vs. satisfaction.

Figure 7 — Quick and dirty comparison of importance vs. satisfaction.

Here we are just showing the questions that pertain to Importance and Satisfaction (1). Note that measure [Percentage Top 2 Boxes] that is on Columns (2) is defined as follows.

Figure 8 -- Calculated field for determining the percentage of people that selected the top 2 boxes.

Figure 8 — Calculated field for determining the percentage of people that selected the top 2 boxes.

Why >=3?  It turns out that the Likert scale for this data went from 0 to 4, so here we just want to add up everyone who selected a 3 or a 4.

Not Quite Ready to Rock and Roll

This calculated field will work for many of the visualizations we might want to create, but it won’t work for the scatterplot and it will give us some headaches when we attempt to add some discrete measures to the header that surrounds our chart (the % Diff text that appears to the left of the dot plot in Figure 4.) So, instead of having a single calculation I created two separate calculations to compute % top 2 boxes Importance and % top 2 boxes Satisfaction. The calculation for Importance is shown in Figure 9.

Figure 9 -- Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Figure 9 — Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Notice that we have all the rows associated with both the Importance questions and Satisfaction “in play”, as it were, but we’re only tabulating results for the Importance questions so we’re dividing by half of the total number of records.

We’ll need to create a similar calculated field for the Satisfaction questions.

Ready to Rock and Roll

Understanding the Dot Plot

Figure 10 shows what drives the Dot Plot (we’ll add the connecting line in a moment.)

Figure 10 -- Dissecting the Dot Plot.

Figure 10 — Dissecting the Dot Plot.

Here we see that we have a Shape chart (1) that will display two different Measure Values (2) and that Measure Names (3) is controlling Shape and Color.

Creating the Connecting Line Chart

Figure 11 shows how the Line chart that connects the shapes are built.

Figure 11 -- Dissecting the Line chart

Figure 11 — Dissecting the Line chart.

Notice that Measure Values is on Rows a second time (1) but the second instance the mark type is a Line (2) and that the end points are connected using the Measure Names on the Path (3).  Also notice that there is no longer anything controlling the Color as we want a line that is only one color.

Combining the Two Charts

The only thing we need to do now is combine the two charts into one by making a dual axis chart, to synchronize the secondary axis, and hide the secondary header (Figure 12.)

Figure 12 -- the Completed connected Dot Plot.

Figure 12 — the Completed connected Dot Plot.

What to Look for in the Dashboard

Any chart that answers a question usually fosters more questions. Consider the really big gap in Ability to Customize UI. Did all respondents indicate this, or only some?

And if one group was considerably more pronounced than others, what were the actual responses across the board (vs. just looking at the percent top 2 boxes)?

Figure 13 -- Getting the details on how one group responded

Figure 13 — Getting the details on how one group responded

The dashboard embedded below shows how you can answer these questions.

Got another approach that you think works better?  Let me know.

Oct 172016
 

Overview

So, you’ve created a wonderful collection of survey data dashboards that have innumerable demographic filters so that users can, for example, just see responses from left-handed Los Angeles Lakers fans between the ages of 34 and 39.

So, what’s the problem?

Actually, there are two problems. The first occurs when extreme filtering reduces the number of responses so much that the results are statistically meaningless. The second is that you may inadvertently allow people to “glean” who has answered a survey.  For example, if you conduct a salary survey you want participants to be assured that nobody will be able to see individual responses. But if you have too many filters it may be possible to winnow down the results so you can guess who provided the answer.

Fortunately, it’s easy to set up a graceful way to suppress a chart and display an error message in its place when the “n” count gets too low.

How it works

Consider the dashboard shown in Figure 1.  Notice that the upper right corner shows that with nothing filtered there are a total of 350 responses.

01_checkall

Figure 1 — A simple dashboard showing results for a check-all-that-apply question.

Also notice there’s a parameter control that allows you to specify the cut-off point for displaying the visualization.

Now let’s see what happens if we use the filters to winnow down the number of responses to the point that there are fewer than 20 (Figure 2.)

Figure 2 -- With too few responses the bar chart is suppressed and the warning message is displayed.

Figure 2 — With too few responses the bar chart is suppressed and the warning message is displayed.

What’s happening here is that there are two floating charts both with similar filters that looks at how many survey responses there are. The bar chart in Figure 1 is set to appear if the number of responses is greater than or equal to 20. The warning message (it’s just a Tableau worksheet) appears if the number of responses is fewer than 20.

How the filters work

Let’s look first at what drives the bar chart (Figure 3.)

Figure 3 -- Pill settings and filters for the bar chart.

Figure 3 — Pill settings and filters for the bar chart.

Notice in particular there is a field called [Minimum Count] that is on Filters card and that it is set to True. The field [Minimum Count] is defined as follows:

Figure 4 -- How [Minimum Count] is defined.

Figure 4 — How [Minimum Count] is defined.

Here [Count Threshold] is the fill-in-the-blank parameter (currently set to 20).

So, the visualization will only appear if there are at least 20 responses; otherwise the filter “kills” the viz and the only thing we see is the title.

Now, how does the secondary visualization work?  Let’s have a look (Figure 5.)

Figure 5 -- Pill and filter settings for the "warning" visualization.

Figure 5 — Pill and filter settings for the “warning” visualization.

Notice that [Minimum Count] is also on the Filter card but is set to False. We’re seeing the viz (the red message) because the filters in place result in fewer than 20 responses.

And just what is producing the message? It’s the field [Too Few] that’s been placed on the Text button on the Marks card. The field is defined as follows.

Figure 6 -- Definition of the field [Too Few].

Figure 6 — Definition of the field [Too Few].

Why use floating elements?

We certainly could cram the two visualizations into a container and make sure that when one is displayed the other only takes up a few pixels.  I elected to go with the floating approach but made sure that the secondary viz was set to be in back of the primary viz.

Is that all there is to it?

For a check-all-that-apply question, that’s all you need to know, but some question and visualization types may need different approaches.

Consider Figure 7 where we see a jitterplot comparing salary data for men and women where each dot represents a response from an individual survey participant.

Figure 7 -- Pill and filter configuration for a jitterplot visualizing salary data.

Figure 7 — Pill and filter configuration for a jitterplot visualizing salary data.

Notice that we have a different field (one that uses a Table calculation) on the Filters card.

There’s also a very different setup to display the warning message for this visualization as we cannot simply base this on SUM([Number of Records]).

I will leave it to the reader to explore how this these are set up.  Just let me know when you come up with a better approach.

Conclusion

If you’re visualizing survey data and giving users filters you should come up with a game plan for what to do if there are too few responses. In this blog post (and the embedded, downloadable workbook) I present two approaches for two types of survey questions. Other questions types (for example, Likert-scale questions) will need some modifications to what I’ve presented here.

Oct 132015
 

Overview

You can download a trial version of Alteryx here.

You can find the Excel source data here.

You can download the completed Alteryx module from here.

Understanding the workflow

Here’s the Alteryx workflow that will get the data in the format we want.

Figure 9 -- Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

Figure 9 — Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

If this is the first time you’ve seen an Alteryx workflow it may look a little complicated (shades of Rube Goldberg) but think about how complicated the flow and decision tree is for what you do when you get up in the morning and get ready for work – it’s way more complicated.

Let’s start by focusing on the three data sources that we need (labeled 1, 2, and 3 in the diagram above).

  1. Data with numeric responses (we’ll use this for our demographic data, too)
  2. Data with text responses
  3. The Question Helper file (contains our meta data).

A note about SPSS files

For this example we’re using Excel data but the process would be identical for survey data saved in an SPSS format (.SAV file).  SPSS encodes both the text and numeric data in a single file, but you need to input values from that same data source twice – once for text and once for numeric values.

When you specify an SPSS file as a data source you’ll see the following check box option.

Figure 10 -- Options that appear when you bring in data from an SPSS file.

Figure 10 — Options that appear when you bring in data from an SPSS file.

Selecting “Expand Factors” will bring in the data as labels.  Unchecking this box will bring in the data as numbers.

Demographics

Figure 11 -- Selecting the demographic components from the data source

Figure 11 — Selecting the demographic components from the data source

Clicking the Select tool reveals the following settings.

Figure 12 -- Specifying which fields we want to keep

Figure 12 — Specifying which fields we want to keep

A translation into English would be “open this Excel file and select these fields”.  Here we keep RespID, Gender, Location, Generation, and Weight.

Numeric values

Figure 13 -- Numeric values section of the Alteryx workflow with the Reshape tool selected

Figure 13 — Numeric values section of the Alteryx workflow with the Transpose tool selected

Using the same data source, selecting the Transpose tool reveals the following settings.

Figure 14 – Transpose tool settings

Figure 14 – Transpose tool settings

Here we indicate that we want to keep RespID and pivot all the other checked fields.  Notice that the demographic components are not checked.

Looking back to the workflow diagram, we next follow this pivot with a Select tool which we use to rename the pivoted fields as shown here.

Figure 15 -- The Select tool allows us to name the pivoted fields

Figure 15 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second pivoted field Numeric value.

Dealing with nulls

Figure 16 -- The Filter tool allows us to eliminate null values before they hit Tableau.

Figure 16 — The Filter tool allows us to eliminate null values before they hit Tableau.

While not critical, Alteryx allows us to eliminate null values before they come into Tableau.  Here’s the setting for this tool.

Figure 17 -- Setting to eliminate nulls from the Numeric value field

Figure 17 — Setting to eliminate nulls from the Numeric value field

Joining the demographic with the transposed (pivoted) numeric data

Figure 18 -- Joining the demographic and transposed numeric data

Figure 18 — Joining the demographic and transposed numeric data

We’re now ready to join the demographic data with the pivoted numeric survey data. We do this with Alteryx’s Join tool.  Here are the settings.

Figure 19 -- Joining demographics and numeric data using the common field, RespID. Notice that we don't include the redundant version of RespID that is coming from the right table.

Figure 19 — Joining demographics and numeric data using the common field, RespID. Notice that we don’t include the redundant version of RespID that is coming from the right table.

Checking our progress

If we add a Browse tool after the Join we’ll see the following results.

Figure 20 -- Results of the initial Join

Figure 20 — Results of the initial Join

Note that we could, in fact, get by with using just this in Tableau, but the whole point is to make things faster and easier, so we’re going to add text responses and the “Helper” data.

Text Values

Figure 21 -- Workflow for setting the text values

Figure 21 — Workflow for setting the text values

This is identical to the numeric values except that we specify a different data source (one with label responses instead of numeric) and we rename the second pivoted field differently, as shown here.

Figure 22 -- The Select tool allows us to name the pivoted fields

Figure 22 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second Text value.

Tying Demographic, Numeric and Text Values Together

We need to combine the demographic / numeric values with the text values.  We will again use Alteryx’s Join tool, but we’ll need to join on both RespID and Question ID.

Figure 23 -- Joining demographic, numeric, and text data together

Figure 23 — Joining demographic, numeric, and text data together

The setting for the Join is shown below.

Figure 24 -- Join settings for combining demographics / numeric with text data

Figure 24 — Join settings for combining demographics / numeric with text data

Adding the question meta data and outputting the results

We’re almost done; we just need to add the information that tells us about each Question ID so we can map the ID to its human-readable form, group related IDs together, and so on.

Figure 25 -- Joining the "Helper" file data

Figure 25 — Joining the “Helper” file data

The setting for this Join is shown below.

Figure 26 -- Marrying the question meta data to the demographic and survey data

Figure 26 — Marrying the question meta data to the demographic and survey data

The only thing left to do is output this to a new data source.  In the sample workflow I output to an Excel file but you can just as easily write directly to a Tableau Data Extract file.

But Wait!  There’s More!

I intentionally used data that was well behaved, but what happens if the data is “messy”?  For example, suppose there’s an extra row below the field names?  Or suppose that for a check-all-that-apply question, instead of 1s and 0s we instead have 1s and blanks (which would mess things up in Tableau)?  Alteryx handles both of these scenarios (and dozens more) easily.

Is there a Downside?

If there’s any downside it would be the cost as Alteryx starts at around $4,000 per year.  If all you are doing is some occasional survey data reshaping then this fee would probably be prohibitively expensive. If, however, you need to corral a lot of data – or if you need advanced data blending, geospatial analysis, or predictive modeling – then Alteryx is a huge bargain.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , , ,  5 Responses »
Oct 132015
 

Overview

In writing about visualizing survey data using Tableau I’ve found that the number one impediment to success is getting the data in the right format. In accompanying posts I’ll explain how to get this done using Alteryx, Tableau 10.x, the Tableau Excel add-in, and Tableau 9.0 pivot feature (you can come close with 9.x, but can’t get it perfect).

What do I mean by “just so”?

When I deal with survey data there are usually four different elements that need to fit together:

  1. The demographic information (e.g., age of respondents, gender, etc.)
  2. Survey responses in text format
  3. Survey responses in numeric format
  4. Meta data that describes the survey data.

Let’s see what the four elements look like using an Excel sample data set (click here to download).

Demographic data

Here’s what the demographic data looks like.

Figure 1 -- Demographic data

Figure 1 — Demographic data

Survey responses in text format

Here are several columns of survey responses in text format.  Column F contains data for a Yes / No / Don’t know question.  Column G contains responses for a question about salary.  Columns H through P are responses for check-all-that apply questions and columns Q and R contain Likert scale responses.

Figure 2 -- Survey responses in text format

Figure 2 — Survey responses in text format

Survey responses in numeric format

Here are the same responses but in numeric format.

Figure 3 -- Survey responses in numeric format

Figure 3 — Survey responses in numeric format

I’ll explain why it’s so useful to have the survey responses in both text and numeric format in a bit.

Meta Data (the “helper” file)

Here’s some data that I usually prepare by hand as most survey tools won’t produce it for me automatically.  Having this helps me understand the data and will  greatly streamline my work in Tableau.

Figure 4 -- Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

Figure 4 — Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

What does “just so” look like?

Our goal is to combine and reshape the various elements so that they look like this.

Figure 5 -- Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

Figure 5 — Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

As I’ve written previously, the key thing is that I no longer have a separate column for each survey response.  Indeed, I’ve reduced the number of columns from 45 to just 11, but I’ve also increased the number of rows from 845 to over 25,000. That is a good thing.

Why this works so well with Tableau

Our goal is to see how to get Alteryx to get the data in this format, not to actually use the data, but if you need convincing on why the meta data is so helpful, consider the following example.

Let’s say that in your survey you ask people to indicate the importance and satisfaction about certain services, as shown here.

Figure 6 -- Question comparing importance with satisfaction

Figure 6 — Question comparing importance with satisfaction

With the data set up “just so” conducting this comparison in Tableau becomes easy.  First we can drag Question Grouping into Filters and indicate that we just want to look at Importance and Satisfaction questions.

Figure 7 -- Using the Question Grouping field to just focus on Importance and Satisfaction questions

Figure 7 — Using the Question Grouping field to just focus on Importance and Satisfaction questions

Then we can drag Wording and Question Grouping onto the Rows shelf which gives us the framework for comparing importance and satisfaction across ten different questions.  No more having to “look up” which questions we want to explore and no more having to alias question IDs.  I love this!

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Why do we need both text and numeric results?

We don’t really need them, but I know I certainly want them.

Consider all of the Likert scale question results.  The universe of possible values are

1
2
3
4
5

Suppose we want to know just what each of the values (1, 2, 3, 4 and 5) stand for?  The problem is that it depends on the question being asked as sometimes a 5 means “Strongly agree”, for other questions it  means “Critical” and for others it means “Extremely satisfied”.

Without having both numeric and text results we will have to write A LOT of IF / CASE statements and I, for one, do not want to do that.

So, now that we understand how and why we want the data “just so” we’ll see how to get it that way using Alteryx, Tableau 10.x, the Tableau-add-in for Excel, and Tableau 9.x.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , ,  5 Responses »
May 112015
 

Much thanks to Susan Ferrari for exposing me to the concept of Net Promoter Score, Susan Baier for encouraging me to blog about it, and Helen Lindsey for providing anonymized NPS data.

Overview

My wife and I recently went out to a restaurant to celebrate our anniversary.  Accompanying the check was a survey card with three questions, one of which looked like this.

Figure 1 -- The classic Net Promoter Score question

Figure 1 — The classic Net Promoter Score question

We both agreed that the restaurant was very good, if not excellent, and that we would indeed recommend it to friends.  My wife suggested we circle the “8”.

I told her that if we were enthusiastic about recommending the restaurant we should give it a “9” as a 7 or 8 would be tabulated as a “neutral” or “passive” response.

She looked at me quizzically and asked why an “8” would be considered neutral.

I then explained how the Net Promoter Score works.

Understanding the Score

Respondents are presented with the question “Using a scale from 0 to 10, would you recommend this product / service to a friend or colleague?”

  • Anyone that responds with a 0 through 6 is considered a Detractor.
  • Anyone that responds with a 7 or 8 is considered a Passive (or Neutral).
  • Anyone that responds with a 9 or 10 is considered a Promoter.

The Net Promoter Score (NPS) is computed by taking the percentage of people that are Promoters, subtracting the percentage of people that at Detractors, and multiplying that number by 100.

How to compute NPS, courtesy B2B International.

Figure 2 — How to compute NPS, courtesy B2B International.

If you are like me (and my wife) you’re probably thinking that a “6” is a pretty good score and that it shouldn’t be bunched among the detractors.

I’m not going to get into a debate about NPS methodology and its usefulness, but I do want to show you some good ways to visualize NPS data.

The Problem with the Traditional Presentation

Consider this snippet of NPS survey data with responses about different companies from people in different roles.

Figure 3 -- Raw NPS data about different companies from people with different occupations.

Figure 3 — Raw NPS data about different companies from people with different occupations.

If we just focus on the NPS and not the components that comprise the NPS we can produce an easy-to-sort bar chart like the one shown here.

Figure 4 -- Traditional way to show NPS

Figure 4 — Traditional way to show NPS

Yes, it’s easy to see the company D has a much higher NPS than company H, but by not showing the individual components – and in particular the Neutrals / Passives –  we’re missing an important part of the story as the Neutrals / Passives are right on the cusp of becoming promoters.

For example, a Net Promoter Score of 40 can come from

  • 70% Promoters and 30% Detractors
  • 45% Promoters, 50% Passives, 5% Detractors

Same score, big difference in makeup.

An Alternative Approach to Displaying NPS Results

Consider the dashboard below which presents the data as a divergent stacked bar chart.

Figure 5 -- NPS dashboard with toggle to show percentages and score.

Figure 5 — NPS dashboard with toggle to show percentages and score.

The chart is easy to sort and you can also see that Company B and Company F have a relatively large group of Neutrals.

That said, being able to see the NPS score is very useful so the dashboard (see working version at the end of this post) has a toggle that switches between percentages and the score, as shown below.

Figure 6 -- Divergent stacked bar chart with NPS overlay.

Figure 6 — Divergent stacked bar chart with NPS overlay.

Note that the NPS divergent stacked bar chart is just a variation on a Likert scale divergent stacked bar chart.  You can find an explanation of how to build this type of visualization here.

What’s Next?

We now have what I think is a more insightful way to visualize Net Promoter Score data.

But clients and readers of my blog have asked me to address some of these questions as well:

  • How do you show the difference in NPS, or just the difference in percentage of promoters, between this quarter and the previous quarter?
  • If there is a difference, is the difference statistically significant?
  • What’s a good way to visualize and analyze NPS over time?

I will be addressing these issues in an upcoming post.  Stay tuned.