swexler

Sep 01, 2015
 

Overview

I’ll admit that I have a problem with treemaps in Tableau, but it’s not because the chart type is in some way inferior. My problem is with how people use – and misuse – treemaps.

Here’s a good example of misuse.  Instead of displaying something straightforward that looks like this…

Figure 1 -- The humble, but accurate bar chart

… some people feel compelled to add “visual variety” to their dashboards and instead create something that looks like this.

Figure 2 -- Look, Ma! I made a Mondrian!

Except for the “it looks cool” factor there’s no good reason to use a treemap in this situation.

So, when should you use a treemap?

What’s in a treemap and why it can be useful

With a treemap you have two attributes at your disposal:

  1. The size (area) of rectangles, and
  2. The color of the rectangles

A treemap consists of packed rectangles where the area of each rectangle corresponds to the size of a particular measure.  In the example above the size of the rectangle is based on the number of people that come from a particular region.  North America has the largest value so it’s represented with the largest rectangle. Europe has a smaller value so its rectangle is proportionally smaller.
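To make the "area follows the measure" rule concrete, here's a minimal Python sketch of about the simplest layout you could build: one row of side-by-side slices. This is not the layout algorithm Tableau uses, and the region counts are made-up placeholders rather than the numbers behind the figures above.

```python
def slice_layout(values, width, height):
    """Split a width x height canvas into side-by-side rectangles
    whose areas are proportional to the given values."""
    total = sum(values.values())
    rects = {}
    x = 0.0
    # Largest value first, so the biggest rectangle sits on the left.
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        w = width * value / total          # this item's share of the canvas width
        rects[name] = (x, 0.0, w, height)  # (left, top, width, height)
        x += w
    return rects

# Hypothetical counts, for illustration only.
regions = {"North America": 50, "Europe": 30, "Asia": 20}
layout = slice_layout(regions, width=400, height=200)
for name, (left, top, w, h) in layout.items():
    print(f"{name}: area = {w * h:.0f}")
```

Because every slice has the same height, the areas end up in the same ratio as the values. A real treemap also splits vertically so it can pack many more marks into the same space.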

Treemaps really come in handy when you have A LOT of marks to plot and you need to show all of the marks in a compact area.

So, this sounds like a great chart – we’ve got rectangles to show how big and small stuff is, color to group related rectangles intelligently, and we can fit a lot of stuff in a small space.  Why not use this chart all the time?

The downside is that we are comparing the area of rectangles, and with rectangles it is difficult to make an accurate comparison. People may be very good at comparing the length of bars, but as a species we are not particularly good at comparing the area of rectangles (and we’re downright awful at comparing the area of circles).

So, given the advantages and shortcomings, just when should you use it?  Let’s look at a particular scenario.

Showing Presidential Electoral Results

A Filled Map

Consider the electoral map below showing electoral votes by state for Barack Obama and Mitt Romney in 2012.

Figure 3 — Filled map showing electoral votes for the 2012 presidential election (displaying 48 out of 50 states)

Our Electoral College system is fairly confusing and I can only imagine how somebody from outside the US would look at this as there appears to be more red on the map than blue… but the blue guy won!

This discrepancy becomes even more pronounced when we include Alaska and Hawaii in the map.

Figure 4 -- Filled map showing electoral vote winners for the 2012 presidential election (displaying 50 states)

Clearly, a map designed to show how much area there is in a state fails with Electoral College results, where the numbers are based on population, not land mass.  In the example above there’s A LOT more red than blue, but again, the blue guy won the election.

Perhaps a different type of chart will do a better job?

Symbol Map

Here’s a symbol map of the same data.

Figure 5 -- Symbol map showing electoral vote winners for the 2012 presidential election (displaying 48 out of 50 states)

I think this is more accurate as there’s clearly more blue than red, but it’s still a tough read.  What else might work?

Cartogram

Here’s a cartogram from Professor Mark Newman of the University of Michigan showing the same data, except the polygon for each state has been adjusted to reflect the population of the state.

Figure 6 -- Cartogram showing election results where the shape of the state is based on its population and not land mass.

While it’s very clear that there is more blue than red on this map there are two problems with this approach:

  1. There aren’t many tools that will support this type of distortion; and,
  2. This map will frighten small children.

Summary Bar Chart

Why not just display a simple bar chart showing the total number of electoral votes, like the one shown here?

Figure 7 -- Electoral vote count by candidate

This is certainly very clear and we can see easily by how much Obama won, but we’re missing an important part of the story.

In US presidential elections a winner is chosen by tallying the electoral votes from each state and the summary bar chart doesn’t show us how each state contributes to the total for each candidate.

And the Winner is… ? The Treemap!

Here’s a treemap showing the exact same data.

Figure 8 -- Treemap showing 2012 electoral vote results

Of all the single visualizations I think this treemap tells the most complete story.  We can see just how much states like California, Texas, Florida, and New York contribute to the total as well as gauge —  to some degree  — just how many more electoral votes Obama received than did Romney.

One shortcoming, however, is that we can’t see the names of all the states as some of the rectangles are too small.

One way to address this is by adding a tooltip, as shown here.

Figure 9 -- Hovering over a mark allows me to see the name of the state and number of electoral votes.

While this works, a problem we should address is that the small states are not easily searchable.  That is, if I want to know the results for Alaska, Hawaii, Delaware, etc., I have to go hunting for them.

At this point we’ve gotten about as far as we can get with a single chart.  To tell the complete story – and to make it easy for people to find results for a particular state – we should create a dashboard.

The Electoral Vote Dashboard

Here’s a dashboard that puts two of the views together and that allows the user to find a particular state’s rectangle by selecting the state from a list.

Figure 10 -- Electoral votes dashboard.  Selecting a state from the list will display that state’s rectangle in the treemap.

While the “star” of the dashboard is the treemap, the summary bar chart and the selectable list make the story complete and we get a solid understanding of the 2012 Electoral College results.

And we achieved this without using an actual map.

Click here to interact with dashboard.

Aug 11, 2015
 

Overview

So, here’s why until recently I’ve recommended that my clients avoid large dashboards.

We’ve been working on a collection of killer dashboards and we’re all set to make a big presentation to the CEO. This thing is so high profile we get to use the executive conference room with the super bright projector and the 120-inch screen.

Our dashboards are all 1,325 x 1,000 pixels, but they’re going to look fantastic on that giant screen.

We’re incredibly well prepared.

At least we think we’re incredibly well prepared because when we arrive an hour early we discover the top resolution of that ever-so-fancy projector is 1,280 x 800 and our ever-so-well-crafted dashboards won’t fit on the screen.

It doesn’t fit! Tableau Desktop and Reader will not scale the dashboard intelligently and we end up with the dreaded scroll bars.

Yikes, we have scroll bars! What are we going to do?

And don’t suggest using Tableau’s “Automatic” dashboard setting as it will just squish the different visualizations and won’t scale the fonts.

Let your browser scale the dashboard

While Tableau Desktop and Reader cannot scale your dashboard, Tableau Public, Tableau Online, and Tableau Server — with the help of your browser — can scale the dashboard, and scale it intelligently.

For example, using Tableau Public with the  “Zoom” feature in Google Chrome…

Using Google Chrome's "Zoom" Setting

… allows us to “fit” the dashboard on our large, but relatively low-resolution, screen.

It fits!  Thank you, browser.
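If you'd like to know roughly how far to zoom out before you walk into the room, the arithmetic is simple. Here's a small Python sketch using the dashboard and projector sizes from the story above. It assumes the whole screen is available to the dashboard. In practice the browser's toolbars and Tableau's own toolbar take up some room, so treat the result as an upper bound.

```python
def fit_scale(dash_w, dash_h, screen_w, screen_h):
    """Largest zoom factor at which the whole dashboard fits on the screen.
    Capped at 1.0 because there is no need to zoom in."""
    return min(screen_w / dash_w, screen_h / dash_h, 1.0)

# 1,325 x 1,000 dashboard on a 1,280 x 800 projector.
scale = fit_scale(1325, 1000, 1280, 800)
print(f"Zoom to about {scale:.0%} or less")
```

Height is the limiting dimension here (800 / 1,000), which is why the dashboard needs to shrink quite a bit even though it is only slightly too wide.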

Conclusion

If you are presenting your work using Tableau Desktop or Tableau Reader then you either have to compose for the lowest-common-denominator screen or live with scroll bars.

If, however, Tableau Public, Tableau Online, or Tableau Server is an option, you should be able to use your browser’s zoom feature to make sure your dashboards fit on the screen.

Jun 04, 2015
 

Showing Differences between Periods and Statistical Significance in Tableau

Overview

Addressing this scenario has been the most popular request I’ve received over the past year. Here’s a summary of what my clients and students have asked:

  • How do I show the change in Sales, Percentage of Promoters, Number of Visits, etc., between this month / quarter / year, and the previous month / quarter / year?
  • How do I make it easy to see which areas of the organization had an increase this period and which had a decrease?
  • How do I make it easy to see how much greater / less this period’s numbers are than the previous period?
  • How do I determine and show if this change is statistically significant? That is, how do I apply the stat test we like to use in our organization?
  • If the change is statistically significant, is it a one-time thing or should I start hyperventilating?

This is a LOT to take on and we won’t be able to fit all of it into a single visualization.

But we can fit it into a compact dashboard.

Important Ground Rules

In the example that follows I look at the percentage of people that responded with a “9” or “10” to a survey question. That is, I am only looking at the percentage of people that selected one of the top two boxes.  I am NOT trying to see if there is statistical significance or calculate the margin of error in the change in Net Promoter Score over time.

The concepts I explore are not just for survey data; I just happen to have some good longitudinal survey data that is well-suited for seeing how to build a stat test formula in Tableau.

I hope you will indulge me and accept that “the company stat guru” has a fine reason for applying a particular statistical test to the data we’ll be analyzing. That said, you should push back on “business-as-usual” assumptions to determine if what you are visualizing and testing really is important (this is the focus of the work Stacey Barr is doing with her Measure Up blog and is the foundation for Stephen Few’s most recent book, Signal).

So, with the assumption that the particular stat test we want to apply – or any stat test, for that matter – is warranted, how do you show it and how do you build it?

Let’s first explore the working dashboard then see how to build it with Tableau.

Note: A very heartfelt thanks to Kelly Martin, Joe Mako, Vicki Reinhard, Susan Ferarri, and Tiffany Spaulding who helped vet the dashboard.  I went through many different approaches before settling on the one shown below.

A very special thanks to Jeffrey Shaffer who reviewed the blog post and asked some very good questions, and also to Helen Lindsay who provided sample data.

The data and what we want to show

The data below contains the first few rows of Net Promoter Score survey data with fields for date and role.

Figure 1 -- Net Promoter Score survey data with dates and roles

For the dashboard I built I only focused on the percentage of people that were Promoters; that is, people who responded with a 9 or 10 when asked if they would recommend a product or service.

I decided to look at the data broken down by quarters as this particular data set didn’t lend itself to month over month comparison.  Note that the techniques we’ll see will work for any time period.

Here’s the top portion of the interactive dashboard.

Figure 2 — Top portion of dashboard.  Notice that you can change the selected period, the confidence percentage, and filter by company.

Understanding the chart

Figure 3 -- The key features of the chart

Let’s review what we can glean from the chart.  We can see

  • The percentage of promoters for a particular period and sort them by role, using a bar chart.
  • Which roles have a percentage of promoters that is greater than the previous period and which have less, using color to distinguish (blue for greater, brown for less).
  • Just how much more or less the percentage for this period is compared to the previous using a reference line (the bar is the current period; the vertical line is the previous period).
  • Which roles showed a statistically significant increase or decrease (the red dot).

Note that the chart uses “Cotgreavian” tooltips that allow you to glean more detail for a particular role when you hover over a bar:

Figure 4 -- Hover over a bar for in-depth information about the role for the current period and the previous period

So, we can see from the red dot that something is up with Lawyers, Doctors and Nurses; that is, the percent increase from the previous period for Doctors and Lawyers is statistically significant and the percent decrease for Nurses is also statistically significant.  Is this a one-time thing or a trend?

Looking at changes over time

Clicking a role or roles will display trends for that role / roles.  For example, if we select Nurse in the top chart a second chart showing percentage of promoters over time will appear, as shown here.

Figure 5 -- Percentage of nurses that are promoters, over time.

The big takeaway for me is that up until the first quarter of 2013 there were very few responses, and after that there was both a consistent number of responses and a consistent decline in the percentage of nurses that were promoters.

Should you be hyperventilating because of the four-month downward trend?  That discussion is beyond this blog post but I again encourage you to check out the work Stacey Barr is doing at her Measure Up blog as well as Stephen Few’s most recent book, Signal.

How the This Quarter vs. That Quarter Chart is Built

Let’s dig into how to build this in Tableau, starting with the top viz in the dashboard.

Figure 6 -- What's under the hood.

  1. Promoters – Current Quarter. This is the measure that drives the bars.  It’s also driving what appears on the labels.
  2. Promoters – Previous Quarter. This measure is on the Level of Detail and drives the reference lines.
  3. Greater / Less. This is a discrete measure that determines the color of the bar.

Promoters – Current Quarter

What we want is the percentage of people that were promoters for the selected quarter, the “selected” quarter being determined by a parameter that the user can control.

Specifically, we want to add up everybody that responded with a 9 or 10 for the selected quarter and divide by the total number of people that responded in that quarter.  Here’s the calculation that handles this.

// Promoters (9 or 10) in the selected quarter
SUM(
    IF [Value] >= 9 AND DATETRUNC('quarter', [Select Period]) = DATETRUNC('quarter', [Date])
    THEN 1 ELSE 0
    END
)

/

// All respondents in the selected quarter
SUM(
    IF DATETRUNC('quarter', [Select Period]) = DATETRUNC('quarter', [Date])
    THEN 1 ELSE 0
    END
)

The translation into English is

Take the sum of

If the value from a respondent is greater than or equal to 9 and the date value, truncated to the nearest quarter from the parameter drop down [Select Period] is the same as the date value, truncated to the nearest quarter for [Date], then 1, else 0.

Divide this by the sum of

If the date value, truncated to the nearest quarter for the selected period is the same as the date value, truncated for the nearest quarter for [Date], then 1, else 0.
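If it helps to see the same logic outside Tableau, here's a minimal Python sketch of it. The function names and the sample responses are mine and purely illustrative; `quarter_start` is only a rough stand-in for what DATETRUNC('quarter', ...) does.

```python
from datetime import date

def quarter_start(d):
    """Rough equivalent of DATETRUNC('quarter', d): first day of d's quarter."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)

def promoter_share(responses, selected_period):
    """responses: iterable of (response_date, value) pairs.
    Returns the share of 9s and 10s among responses in the selected quarter."""
    target = quarter_start(selected_period)
    in_quarter = [v for d, v in responses if quarter_start(d) == target]
    if not in_quarter:
        return None  # nobody responded in that quarter
    return sum(1 for v in in_quarter if v >= 9) / len(in_quarter)

# Made-up responses, for illustration only.
sample = [
    (date(2013, 1, 15), 10),
    (date(2013, 2, 3), 7),
    (date(2013, 3, 30), 9),
    (date(2013, 4, 2), 4),   # falls in Q2, so it is ignored below
]
print(promoter_share(sample, date(2013, 2, 1)))
```

Any date inside the quarter works as the selected period, because both sides of the comparison are truncated to the start of the quarter first.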

Not sure about the [DATETRUNC] function vs. the [DATEPART] function?  Have a look at Joshua Milligan’s excellent post explaining date values vs. date parts.

Promoters – Previous Quarter

This calculation is very similar to the calculation for the Current Quarter, except we want to find results for the quarter that occurred just prior to the selected quarter.  Here’s the calculation.

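// Promoters (9 or 10) in the quarter before the selected quarter
SUM(
    IF [Value] >= 9 AND DATETRUNC('quarter', [Select Period]) = DATETRUNC('quarter', DATEADD('quarter', 1, [Date]))
    THEN 1 ELSE 0
    END
)

/

// All respondents in the quarter before the selected quarter
SUM(
    IF DATETRUNC('quarter', [Select Period]) = DATETRUNC('quarter', DATEADD('quarter', 1, [Date]))
    THEN 1 ELSE 0
    END
)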

The formula is the same except we use the DATEADD function to add an additional quarter; that is, we’re saying that we only want to find results where, when we add an additional quarter, we get a value equal to the current quarter; i.e., the previous quarter, plus one quarter, gives us the current quarter.
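The "add one quarter and see if you land on the selected quarter" trick can be sketched in Python as well. This is an illustration under my own assumptions, not what Tableau does internally: I number quarters consecutively so that adjacent quarters differ by exactly one, and the sample responses are made up.

```python
from datetime import date

def quarter_index(d):
    """Number quarters consecutively so adjacent quarters differ by 1,
    including across a year boundary."""
    return d.year * 4 + (d.month - 1) // 3

def previous_quarter_share(responses, selected_period):
    """Share of 9s and 10s among responses from the quarter just before
    the quarter that contains selected_period."""
    target = quarter_index(selected_period)
    # A response counts when its quarter, plus one, lands on the selected quarter.
    previous = [v for d, v in responses if quarter_index(d) + 1 == target]
    if not previous:
        return None  # nobody responded in the previous quarter
    return sum(1 for v in previous if v >= 9) / len(previous)

# Made-up responses, for illustration only.
sample = [
    (date(2012, 11, 20), 9),
    (date(2012, 12, 5), 6),
    (date(2013, 1, 15), 10),  # selected quarter itself, so it is ignored
]
print(previous_quarter_share(sample, date(2013, 2, 1)))
```

The sample deliberately straddles a year boundary: Q4 2012 plus one quarter is Q1 2013, which is the case a naive "quarter number minus one" comparison tends to get wrong.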

Greater / Less

The color of the bars is determined by this discrete measure:

IF [Promoters — Current Quarter] > [Promoters — Previous Quarter] THEN "Greater than previous"
ELSE "Less than previous"
END

Yes, I suppose we should have a contingency for when the percentage of promoters for the current period is the same as the previous period; I leave it as an exercise for the reader to add this functionality.

So, we’ve explained everything except … The Red Dot.

The Red Dot – Computing Statistical Significance on the Fly

Most of my clients and students are surprised to find out that you can fashion a test for statistical significance inside Tableau and it can test for statistical significance “on the fly”; e.g., you can apply filters and Tableau will recalculate based on the filter settings.

The first step is determining just how the client wants to test for statistical significance. This usually entails sending an inquiry to “the stats person” who responds with something that looks like this:

Figure 7 -- Z-test formula for statistical significance

I hope your eyes aren’t glazing over as this really isn’t very complicated; it just might look complicated if you’re not used to seeing stat formulas with square root symbols.  Here are the critical things you need to know:

p1            Percentage of promoters for the current period

p2            Percentage of promoters for the previous period

n1            Number of respondents for the current period

n2            Number of respondents for the previous period

If the absolute value of z1 is greater than or equal to 1.96, then the difference between the two periods is statistically significant at the 95% confidence level.
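For readers who can't see the image, here is the test written out. I've reconstructed it from the calculated fields that are built later in this post (a numerator of p1 minus p2, and a denominator made of one term per period), so the notation may differ slightly from the figure.

```latex
z = \frac{p_1 - p_2}
         {\sqrt{\dfrac{p_1\,(1 - p_1)}{n_1} + \dfrac{p_2\,(1 - p_2)}{n_2}}}
```

The two terms under the square root are the pieces that show up below as the "Denom – Current" and "Denom – Previous" fields.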

So, how do we build this formula?

Slowly, and in easy-to-digest pieces.

The Dot Itself

Figure 8 -- The discrete measure Z-Test Significance Dot is responsible for displaying the dot

The calculation that produces the dot is called Z-Test Significance Dot and it is defined as follows.

IF ABS([Promoters — Z-Score Quarter]) >= [Confidence] THEN "•"
ELSE ""
END

This translates as

If the absolute value of [Promoters – Z-Score Quarter] is greater than or equal to the confidence parameter (currently set to 1.96, or 95%) then display a dot; otherwise, display a null string.
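Where does a number like 1.96 come from? It is the two-sided critical value of the standard normal distribution for 95% confidence. Here's a short Python sketch that produces the value for any confidence level. I'm assuming the [Confidence] parameter simply stores numbers like these and displays them as percentages; the function name is mine.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value of the standard normal distribution
    for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> {critical_z(level):.3f}")
```

For 95% this comes out at roughly 1.96, matching the parameter setting described above. A higher confidence level gives a larger threshold, so fewer bars earn a dot.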

And just how is [Promoters – Z-Score Quarter] defined?  Let’s explore the next layer of the onion.

Promoters – Z-Score Quarter

This is defined as follows:

[Promoters — Z-Score Quarter Numerator] /

SQRT(

([Promoters — Z-Score Quarter Denom – Current] +
[Promoters — Z-Score Quarter Denom – Previous])
)

Here’s how it maps to the stat formula we saw earlier:

Figure 9 -- Mapping the components of the formula to different calculated fields

So now we just need to understand the three different pieces that go into the stat function.

Promoters – Z-Score Quarter Numerator

This is very simple and refers to calculations we’ve already used.

[Promoters — Current Quarter] –
[Promoters — Previous Quarter]

Promoters — Z-Score Quarter Denom – Current

This is fairly straightforward given what we’ve already explored.

([Promoters — Current Quarter]*(1-[Promoters — Current Quarter]))
/SUM([Promoters — Current Quarter Count])

Where [Promoters – Current Quarter Count] is defined as follows.

IF DATETRUNC(‘quarter’, [Select Period])==DATETRUNC(‘quarter’,[Date])
THEN 1 END

So SUM([Promoters – Current Quarter Count]) is just adding up all the people that responded during the selected quarter.

Promoters — Z-Score Quarter Denom – Previous

([Promoters — Previous Quarter]*(1-[Promoters — Previous Quarter]))/
SUM([Promoters — Previous Quarter Count])

This uses the same logic as [Promoters – Z-Score Quarter Denom – Current] but instead aggregates results from the previous quarter.

Putting it all together

In addition to building the components in a piecemeal fashion I will often build a crosstab of all these components to see if they are working as I would expect.  Consider the crosstab shown here.

Figure 10 -- Crosstab showing all the pieces that contribute to the red dot

The crosstab allows us to examine all the intermediate calculations to see how they contribute to the determining calculation in the last column.
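You can run the same kind of check outside Tableau. Here's a minimal Python sketch that returns every intermediate piece, one per crosstab column, so you can compare them against what Tableau shows. The input numbers are made up, and the dictionary keys are my own shorthand for the calculated fields.

```python
from math import sqrt

def z_test_components(p_current, n_current, p_previous, n_previous, confidence_z=1.96):
    """Return the intermediate pieces of the z-test plus the final verdict."""
    numerator = p_current - p_previous
    denom_current = p_current * (1 - p_current) / n_current
    denom_previous = p_previous * (1 - p_previous) / n_previous
    # Note: if both percentages are exactly 0 or 1 the denominator is zero
    # and this division fails; this sketch does not handle that case.
    z = numerator / sqrt(denom_current + denom_previous)
    return {
        "numerator": numerator,
        "denom_current": denom_current,
        "denom_previous": denom_previous,
        "z": z,
        "dot": abs(z) >= confidence_z,
    }

# Hypothetical quarter: 50% promoters out of 100, up from 30% out of 100.
result = z_test_components(0.50, 100, 0.30, 100)
for name, value in result.items():
    print(name, value)
```

If one of the intermediate values disagrees with the crosstab, you know which calculated field to look at first.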

What about the secondary chart?

So we’ve now seen how to build the top chart that shows current and previous quarters broken down by role.  How does the secondary chart – the chart that appears when you click a role or roles in the first chart – work?

Figure 11 -- Percentage of promoters for Nurses over time

Here we have a dual axis chart so that we can have both a line (gray) and a circle (colored based on whether the change from the previous period is statistically significant).

In this case we have to construct all of the pieces using table calculations, but the process of putting together the different components is identical to what we saw earlier.  For example, the calculation that determines the color of the circle, [LONG_Z-Test Significance], is defined as follows.

IF ABS([LONG_Z-Score]) >= [Confidence] THEN "Significant"
ELSE "Not significant"
END

And [LONG_Z-Score] is defined this way:

[LONG_Z-ScoreNumerator] /

SQRT(

([LONG_Z-Score Denom Current] +
[LONG_Z-Score Denom Previous])

)

I also built a crosstab to see how all the pieces fit together, as shown below.

Figure 12 -- Crosstab to help put together a z-test calculation for values shown over time
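This post doesn't list the table calculations behind the LONG_ fields, so here is only a rough Python sketch of the idea: walk through the periods in order and compare each one with the period just before it. The function name and the quarterly counts are made up for illustration.

```python
from math import sqrt

def z_scores_over_time(periods):
    """periods: list of (promoters, respondents) in chronological order.
    Returns one z-score per period; None where no comparison is possible."""
    scores = [None]  # the first period has nothing to compare against
    for (prev_prom, prev_n), (cur_prom, cur_n) in zip(periods, periods[1:]):
        p1, p2 = cur_prom / cur_n, prev_prom / prev_n
        spread = sqrt(p1 * (1 - p1) / cur_n + p2 * (1 - p2) / prev_n)
        scores.append((p1 - p2) / spread if spread else None)
    return scores

# Hypothetical quarterly counts: (promoters, respondents).
quarters = [(60, 100), (58, 100), (40, 100)]
scores = z_scores_over_time(quarters)
for z in scores:
    label = "n/a" if z is None else ("Significant" if abs(z) >= 1.96 else "Not significant")
    print(z, label)
```

With these made-up counts the small dip in the second quarter is not significant but the larger drop in the third quarter is, which is the distinction the colored circles are meant to surface.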

Conclusion

The dashboard in this blog post shows the percentage of promoters, sorted by role, for a particular quarter, compared with the percentage of promoters for the previous quarter.  Roles where the percentage difference is statistically significant are marked with a red dot. You can drill down on a particular role (or roles) and see how scores have changed over time.

While the critical visual component was showing bars and reference lines, most of the “heavy lifting” went into determining if a change was statistically significant.  The key here was to not be intimidated by a statistical formula and to build the calculations in small pieces, using crosstabs to check the work.

 

May 11, 2015
 

Much thanks to Susan Ferrari for exposing me to the concept of Net Promoter Score, Susan Baier for encouraging me to blog about it, and Helen Lindsey for providing anonymized NPS data.

Overview

My wife and I recently went out to a restaurant to celebrate our anniversary.  Accompanying the check was a survey card with three questions, one of which looked like this.

Figure 1 -- The classic Net Promoter Score question

We both agreed that the restaurant was very good, if not excellent, and that we would indeed recommend it to friends.  My wife suggested we circle the “8”.

I told her that if we were enthusiastic about recommending the restaurant we should give it a “9” as a 7 or 8 would be tabulated as a “neutral” or “passive” response.

She looked at me quizzically and asked why an “8” would be considered neutral.

I then explained how the Net Promoter Score works.

Understanding the Score

Respondents are presented with the question “Using a scale from 0 to 10, would you recommend this product / service to a friend or colleague?”

  • Anyone that responds with a 0 through 6 is considered a Detractor.
  • Anyone that responds with a 7 or 8 is considered a Passive (or Neutral).
  • Anyone that responds with a 9 or 10 is considered a Promoter.

The Net Promoter Score (NPS) is computed by taking the percentage of people that are Promoters, subtracting the percentage of people that are Detractors, and multiplying that number by 100.

Figure 2 — How to compute NPS, courtesy B2B International.
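Here's the same arithmetic as a minimal Python sketch. The function name and the ten sample responses are made up for illustration.

```python
def net_promoter_score(responses):
    """responses: iterable of integers from 0 to 10.
    Promoters are 9-10, Detractors are 0-6, Passives (7-8) only
    affect the score by adding to the total number of respondents."""
    responses = list(responses)
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100 * (promoters - detractors) / len(responses)

# Made-up responses: 5 promoters, 3 passives, 2 detractors.
sample = [10, 9, 9, 10, 9, 8, 7, 8, 3, 6]
print(net_promoter_score(sample))
```

Notice that the 6 in the sample counts against the score exactly as much as the 3 does, and the 7s and 8s don't add anything to it.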

If you are like me (and my wife) you’re probably thinking that a “6” is a pretty good score and that it shouldn’t be bunched among the detractors.

I’m not going to get into a debate about NPS methodology and its usefulness, but I do want to show you some good ways to visualize NPS data.

The Problem with the Traditional Presentation

Consider this snippet of NPS survey data with responses about different companies from people in different roles.

Figure 3 -- Raw NPS data about different companies from people with different occupations.

If we just focus on the NPS and not the components that comprise the NPS we can produce an easy-to-sort bar chart like the one shown here.

Figure 4 -- Traditional way to show NPS

Yes, it’s easy to see that Company D has a much higher NPS than Company H, but by not showing the individual components – and in particular the Neutrals / Passives – we’re missing an important part of the story as the Neutrals / Passives are right on the cusp of becoming promoters.

For example, a Net Promoter Score of 40 can come from

  • 70% Promoters and 30% Detractors
  • 45% Promoters, 50% Passives, 5% Detractors

Same score, big difference in makeup.

An Alternative Approach to Displaying NPS Results

Consider the dashboard below which presents the data as a divergent stacked bar chart.

Figure 5 -- NPS dashboard with toggle to show percentages and score.

The chart is easy to sort and you can also see that Company B and Company F have a relatively large group of Neutrals.

That said, being able to see the NPS score is very useful so the dashboard (see working version at the end of this post) has a toggle that switches between percentages and the score, as shown below.

Figure 6 -- Divergent stacked bar chart with NPS overlay.

Note that the NPS divergent stacked bar chart is just a variation on a Likert scale divergent stacked bar chart.  You can find an explanation of how to build this type of visualization here.

What’s Next?

We now have what I think is a more insightful way to visualize Net Promoter Score data.

But clients and readers of my blog have asked me to address some of these questions as well:

  • How do you show the difference in NPS, or just the difference in percentage of promoters, between this quarter and the previous quarter?
  • If there is a difference, is the difference statistically significant?
  • What’s a good way to visualize and analyze NPS over time?

I will be addressing these issues in an upcoming post.  Stay tuned.

Mar 30, 2015
 

Overview

Tableau 9.0 includes a built-in data prepping tool that makes reshaping survey data so it plays nicely with Tableau a much smoother experience than using the Tableau Excel add-in.  While this new feature won’t replace my trusty copy of Alteryx (for reasons that I explain later in this post) there are many occasions where Tableau’s new pivot feature will be more than adequate.

In this post I will walk through using the new pivot feature along with “temporary” blending to create a solid framework for using survey data with Tableau.

Special thanks to Susan Baier for bringing this to my attention and Jonathan Drummey for showing me Tableau’s Create Primary Group feature.

So, what do we have here?

Note: if you want to follow along you can download the Excel file here.

Consider an Excel workbook that contains two sheets.  The first sheet has the survey results, a snippet of which is shown here.

Figure 1 -- Some raw survey data

Notice the format: one row for each survey respondent and a separate column for each question in the survey where each question is identified with a Question ID (e.g.,  Q0, Q1, Q2, Q134a, etc.).

Column A contains a unique ID for each survey taker, Columns B through D contain demographic information, and Column E contains a weight for each survey respondent.

The second sheet maps each Question ID to a human-readable version of the question and groups related questions into logical buckets.

Figure 2 -- Helper file that maps each Question ID to the wording of the question from the survey

Note that when I first blogged about survey stuff I didn’t use a helper file, but now I won’t take on a project without creating one as I don’t want to spend time aliasing hundreds of question IDs.  The Grouping column also makes it much easier to select related questions and visualize them together.

The data wants to be tall and thin

Anyone who has read up on the subject knows that life with survey data and Tableau is a lot easier when the data is reshaped, so let’s see how to do this with Tableau 9.0.

  1. In Tableau, connect to the data source and the sheet that contains the data you want to reshape and visualize. This is what it looks like on my screen.

    Figure 3 -- Survey data prior to pivoting (reshaping)

  2. Select the fields you want to merge / pivot / reshape, in this case everything except the Resp ID, demographic fields, and Weight field.
  3. Click in any of the highlighted fields and select Pivot. Tableau will combine the 20+ fields into two fields, as shown here.

    Figure 4 -- Data after it has been pivoted

  4. Rename the first field Question ID and the second field Value.

    Figure 5 – Pivoted fields renamed

  5. Indicate whether you want an extract (a good idea when Excel is the data source) and go to the Tableau worksheet.
  6. Drag Question ID onto the rows shelf. Your screen should look like this.

    Figure 6 -- Reshaped data in Tableau.  Instead of 20 measures for each question we have only one measure.
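If you're curious what the pivot is doing to the data, here's a minimal Python sketch of the same reshaping. It is only an illustration of the wide-to-tall idea, not how Tableau implements it, and the two sample rows are made up (real data would also carry the demographic fields).

```python
def pivot_tall(rows, id_fields):
    """Turn one wide row per respondent into one tall row per
    respondent and question."""
    tall = []
    for row in rows:
        keep = {k: row[k] for k in id_fields}  # fields repeated on every tall row
        for column, value in row.items():
            if column not in id_fields:
                tall.append({**keep, "Question ID": column, "Value": value})
    return tall

# Made-up rows, for illustration only.
wide = [
    {"Resp ID": 1, "Weight": 1.0, "Q1": 4, "Q2": 5},
    {"Resp ID": 2, "Weight": 0.8, "Q1": 2, "Q2": 3},
]
tall_rows = pivot_tall(wide, id_fields=("Resp ID", "Weight"))
for r in tall_rows:
    print(r)
```

Two respondents with two questions each become four rows, with the Resp ID and Weight repeated on each. That repetition is what lets a single Value measure serve every question.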

Creating the temporary blend

Now we need to connect and relate the Helper file to our pivoted survey data.  We will do this with a blend, but then use a very slick feature of data blending that will allow us to ditch the secondary data source. Here are the steps.

  1. Click the Add New Data source tool.
  2. Connect to the Helper File sheet from the same Excel workbook and indicate whether or not you want to create an Extract (of course you do!)

    Figure 7 — The secondary data source

  3. Return to the Tableau worksheet.
  4. Drag the Grouping field to the left of Question ID on the rows shelf, and Wording to the right, as shown below.  Note that you don’t *have* to do this but it’s always useful to see if the hierarchy is working correctly.

    Figure 8 — Blended Data

  5. Right-click the Grouping pill on the Rows shelf and select Create Primary Group.
  6. Rename the group Grouping as shown below.

    Figure 9 — Leveraging the blend to create an ad-hoc group based on Question ID fields.

  7. Click OK.
  8. Right-click the Wording pill on the Rows shelf and select Create Primary Group.
  9. Rename the group Wording and click OK.
  10. Click the primary data source (the one from which we initially selected Question ID). Notice the two groups that Tableau generated for us.

    Figure 10 — Tableau-generated groups

At this point we no longer need the secondary data source as the primary source now has groups that map and alias the Question IDs.  Very slick.
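One way to picture what the generated groups are: a snapshot lookup from Question ID to a label, taken at the moment you create the group. The Python sketch below is only an analogy under that assumption, not Tableau's actual mechanism, and the helper rows and question wordings are hypothetical.

```python
def build_group(helper_rows, field):
    """Snapshot a Question ID -> label lookup from the helper rows."""
    return {row["Question ID"]: row[field] for row in helper_rows}


# Hypothetical helper file contents.
helper = [
    {"Question ID": "Q1", "Grouping": "Likert Set 1", "Wording": "I like my job"},
    {"Question ID": "Q2", "Grouping": "Likert Set 1", "Wording": "My commute is easy"},
]

grouping = build_group(helper, "Grouping")
wording = build_group(helper, "Wording")

# A question added to the helper later is missing from the snapshot
# until the lookup is rebuilt.
helper.append(
    {"Question ID": "Q3", "Grouping": "Things you Measure", "Wording": "Revenue"}
)
print("Q3" in grouping)                         # False
print("Q3" in build_group(helper, "Grouping"))  # True
```

Once the lookups exist, the helper rows are no longer consulted, which mirrors why the secondary data source can be dropped.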

Seeing this in action

Now that we have the groups it’s easy for us to do some very quick analysis.  For example, let’s suppose we want to see the average Likert scale score for the collection of Likert scale questions.

  1. Create a new worksheet.
  2. Drag Grouping into the Filters shelf and select the collection of questions you want to view, in this case Likert Set 1.

    Figure 11 -- The Grouping group makes it easy to indicate which sets of related questions you want to examine.

  3. Drag Wording to Rows.
  4. Right-Drag Value to Columns and select AVG(Value).
  5. Sort in descending order.
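The steps above boil down to filter, group, average, and sort. Here is a rough Python equivalent, assuming the tall data and the two lookups are already in hand. The sample rows and wordings are invented, and the average is unweighted (it ignores the Weight field), just like a plain AVG(Value).

```python
from collections import defaultdict


def average_by_wording(tall_rows, grouping, wording, selected_group):
    """Average Value per question wording for one grouping, highest first."""
    totals = defaultdict(lambda: [0.0, 0])  # wording -> [sum, count]
    for row in tall_rows:
        question = row["Question ID"]
        if grouping.get(question) != selected_group:
            continue  # plays the role of the Grouping filter
        bucket = totals[wording[question]]
        bucket[0] += row["Value"]
        bucket[1] += 1
    averages = [(label, total / count) for label, (total, count) in totals.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


# Hypothetical tall rows and lookups.
tall_rows = [
    {"Resp ID": 1, "Question ID": "Q1", "Value": 4},
    {"Resp ID": 2, "Question ID": "Q1", "Value": 5},
    {"Resp ID": 1, "Question ID": "Q2", "Value": 2},
    {"Resp ID": 2, "Question ID": "Q2", "Value": 3},
    {"Resp ID": 1, "Question ID": "Q3", "Value": 1},
]
grouping = {"Q1": "Likert Set 1", "Q2": "Likert Set 1", "Q3": "Things you Measure"}
wording = {"Q1": "I like my job", "Q2": "My commute is easy", "Q3": "Revenue"}

result = average_by_wording(tall_rows, grouping, wording, "Likert Set 1")
print(result)
```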

Isn’t this great?  We didn’t have to go groping around for the right Question IDs and we didn’t have to alias anything.

So, are there any shortcomings?  Is this blend approach as good as being able to join the pivoted data with the helper file?

Yes, there are shortcomings

There are several things that a join will give us that we can’t get with a blend.

You cannot refer to the group in a calculated field

You can’t refer to a group in a calculated field, so something like this won’t be available:

IF [Grouping] = "Things you Measure" THEN [Value] END

You need to update the group members if you add new questions

Tableau’s generation of the primary group is much like populating the members of a parameter with the members of a field.  Tableau will do it when you click a button, as it were, but it won’t update the list automatically.

If you end up adding new questions to your survey or reorganizing how questions are categorized in your helper file, you will either need to regenerate the primary data source groups or manually edit them.

You cannot combine text results with numeric results

This is one of the major “gotchas” for me, at least for larger surveys.  With most commercial survey systems you can download the data in a label format or a numeric format.  For example, when downloaded as labels survey responses might look like this:

Strongly disagree
Disagree
Neutral
Agree
Strongly agree

When downloaded as numbers the same responses would look like this:

1
2
3
4
5

I find I like to have both label and numeric responses, so I pivot / reshape both sets of data and then join them together using Question ID and Response ID.  Using Alteryx I can perform the join but I cannot do it with Tableau 9 and pivoted data.
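To make the join concrete, here is a small Python sketch of matching the two pivoted data sets on Response ID and Question ID. It is an inner join written by hand, not how Alteryx does it, and the field names and sample rows are assumptions for illustration.

```python
def join_labels_and_numbers(label_rows, numeric_rows):
    """Inner-join two tall data sets on (Resp ID, Question ID)."""
    numeric_lookup = {
        (row["Resp ID"], row["Question ID"]): row["Value"] for row in numeric_rows
    }
    joined = []
    for row in label_rows:
        key = (row["Resp ID"], row["Question ID"])
        if key not in numeric_lookup:
            continue  # no numeric counterpart, so the row is dropped
        joined.append(
            {
                "Resp ID": row["Resp ID"],
                "Question ID": row["Question ID"],
                "Label": row["Value"],
                "Number": numeric_lookup[key],
            }
        )
    return joined


# Hypothetical pivoted downloads of the same responses.
labels = [
    {"Resp ID": 1, "Question ID": "Q1", "Value": "Agree"},
    {"Resp ID": 2, "Question ID": "Q1", "Value": "Strongly disagree"},
]
numbers = [
    {"Resp ID": 1, "Question ID": "Q1", "Value": 4},
    {"Resp ID": 2, "Question ID": "Q1", "Value": 1},
]

combined = join_labels_and_numbers(labels, numbers)
for record in combined:
    print(record)
```

Each output row carries both the label and the number for the same response, which is the shape I want for larger surveys.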

Conclusion

For complex surveys where I need to do a fair amount of data cleanup and need both text and numeric values I’ll continue to use Alteryx.  For shorter surveys where I don’t need to do a lot of prep work and where either labels or numeric values will suffice, Tableau 9.0’s new pivot feature suits me just fine.  It’s a great addition to a great product.

 Posted on March 30, 2015. Categories: 2) Visualizing Survey Data, Blog
Mar 11, 2015
 

Overview

Note: I based my Tableau Conference 2015 presentation on this blog post. You can download a PDF of the presentation and the Tableau packaged workbook.  Click here to see a video of the presentation.

Earlier this year one of my clients was updating a collection of survey data dashboards and they wanted to revisit the way they presented demographic data.  They thought that the collection of bar charts comprising the demographics dashboard was boring and wanted to replace them with something that was a bit more visually arresting.  In particular they wanted to take something that looked like this…

Figure 1 — a “boring” collection of bar charts.

… and replace it with something that looks like this:

Figure 2 — A “flashy” demographics dashboard

When asked why they wanted something “flashier” they indicated a desire to draw the viewer into the dashboard and they thought a dashboard with more than just bar charts would do the trick.

I wondered “why stop there?”  Why not add pictures of kittens and puppies?

Figure 2a — the Too Cute dashboard.

The real issue here is that the underlying data just isn’t interesting and adding sexy visual elements will do nothing to make the data more interesting.  There’s only one way I know to make this kind of data “interesting”.

Make it personal.

Tapestry and Chad Skelton

I recently attended the 2015 Tapestry Conference where Chad Skelton of the Vancouver Sun presented a great session making the case that people are ravenous for data about themselves.

I was particularly taken with an interactive dashboard Chad created that allows Canadians to see how much older / younger they were than other Canadians.

I decided I would look at United States census data and build a similar dashboard.

US Census Data without Personalization

Here’s a histogram showing the relationship between age and US population.

Figure 3 — A histogram showing the relationship between age and US population.

I have to admit this doesn’t do much for me although I do find the long downward slope from around the age of 50 somewhat interesting (but I am a bit of a data geek).

Contrast this general purpose graphic with the personalized dashboard shown below.

Did you try it?  Are you over 38 years old?  If “yes,” were you a bit depressed?

I certainly was.

While I don’t mean to depress anyone I do want to underscore how much more interesting the data is when the data is about YOU.

Make the Demographics Dashboard Interesting – Make it Personal

With the goal of personalization in mind let’s see how we can make the dashboard in Figure 1 more interesting.

Let’s start by gathering some information about the person viewing the dashboard; that is, let’s present some parameters from which the viewer can apply personalized settings:

Figure 4 — Get your user to tell you something about himself / herself.

Now we can take these parameter settings and highlight them in the dashboard.

Figure 5 — A “personalized” demographic dashboard.

We can then go one step further and invite the viewer to select the colored bars to see exactly how many people that took the survey have the same demographic background as the person interacting with the dashboard.

Figure 6 — There are 65 people who fall into the same demographic pool as the person viewing the dashboard.
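The logic behind the personalized view is simple: compare each respondent's demographics with the viewer's parameter settings and count the matches. Below is a minimal Python sketch of that idea. The field names, values, and respondents are invented, and this is an illustration of the comparison, not the calculations used in the actual dashboard.

```python
def matches_viewer(respondent, viewer):
    """True when the respondent shares every demographic the viewer selected."""
    return all(respondent.get(field) == value for field, value in viewer.items())


# Hypothetical survey respondents.
respondents = [
    {"Resp ID": 1, "Gender": "Female", "Generation": "Gen X", "Location": "North America"},
    {"Resp ID": 2, "Gender": "Male", "Generation": "Gen X", "Location": "Europe"},
    {"Resp ID": 3, "Gender": "Female", "Generation": "Gen X", "Location": "Europe"},
    {"Resp ID": 4, "Gender": "Female", "Generation": "Millennial", "Location": "Asia"},
]

# Stand-in for the parameter settings the viewer picks.
viewer = {"Gender": "Female", "Generation": "Gen X"}

same_pool = [r["Resp ID"] for r in respondents if matches_viewer(r, viewer)]
print(len(same_pool), "respondents share the viewer's demographics:", same_pool)
```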

Conclusion

I’ve become a big advocate for adding personalization to dashboards and a number of my clients have started to adopt the approach.  I’ve seen some very good results at Bersin by Deloitte where Bersin is leveraging their proprietary survey data by allowing individual organizations to benchmark their numbers against similar organizations.

Note: A few months ago Joe Mako sent me a link to a Stephen Few blog post.  In researching this topic I revisited the post and see that Chad Skelton was in fact featured in Few’s essay. It seems that Skelton did not just “happen” upon the idea of personalization but was grappling, like so many of us, with ways to entice people to engage with visualizations.

For the record, I think personalized bar charts beat packed bubbles any day of the week.

Jan 20, 2015
 

 “With great power comes great responsibility”

— Voltaire

— Benjamin Parker (Uncle Ben from Spiderman)

Overview

Recently both Ryan Sleeper and Andy Kriebel blogged about donut charts in Tableau.

Figure 1 — Donut chart courtesy of Andy Kriebel

While both of them cautioned about where, when, and how best to use them, I fear many people will ignore the warnings and dig into this sugary, analytically-impoverished chart type and start creating stuff like this.

Figure 2 — Really bad donut chart. In fact, it’s just a pie chart with a hole in the middle.

Yuk.

And what fuels my fear?  Ryan and Andy do great work, and they write great blogs.  They rightfully have a lot of influence in the Tableau community.

But with great power — and influence —  comes great responsibility and I suspect that some people will see Ryan and Andy’s work, ignore their recommendations, and apply the following bit of “logic”:

Ryan Sleeper is a Tableau Iron Viz champion and really cool — and he makes donut charts.

Andy Kriebel is a Tableau Zen master and really cool — and he, too, makes donut charts.

I want to make cool vizzes and be really cool; therefore, I should make donut charts.

[Insert face palm here]

Interviewer: So, what do you have against donut charts?  Don’t you think they look cool?

Me: My problem is that donut charts don’t tell you very much.

Interviewer: Yes, but they look cool!

Me [yelling]: You know what else looks cool?  Pictures from the Hubble telescope.  Vintage electric basses.  Three-dimensional pie charts! Should I festoon my dashboards with these images, just because they look cool?

Interviewer: Fine, explain to me why this chart type doesn’t work, but I’d like to see an alternative that isn’t BOR-ING!

Me:  Okay, allow me to do the following:

  • Explain why donut charts don’t tell you much (or not as much as a bar chart)
  • Present a better alternative
  • Show how to have your cake (not your donut) and eat it, too

Why donut charts don’t tell you much

Consider the chart in Figure 1, above.

I always recommend that people ask the following questions when coming up with a visualization:

  • Do I need different colors?
  • Do I need a legend?
  • Do I need measure labels?

Let’s see what happens when we remove the measure labels:

Figure 3 — donut chart without measure labels.

The chart does pass some of the “can I figure this out” test.  For example, it’s easy for me to see that West is around one quarter of the way to goal and that East is a little more than half way.  Where the chart fails is with comparison among regions.  For example, can you tell how much closer North is to its goal than West?  This comparison is particularly hard to make as it’s very difficult to gauge how much longer one arc is than another arc.

A better alternative

I think a bar chart with a goal line is easier to grok.  It tells me more and takes up less screen real estate, too.

Figure 4 — Bar chart with goal line.

There’s an added advantage in that I can easily see both the progress towards a goal and that the goal is $100,000.

Better yet, suppose the goals were different for each region?  Right now they each have a shared goal of $100,000 but suppose the goal for North is $125,000 and the goal for East is $75,000?  With the donut chart, how will you show the actual goal and the progress towards the goal at the same time?
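To see why per-region goals matter, here is a quick bit of arithmetic in Python. The North and East goals are the what-if figures from the paragraph above. The sales numbers are made up, as is the assumption that West and South keep the $100,000 goal. The point is that each region now needs two numbers, the actual and the goal, which is exactly what a bar plus a goal line shows.

```python
# What-if goals: North and East come from the text, West and South are assumed.
goals = {"North": 125_000, "East": 75_000, "West": 100_000, "South": 100_000}

# Hypothetical sales to date.
sales = {"North": 60_000, "East": 52_000, "West": 25_000, "South": 37_000}

# Fraction of each region's own goal reached so far.
progress = {region: sales[region] / goals[region] for region in goals}

for region, share in sorted(progress.items(), key=lambda item: item[1], reverse=True):
    print(f"{region:<6} ${sales[region]:>7,} of ${goals[region]:>7,} ({share:.0%})")
```

With these made-up numbers North has the highest sales but East is furthest along toward its own goal, a distinction a donut showing only percent-of-goal would hide.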

Why is it easy to compare progress across regions using the bar chart?  I’ve discussed this at length here, but the bottom line is that humans are much better at judging the length of bars than they are judging the area of circles or the lengths of arcs.

But does the chart pass the “no measure labels” test?  Have a look.

Figure 5 — Bar chart without measure labels.

While I prefer having labels, it’s pretty easy for me to see the following:

  • North is more than twice as long as West
  • East is a little more than half way
  • South is more than a third of the way to goal
  • East is about twice as long as West

In other words,  I can draw conclusions more easily from this chart than the donut chart.

Another Example

Consider the chart below that shows the percentage of confirmed judicial nominees that are women, broken down by president.

Figure 6 – Donut Chart showing Female Judicial Nominees (source: Alliance for Justice)

There are some good stories in here but they are buried.  Compare this with a bar chart that contrasts the different presidents and underscores the differences between Republicans and Democrats.

Figure 7 — Bar Chart showing Female Judicial Nominees (source: Alliance for Justice)

I think this is a lot clearer.

But it is, well, boring.

Have your cake and eat it, too

I admit that most of my practice has me building stuff that looks more like it would appear in The Economist than in USA Today, but I do understand that you may need to create something that is eye catching.

And I agree that the donut chart is eye catching, but I hate to sacrifice information for the sake of decoration.

Is there a way to get both?

I think there is.  Let’s work on the first example where we were examining progress towards a goal broken down by region.

Want some sugar?  Try a lollipop chart

Figure 8 — Lollipop chart

Creating a lollipop chart is easy in Tableau. You create a dual axis chart where both measures are identical but you have a different chart type (in this case a bar chart combined with a circle chart).

Figure 9 – Tableau settings for a lollipop chart

Try some fun shapes

We can also take the lollipop chart and dress it up with a custom shape, like the one shown below.

Figure 10 — Combination bar and shape chart

While I prefer the lollipops to the runner, I have no problem with the chart shown above because I don’t have to work hard to see both the distance from the goal and to compare among regions.  That is, I did not fight Tableau’s suggested default chart type but instead took it and dressed it up a bit.

Conclusion

Even if you are tasked with having to create visualizations for mass public consumption I urge you to use caution before creating a donut chart. I understand that you may need something that is more visually arresting than a simple bar chart, but take that as a challenge: find a way to make something that looks cool but does not sacrifice one bit of analytical clarity.

And if you do create a donut chart, please look carefully at what Ryan and Andy did (and did not do) in fashioning theirs.