Nov 262017
 

November 25, 2017

Special thanks to Jeffrey Shaffer, Andy Cotgreave, and Rody Zakovich for feedback that helped improve the dashboard that appears at the end of this post.

Overview

It seems I’m not the only person who has been thinking about stacked bar charts (see posts from Cole Nussbaumer Knaflic, Jonathan Schwabish, and Andy Cotgreave.)

My problem with these charts, and their first cousin, the area chart, is that the many people who design them don’t understand the possible pitfalls and end up creating charts that are attractive but that don’t convey a lot of useful information.

In this blog post we’ll see examples of where stacked bar and area charts work, where they fail, and what you can do to add some functionality to your dashboards so that if you do use stacked bar and area charts they will work better.

The people who market data viz tools love these charts

Some of the chief culprits include the data visualization vendors themselves who sometimes fashion “screaming cat” visualizations like these in their marketing materials and promotions.

01_StackedBars_Cat

Figure 1 — Sample dashboard Tableau uses to showcase its extensions API.

Figure 2 -- Microsoft PowerBI dashboard.

Figure 2 — Microsoft PowerBI dashboard.

Figure 3 -- Area chart from Tableau's home page

Figure 3 — Area chart from Tableau’s home page

I’ll admit the last one looks particularly cool, but do you have any inkling what it’s trying to show you?

Before we get into exactly what’s wrong with the charts (and how to fix them) let’s look at a couple of examples that work very well.

Some good examples

Here’s an example from The Big Book of Dashboards’ Complaints dashboard.

Figure 4 -- A portion of the Complaints Dashboard showing open, closed, and overall complaints (Dashboard by Jeffrey Shaffer).

Figure 4 — A portion of the Complaints Dashboard showing open, closed, and overall complaints (Dashboard by Jeffrey Shaffer).

With this chart it’s very easy for me to see the total number of complaints (overall length of blue bars plus red bars) as well as compare the number of open complaints (red bars) because there are only two colors and the items I want to compare are open complaints (red) and total complaints (red plus blue), both of which have a common baseline.

Another example comes from Matt Chambers’ Mayweather vs. McGregor fight analysis dashboard.

Figure 5 -- Stacked bar chart comparing overall punches and punches that landed between Mayweather and McGregor

Figure 5 — Stacked bar chart comparing overall punches and punches that landed between Mayweather and McGregor

You should check out the complete dashboard, but this stacked bar chart gets to the heart of why Mayweather won the fight: McGregor exerted more effort in launching 430 punches vs. Mayweather’s 320, but far fewer of McGregor’s punches landed (111 to 170).

As you consider why this chart is so effective notice that we only care about two things — the punches that landed and the total number of punches.

So, why do I like these two examples, but cite the earlier dashboards as “screaming cats”? It has to do with how many segments there are, and which segment is along the baseline.

Let’s explore a bit.

Understanding the strengths and weaknesses of stacked bar charts

Consider the chart shown below.

Figure 6 -- Typical stacked bar chart. We can make accurate comparisons of overall and of the first category (Central), but nothing else.

Figure 6 — Typical stacked bar chart. We can make accurate comparisons of overall and of the first category (Central), but nothing else.

I can see that Phones has more sales overall (1), that Chairs is the biggest seller in the Central region (2) and the Bookcases is the lowest seller in the Central region (3).  If that’s all that is important then we may be all done here (although it is hard to see that Bookcases is in fact less wide than Machines… more on that in a moment.)

But suppose I want to know what were the three lowest selling categories in the Central region, or if I wanted to easily compare sales in the East or West? In these cases this visualization isn’t much help and *that’s* the biggest problem with stacked bar and area charts: You can only accurately compare overall values and the one region that hugs the baseline.

Adding functionality — sorting and focus

Let’s address the “what were the three lowest sellers in the Central region?” question first. One way to do this would be to have a widget on your dashboard that allows you to sort by both total sales and by sales for a particular region. Here’s what the sort would look like for the Central region.

Figure 7 -- Bars sorted by Central region. Now it's easy to see which where the top and bottom sellers in that region.

Figure 7 — Bars sorted by Central region. Now it’s easy to see which where the top and bottom sellers in that region.

Ah, now we can easily answer the question “what were the bottom three sellers in the Central region?” They are Accessories, Machines, and Bookcases.

This is great if all you care about is the Central region, but suppose you wanted co compare sales in the South?  With the way the chart is configured above this is very difficult, but if you add a “widget” that allows your audience to select a region to focus on, the chart can easily answer the question.

Figure 8 -- Adding some functionality to the visualization so the audience can move a selected region to the baseline and sort by that region.

Figure 8 — Adding some functionality to the visualization so the audience can move a selected region to the baseline and sort by that region.

The “Focus on” parameter allows the user to select which region gets placed along the baseline and the “Sort Bar Cart by” parameter allows the user to sort either by the Selected region or by overall.

But, if what we’re interested in is showing how one region compares with itself and overall, why bother to have the other regions as different colored bars?  That is, why not make the two things we care about — overall and the region in question — stand out more?

Highlighting the selected region

My fellow author Jeff Shaffer suggested I add this functionality to the visualization and I think it’s a terrific addition. Let’s see how much easier it is to focus on the two main questions (overall and the Selected region) when we mute the colors that aren’t stacked along the baseline.

Here’s the results when we sort by the selected region.

Figure 9 -- Stacked bar chart with muted colors sorted by Selected region.

Figure 9 — Stacked bar chart with muted colors sorted by Selected region.

And here are the results when we sort by overall sales.

Figure 10 -- Stacked bar chart with muted colors sorted by overall sales.

Figure 10 — Stacked bar chart with muted colors sorted by overall sales.

What about Area charts?

You’ll need to address the same issues with area charts as you can only make accurate comparisons for totals and for segments that hug the baseline, as shown below.

Figure 11 -- Area chart showing sales over time. Note that we can compare overall sales and sales in the West as there is common baseline.

Figure 11 — Area chart showing sales over time. Note that we can compare overall sales and sales in the West as there is common baseline.

Note that because we are not including the product sub-categories the sorting feature is not needed.

100% stacked bar charts

Whereas the regular stacked bar chart allows you to make accurate comparisons of overall sales and one region at a time, a 100% stacked bar chart will allow you to accurately compare the two outer regions, as we can see below.

Figure 12 -- 100% stacked bar chart. We can compare the outer regions (Central and West) because there is a common baseline

Figure 12 — 100% stacked bar chart. We can compare the outer regions (Central and West) because there is a common baseline

Buy we can’t accurately compare the inner regions:

Figure 13 -- 100% stacked bar chart. We cannot compare the inner regions (East and South) because the elements are floating; there isn’t a common baseline.

Figure 13 — 100% stacked bar chart. We cannot compare the inner regions (East and South) because the elements are floating; there isn’t a common baseline.

Give the dashboard a try

The dashboard below allows you to explore the functionality discussed in this post. Please note that I’m not suggesting you should include all the widgets in the dashboard. Indeed, maybe this is something you use on our own to help curate interesting findings in your data that you then highlight in a presentation or using Storypoints.

As for how to build all this functionality into Tableau, if you download and the workbook and look under the hood you’ll see there’s nothing terribly complicated going on (indeed, there isn’t one LOD calc). That said, my solution is not very robust — it’s hard-coded to only show the four known regions that are currently in the data set. I’m sure with a bit more effort one could fashion something extensible but for this blog post I wanted to prototype the functionality, not craft a robust solution.

Parting thoughts: Do make sure to check out this post where Rody Zakovich applies a different approach to looking at overall and segmented sales for individual customers.

 

Oct 252017
 

Overview

Tableau’s automatic axis feature can often present problems with survey data.

In this blog post we’ll look at how the problem crops up and what you can do to both fix the problem and make your visualizations more insightful.

The problem with auto-adjusting bars

Tableau’s automatic axis does a great job extending and contracting the length of bars so that a bar chart fills the view. In most circumstances this is a good thing but there are cases where it can inhibit your audience’s ability to correctly interpret your data.

Consider the chart below where we see the percentage of people that selected that top 2 boxes for a Likert scale question (in this case we see the percentage of people who selected Agree or Strongly Agree).  Pay attention to the length of the top bar.

Figure 1 -- Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes).  Notice the length of the top bar.

Figure 1 — Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes).  Notice the length of the top bar.

Now let’s see what happens when we change the view to show the percentage of people that only selected Strongly Agree (Top Box).  Again, pay attention to the length of the top bar.

Figure 2 -- Percentage of respondents that selected Strongly Agree (Top Box).  Again, notice the length of the top bar.

Figure 2 — Percentage of respondents that selected Strongly Agree (Top Box).  Again, notice the length of the top bar.

There’s a HUGE difference here, but unless you look at the numbers (and the labels, as the sorting has changed) you’re going to miss it.

Make the differences pop by comparing with 100%

Here’s the same data but this time we’re showing the gap between each bar and 100%.

First, let’s see what the Top 2 Box view looks like.

Figure 3 -- Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes) compared with 100%.

Figure 3 — Percentage of respondents that selected Agree or Strongly Agree (Top 2 Boxes) compared with 100%.

Now let’s see what happens when we change this to show only the Top Box.

Figure 4 -- Percentage of respondents that selected Strongly Agree (Top Box) compared with 100%.

Figure 4 — Percentage of respondents that selected Strongly Agree (Top Box) compared with 100%.

As chef Emeril Lagasse might say “BAM!”

I’ll walk through how to create the 100% bars in a moment but I first want to show another instance where the auto-sizing bars can present a problem.

Showing demographics: comparing two similar charts

A similar issues comes into play with demographics dashboards.  Consider the very simple dashboard below that shows the percentage of respondents broken down by location and gender.

Figure 5 -- Respondent demographics dashboard with auto-sizing bars.

Figure 5 — Respondent demographics dashboard with auto-sizing bars.

Do you see the problem?  No?  Check out what happens when you hover over the North America bar and the Male bar.

Figure 6 -- Two independent charts with slightly different levels of magnitude.

Figure 6 — Two independent charts with slightly different levels of magnitude.

The top bar in both cases is the same length, but in the top chart is represents 341 respondents (40%) and in the bottom chart it represents 436 people (52%).

The solution is to normalize the chart so that both have a common axis, as shown below.

Figure 7 -- Demographics dashboard with common (normalized) axes.

Figure 7 — Demographics dashboard with common (normalized) axes.

We will consider a couple of ways to produce this normalized axis later but first let’s tackle the 100% reference bars.

Building the reference bars visualization

Let’s start with the view shown below which shows the percentage of folks that selected the Top N Boxes.  Note that the completed dashboards may be found embedded at the end of this blog post.

Figure 8 — Initial view

  1. Create a calculated field called 100 Percent and define it as follows.
    09_100
    Yes, you will need to define at as “1.0” so that Tableau treats it as a float and not an integer. This will come up later when we need to create a dual axis and synchronize the charts.
  2. Right-drag the newly-created field to Columns and select MAX(100 Percent) from the Drop Field dialog Box. Also make sure the mark is set to Bar and not automatic.
    10_RightDrag
  3. Right-click MAX(100 Percent) on columns and select Dual Axis. Your screen should look like the one shown below.
    11_DualEearly
  4. Right-click the axis at the top and select Move marks to back, then right-click the axis again and select Synchronize Axis. Your screen should look like this.
    12_AlmostThere
  5. Right-click the top axis and deselect Show Header, then modify the colors so that the 100% bars is a muted gray.

Normalizing the axis in the demographics dashboard

I’m going to recommend a different approach for the demographics dashboard as we really don’t need to show 100%; it’s very unlikely that any one segment will ever reach 100%.  Instead I propose adding a hidden reference line placed at the maximum value that any demographic element is likely to reach, say 60%

Consider the Location demographic visualization below.

13_Location

There are several ways to hard-code a reference line at 60%. Here’s one way to go.

  1. Right-click the axis along the bottom and select Add Reference Line.
  2. When the Add Reference Line, Band, or Box dialog box appears, select Entire Table and indicate you want a constant set for .6. Make sure to set Label to None and Line to None as shown below.
    14_RefLine
  3. Repeat this for all the other worksheets that comprise your demographics dashboard.

“Wait!”, you may exclaim. “Why not just fix the axis at 60%?”

That will work… most of the time.  But what happens if you fix the axis at 60% and you in fact have a bar that exceeds that value?  If you fix the axis you are out of luck, but a hidden reference line will not constrain the axis; that is, it will make sure the axis extends to at least 60%, but the axis will go beyond that if needed.

Conclusion

Tableau’s tendency to auto-size an axis is great but can sometimes foster confusion. In this post we explored two different techniques that will help your audience make more accurate comparisons.

Sep 202017
 

September 20, 2017

Overview

As anyone who has read anything on my blog that relates to survey data knows, the number one impediment to success with Tableau is getting your data “just so.”

Until recently I recommended two different approaches. You can either use Alteryx and have a rock solid, robust, fully-featured, (and expensive) solution or you could have a somewhat rickety, cumbersome, free solution using cross-database joins in Tableau.

Thanks to the advice of fellow Zen Master Rob Radburn, I now know about a third alternative, EasyMorph. Based on my brief time working with the product, EasyMorph appears to be solid, robust, full-featured, and costs somewhere between free and not-at-all expensive.

In this blog, I’ll show you how to take the same data set I use in all my classes and get it setup so that it works perfectly with Tableau.

Working with EasyMorph

Using the same data set I discuss in the general “here’s how your data needs to be setup” blog post, let’s see how we’re going to get the data-as-numbers, the data-as-labels, and the helper file to all play nicely in EasyMorph.

Importing and Reshaping the Label Responses

Here’s the EasyMorph Start page. Let’s begin by importing the Label responses from Excel.

Figure 1 -- EasyMorph Start page. Yes, we're importing Excel files but we can also connect to databases.

Figure 1 — EasyMorph Start page. Yes, we’re importing Excel files but we can also connect to databases.

Here I indicate that I want to open the Excel workbook and then tell EasyMorph which tab in the workbook I want to use.

Figure 2 -- Specifying which sheet in the Excel workbook I want to use, in this case Data Labels.

Figure 2 — Specifying which sheet in the Excel workbook I want to use, in this case Data Labels.

Clicking the Apply button will apply this “transformation”, where a transformation is just an action I want to have happen in my workflow. EasyMorph will then display the data in its main workspace window, as shown below.

Figure 3 -- The Excel Data Labels tab imported into EasyMorph.

Figure 3 — The Excel Data Labels tab imported into EasyMorph.

We now need to specify which columns we want to leave intact and which we want to reshape.  We can do this by adding an Unpivot transformation.

I’ll start by clicking Add New Transformation

Figure 4 -- Adding a new transformation

Figure 4 — Adding a new transformation

… then I’ll click the Advanced menu and select Unpivot from the list, as shown here.

Figure 5 -- Not the most obvious place to find this feature, but this is how you reshape the data (Tableau calls this a pivot; EasyMorph calls it an unpivot.  Tomato Tomahto...)

Figure 5 — Not the most obvious place to find this feature, but this is how you reshape the data (Tableau calls this a pivot; EasyMorph calls it an unpivot.  Tomato Tomahto…)

We next need to indicate what we want to call the reshaped fields and which fields we want to leave intact.  Here we indicate that we want to leave Gender Generation, and Location intact, but we also need to include RespID and Weight (we need to scroll down to see them.)

Figure 6 -- Specify which fields to keep and which to reshape. The two other fields are at the bottom of the list.

Figure 6 — Specify which fields to keep and which to reshape. Note that the two other fields we need to select are at the bottom of the list.

Next, we need only click the Apply button and we’ve got our demographics and Data Labels set up, “just so.”

Figure 7 -- Our reshaped data.  We've applied two transformations.  Notice at the bottom of the screen we see that we went from 45 columns and 845 rows to 7 columns and 33,000 rows.

Figure 7 — Our reshaped data.  We’ve applied two transformations.  Notice at the bottom of the screen we see that we went from 45 columns and 845 rows to 7 columns and 33,000 rows.

Importing and Reshaping the Numeric Responses

We’re now ready to import the Numeric version of the data. We start by clicking Import/create table, indicate we want to import from a file, and select the same Excel file.

Figure 8 -- Inserting a second set of data.

Figure 8 — Inserting a second set of data.

We’ll see the same options as before, but this time we’ll specify that we want to import the Data Numbers tab, as shown below.

Figure 9 -- Importing the numeric version of the data.

Figure 9 — Importing the numeric version of the data.

Clicking Apply will import the data.

Before reshaping the numeric version of the data, we can remove redundant columns, in this case the “demographic” variables (Gender, Location, Generation, and Weight) as we already have them in the Label version of the data.

We can do this by selecting the columns and selecting Remove 4 columns from the context menu, as shown here.  Note that this context menu approach is just a different way to create and apply a transformation.

Figure 10 -- We don't need to have these columns as they are already present in the other data set.

Figure 10 — We don’t need to have these columns as they are already present in the other data set.

We’ll now Unpivot the data, leaving RespID and renaming the columns, as shown below.

Figure 11 -- Reshaping the numeric version of the data.

Figure 11 — Reshaping the numeric version of the data.

Joining the Two Reshaped Data Sources

We’re now ready to join the two reshaped data sources. We’ll need to make sure that every Quesiton ID for each RespID lines up so we will join the tables on both of these fields.

Start by clicking the first table (Imported Table 1), then select Add new transformation, then select Merge another table from the list of transformations, as shown below.

Figure 12 -- This is how you do a Join in EasyMorph.

Figure 12 — This is how you do a Join in EasyMorph.

By default, EasyMorph suggests matching the data based on RespID and Question ID. Note that I only need to bring in the Values field from the second table, as shown here.

Figure 13 -- Selecting which fields to join and which columns to combine from the second data source.

Figure 13 — Selecting which fields to join and which columns to combine from the second data source.

Clicking Apply will append the Values column from the second data source to the first data source.

Adding the Helper data to the mix

We’re now ready to import and join the “Helper” data. This will map each Question ID to its human readable form and groups related questions together.

Start by clicking Import/create table, then indicate that you want to import from a file, then select the same Excel file.

We’ll see the same options as before, but this time we’ll specify that we want to import the Question Helper sheet, as shown below.

Figure 14 -- Importing the Helper data.

Figure 14 — Importing the Helper data.

We’re now ready to merge the Helper data with the combined numeric and label data.

We start first by selecting the main table (Imported Table 1) and selecting Add new transformation.

We then select Merge another table and indicate we want to add columns from Imported table 3, as shown below.

Figure 15 -- Selecting which secondary table to merge.

Figure 15 — Selecting which secondary table to merge.

We now need to specify which fields to use for the join and which columns to combine. Note that the field names are different so EasyMorph didn’t automatically join on Question ID.

Figure 16 -- Specifying the Join field and which columns to combine. Here we join “Question ID” with “QuestionID.”  Hey, who says I don’t show “real world” examples?

Figure 16 — Specifying the Join field and which columns to combine. Here we join “Question ID” with “QuestionID.”  Hey, who says I don’t show “real world” examples?

Clicking Apply will combine all the relevant data into the main table, as shown below.

Figure 17 -- All the data, combined correctly.

Figure 17 — All the data, combined correctly.

Notice the icons along the top of the window indicating that we’ve performed four transformations on this table. Selecting an icon will show the settings you specified for the selected transformation.

Not convinced the data is “just so”? We can sort on RespID and you’ll see all the survey responses for a particular respondent (Just right-click the RespID column and select Sort.)

Exporting the Data to a Tableau Data Extract file (.TDE file)

We’re now ready to export the data so that we can visualize it using Tableau.

We’ll start by making sure the main table is selected and then click Add new transformation.

We then select Export and select Export to Tableau from the list of options, as shown below.

Figure 18 -- Exporting the data to a .TDE file.

Figure 18 — Exporting the data to a .TDE file.

Clicking Export to Tableau will bring up the Export options, as shown below.

Figure 19 -- Exporting to a .TDE file. Note that you can write the file directly to Tableau Server.

Figure 19 — Exporting to a .TDE file. Note that you can write the file directly to Tableau Server.

Before clicking Apply, indicate that RespID and Labels should be Text fields and not Number fields.

Okay, we’re just about finished; we just need to run the project by clicking the Run project button at the top of the screen.

Want to load the exported .TDE into Tableau? Just click the last transformation icon at the top of the screen, then click the Open file icon, as shown below.

Figure 20 -- You can load the exported data into Tableau by clicking the Open file icon.

Figure 20 — You can load the exported data into Tableau by clicking the Open file icon.

Below we see what complete data flow looks like. Note that I added a transformation to filter out null values (It’s the second to last icon in the top table). Further note that you can rename each of the tables as you see fit.

Figure 21 -- The complete data flow.

Figure 21 — The complete data flow.

But wait, there’s more!

Assuming your data is coded correctly you’ll probably want to remove null values and labels. No problem, you can just Add a transformation and indicate you want to remove <empty> values.

And what if your data isn’t coded correctly (e.g., check all apply questions are coded using 1s and blanks instead of 1s and 0s)?  EasyMorph can handle that, too.

Conclusion

Easy morph was very easy to learn and you can’t beat the price. If you have 30 or fewer transformations in a project you can use the free download (our sample only has ten transformations). Need more than 30 transformations as well as more sophisticated features (e.g., automation and scripting) the price will range from a one-time license of $195 to $55 per month.

EasyMorph also appears to handle large survey data sources without a problem. I was able to convert a monster NPS survey with 3,500 columns and 2,200 rows into a trim ten columns and about 7M rows in just a matter of seconds.

So far, the only downside I’ve seen is that it won’t read SPSS files directly; you’ll need to export to two CSV files, one for numeric-coded data and one for label-coded data.

Other than that minor issue the tool has been terrific and I plan to incorporate it into my classes on visualizing survey data using Tableau.

If you need geospatial analysis and predictive analysis in addition to ETL then EasyMorph won’t meet your needs (this is where Alteryx shines). But if you just need solid ETL capabilities, this is a great tool, at a great price.

 Posted by on September 20, 2017 2) Visualizing Survey Data, Blog Tagged with: , ,  7 Responses »
Aug 232017
 

Why you may be missing important insights if you only look at Percent Top 2 Boxes.

August 23, 2017

Overview

Anyone who has looked to this blog for insights into visualizing survey data knows that my “go to” visualization for Likert scale sentiment data is a divergent stacked bar chart (Figure 1).

Figure 1 -- Divergent stacked bar chart for a collection of 5-point Likert scale questions

Figure 1 — Divergent stacked bar chart for a collection of 5-point Likert scale questions

You might prefer grouping all the positives and negatives together, showing only a three-point scale.  Or perhaps you question having the neutrals “straddle the fence” as it were, with half being positive and half being negative.  These are fair points that I’m happy to debate at another time as right now I want to focus on what happens when we need to compare survey results between two periods.

Showing responses for more than one period

As much as I love the divergent stacked bar chart, it can become a little difficult to parse when you show more than one period for more than one question. Consider the chart below where we compare results for 2017 vs. 2016 (Figure 2).

Figure 2 -- Showing responses for two different periods

Figure 2 — Showing responses for two different periods

As comfortable as I am with seeing how sentiment skews positive or negative with a divergent stacked bar chart, I’m at a loss to compare the results across two different years. The only thing that really stands out is that there appears to be a pretty big difference between 2017 vs. 2016 for “Really important issue 7” at the bottom of the chart.

The allure of Percent Top 2 Boxes

It’s times like these when focusing on the percentage of respondents that selected Strongly agree or Generally agree (Percent Top 2 Boxes) is very tempting.  Consider the connected dot plot in Figure 3.

Figure 3 -- Connected dot plot showing difference between Percent Top 2 Boxes in 2017 and 2016.

Figure 3 — Connected dot plot showing difference between Percent Top 2 Boxes in 2017 and 2016.

Hey, that’s clear and easy to read. Indeed, this is one of my recommended approaches for comparing Importance vs. Satisfaction and it works great for comparing results across two time periods.

So, we’re all done, right?

Not so fast. While this approach will work in many cases, you should never stop exploring as there may be something important that remains hidden when you only show Percent Top 2 Boxes.

It’s not the economy, it’s the neutrals (stupid)

I was recently working with a client who had surveyed a large group about several contentious topics. The client believed that, much like the population of the United States, the surveyed population had become more polarized over the past year, at least with respect to these survey topics.

In reviewing the results for three questions, if we just focus on the positives (Percentage Top 2 Boxes) things look like they have improved (Figure 4.)

Figure 4 -- Connected dot plot showing change in positives (Percentage Top 2 Boxes) between 2017 and 2016.

Figure 4 — Connected dot plot showing change in positives (Percentage Top 2 Boxes) between 2017 and 2016.

See? We have more positives (green) now than we did a year ago (gray.)

This may be true, but it only tells part of the story.

Consider the divergent stacked bar chart shown in Figure 5.

Figure 5 -- 5-Point Divergent stacked bar char comparing results from 2017 and 2016

Figure 5 — 5-Point Divergent stacked bar char comparing results from 2017 and 2016

Woah… there’s something very interesting going on here, but it’s very hard to see.  Maybe if we combine all the positives and negatives the “ah-ha” will be easier to decipher (Figure 6).

Figure 6 -- 3-Point Divergent stacked bar char comparing results from 2017 and 2016. There are big differences between the two time periods, but they are hard to see.

Figure 6 — 3-Point Divergent stacked bar char comparing results from 2017 and 2016. There are big differences between the two time periods, but they are hard to see.

Well, that’s a little better, but the story — and it’s a really big story — is still hidden. Let’s see what happens if we abandon both the connected dot plot and divergent stacked bar chart and instead try a slopegraph (actually, a distributed slopegraph, Figure 7).

Figure 7 -- Distributed slopegraph showing change in positives, neutrals, and negatives.

Figure 7 — Distributed slopegraph showing change in positives, neutrals, and negatives.

Now we can see it!  Just look at the gray lines showing the dramatic change in neutrals.  My client’s hunch was correct — the population has become much more polarized as the percentage of neutrals have plummeted while the percentage of people expressing both positive and negative sentiment has increased. You cannot see this at all with the connected dot plot and it’s hard to glean from the divergent stacked bar chart.

There is no, one best chart for every situation

I had the good fortune to attend one of Cole Nussbaumer Knaflic’s Storytelling with Data workshops. She uses a wonderful metaphor in describing how much work it can take to present just one, really good finding. I paraphrase:

“You have to shuck a lot of oysters to find a single pearl. In your presentations, don’t show all the shells you shucked; just show the pearl.”

For this last example, if I only had 30 seconds of the chief stakeholder’s time I would just show the distributed slopegraph as it is the “pearl.”  It clearly and concisely imparts the biggest finding for the data set: the population has become considerably more polarized for all three issues.

But…

What happens if the chief stakeholder wants to know more? I would be armed with an interactive dashboard to answer questions like these:

“The people that disagree… how many of them strongly disagree?”

“The people that agree… how many of them strongly agree?”

“Are these findings consistent across the entire organization, or only in some areas?”

Conclusion

So, when showing changes in sentiment over time, which chart is best? The connected dot plot? The divergent stacked bar chart? The distributed slopegraph?

To quote my fellow author of the Big Book of Dashboards, Andy Cotgreave, “it depends.”

You should be prepared to apply all three approaches and choose the one that imparts the greatest understanding with the least amount of effort.

Note

I’ve had a number of debates with people about how I prefer to handle neutrals (half on the negative side and half on the positive side). If you find that troubling you can place the neutrals to one side, as shown in Figure 8.

Figure 8 -- Neutrals placed to one side providing a common baseline for comparison.

Figure 8 — Neutrals placed to one side providing a common baseline for comparison.

Aug 022017
 

August 3, 2017

In my last blog post I pointed out that I wish I had put BANs (big-ass numbers) in the Churn dashboard featured in chapter 24 of the book (see http://www.datarevelations.com/iterate.html.)

I had a similar experience this week when I revisited the Net Promoter Score dashboard from Chapter 17.  I’ve been reading Don Norman’s book The Design of Everyday Things and have been thinking about how to apply many of its principles to dashboard design.

On thing you can do to help users decode your work is to ditch the legend and add a color key to your dashboard title.

Here’s the Net Promoter Score dashboard as we present it in the book.  Notice the color legend towards the bottom right corner.

Figure 1 -- Net Promoter Score dashboard from The Big Book of Dashboards.

Figure 1 — Net Promoter Score dashboard from The Big Book of Dashboards.

Why did I place the legend out of the natural “flow” of how people would look at the dashboard? Why not just make the color coding part of the dashboard title, as shown below?

Figure 2 -- Making the color legend part of the title. 

Figure 2 — Making the color legend part of the title.

I’m not losing sleep over this as this is probably a dashboard that people will be looking at on a regular basis; that is, once they know what “blue” means they won’t  need to look at the legend.

But…

Every user will have his / her “first time” with a dashboard, so I recommend that wherever possible make the legend part of the “flow.” For example, instead of the legend being an appendage, off to the side of the dashboard…

Figure 3 -- Color legend as an appendage.

Figure 3 — Color legend as an appendage.

Consider making the color legend part of the title, as shown here.

Figure 4 -- Color coding integrated into the title.

Figure 4 — Color coding integrated into the title.

 

Jul 052017
 

The Importance of feedback, iteration, and constant improvement in data visualization (and finding people that will tell you when you are full of crap.)

July 5, 2017

Overview

People ask me how three opinionated people can write a book like The Big Book of Dashboards together.  Didn’t we disagree on things?  How were we able to work out our differences?

I can’t speak for Jeff Shaffer and Andy Cotgreave, but I’m very glad I found two fellow authors that would challenge every assertion I had, as it made for a much better book.

And why did it work?

It worked because we had one, overarching goal in common.

Clarity.

When people ask me about the process I think of a band breaking up because of “artistic differences.” That didn’t happen with the three of us because we weren’t trying to create art.  For certain, we wanted dashboards that were beautiful, but more than anything else we wanted dashboards that allow the largest number of people to get the greatest degree of understanding with a minimum amount of effort.

Let me take you through a case study on how the Churn dashboard came into fruition and how following the approach we used can help you make better dashboards.

Background

I had just finished presenting the third day of three days’ worth of intensive Tableau training when an attendee showed me a data set like the one below.

Figure 1 -- Subscribers gained and lost over time within different divisions

Figure 1 — Subscribers gained and lost over time within different divisions

I asked the attendee what she thought she needed to be able to show and she said it was important to know when and where things were really good (i.e., many more people signing up than cancelling) and where and when things were really bad (i.e., more people cancelling than signing up).

She also stressed that management would insist on seeing the actual numbers and not just charts.

This is not a horse, It’s a dashboard

Here’s a famous quote attributed to car designer Alec Issigonis:

“a camel is a horse designed by a committee.”

The main idea is that you will run into problems if you attempt to incorporate many people’s opinions into a single project.

This was not the case with the Churn dashboard as we received more input from more people over a longer period than any other dashboard in the book — and it resulted in a much better product than if I had just gone at it alone.

Let’s look at the evolution of the dashboard.

Churn, take one

Here’s an image of one of my first attempts to show what was happening for Division A.

Figure 2 -- Early attempt at showing churn.

Figure 2 — Early attempt at showing churn.

Starting with the left side of the top chart, we see a starting point for the month (0 for January, 30 for February, 20 for March, etc.) the number of people subscribing (the gray bars going up) and the number of people cancelling (the red bars going down).  It’s easy to see that I had more people subscribing than cancelling in January, and more people cancelling than subscribing in February.

The second chart shows the running sum over time.

Churn, takes two through fifty

Here’s a collage of some additional endeavors, many of which I didn’t even bother to share with others.

Figure 3 — A collage of multiple attempts to show churn.

Figure 3 — A collage of multiple attempts to show churn.

Most of my attempts were fashioned around some type of GANTT / Waterfall chart but one chart that showed promise for a small group of reviewers was a juxtaposed area chart, dubbed the “mountain” chart by one client who was kind enough to give me feedback.

Figure 4 -- The "mountain" chart.  Beware of dual axes charts.

Figure 4 — The “mountain” chart.  Beware of dual axes charts.

While some people “got” this most had a problem with the negative numbers (the cancellations depicted as red mountains) being displayed as a positive.  The idea was to allow people to see in which months the negatives exceeded the positives and you can in fact see this easily (February, May, and especially July).  But most people were simply confused, even after receiving an explanation of how the chart worked.

In addition, superimposing a second chart (in this case the running total line) almost always invites confusion as people must figure out how the axes work (e.g., “do the numbers on the left axis apply to the area chart or to the line?)

Getting closer, but Andy doesn’t buy it (and I’m glad he didn’t)

I thought I had a winner in the chart shown below.

Figure 5 -- Overly complicated waterfall chart.

Figure 5 — Overly complicated waterfall chart.

I showed this to Andy and he just didn’t buy it.  It was then that I realized that I had lost my “fresh eyes” and what was clear to me was not clear to somebody else, even somebody as adept at deciphering charts as Andy. Andy explained that he was having trouble with the spacing between charts and the running totals. It was just too hard for him to parse.

I took the feedback to heart and realized that the biggest problem was that there should be no spacing between the gray bar and the red bar for a particular month, but to get that clustering and spacing I would need to work around Tableau’s tendency not to cluster bar charts.

Fortunately, Jonathan Drummey had written a blog post on how to cajole Tableau into clustering the bars within each month together and I was able to fashion this view, which made it into the final dashboard.

Figure 6 -- Gains, losses, and running total, all in one reasonably easy-to-understand chart.

Figure 6 — Gains, losses, and running total, all in one reasonably easy-to-understand chart.

Note: I don’t expect people unfamiliar with this chart type to be able to read it without some instruction. As with a bullet chart and other chart types that people may never have seen before, when you publish a novel chart you will have to either sit down with the audience member or prepare a short video explaining how the pieces fit together.

Showing the details, but Jeff doesn’t buy it (and I’m glad he didn’t)

You may recall that one of the requirements is that people using the dashboard would need to see the numbers driving the chart. I suggested adding the text table shown below.

Figure 7 -- Details, details, details

Figure 7 — Details, details, details

When I showed this to Jeff there was a long pause, and then I recall him saying something along the lines that he didn’t think this added much to the analysis.  By this time I had worked with Jeff for well over a year and I knew that “I don’t think this adds much” was Jeff’s way of politely telling me that he hated that component of the dashboard.

I started to argue with him that there was a stated demand by the audience to show the actual numbers driving the charts when I realized that Jeff was in fact correct — just showing the numbers didn’t add much and there was a better way to meet the requirement and provide additional insight.

Use a highlight table.

Figure 8 -- D’oh! How did I miss this?  I’m usually the one yelling at people for just having a text table when they can instead have a value-added text table. Just look at July in the East!

Figure 8 — D’oh! How did I miss this?  I’m usually the one yelling at people for just having a text table when they can instead have a value-added text table. Just look at East in July!

I wish I had put in BANs!

I got a great deal from reviewing the dashboards other people submitted to the book and now wish I could go back in time and borrow some techniques from those dashboards and apply them to my own. Indeed, there isn’t one dashboard that I built for the book that I wouldn’t like to modify and that is certainly the case with the Churn dashboard.  Here’s the version that is in the book.

Figure 9 -- Churn dashboard, as shown in The Big Book of Dashboard

Figure 9 — Churn dashboard, as shown in The Big Book of Dashboards

Here’s the dashboard I would submit now.

Figure 10 -- Churn dashboard with BANs (Big-Ass Numbers)

Figure 10 — Churn dashboard with BANs (Big-Ass Numbers)

See the difference? There are BANs (Big-Ass Numbers) along the top and, as I’ve written previously, these elements can do a lot to help people understand key components of a dashboard: they can be conversation starters (and finishers), provide context to adjacent charts, and serve as a universal color legend.

Conclusion and Resources

If I could only make one recommendation on how to make better dashboards it would be to find people who will give you good, constructive feedback on whether what you’ve built is as clear as you think it is. Gird yourself for a lot of revisions and be prepared to add refinements, but it will be more than worth it.

Want to know more about the Churn dashboard? That chapter from The Big Book of Dashboards is available online at http://bit.ly/dashboardsbook

Do you work with Tableau and want to download the Churn packaged workbook?  You can download it from http://bigbookofdashboards.com/dashboards.html.

Want to purchase The Big Book of Dashboards? You can get it here.

Postscript: I asked Jeff and Andy to review this post before it went live.  Jeff had some ideas on how I might modify the BANs to make them clearer.  It never ends.

 Posted by on July 5, 2017 Blog 5 Responses »
May 102017
 

May 10, 2017

Overview

Most organizations want to wildly exceed customer expectations for all facets of all their products and services, but if your organization is like most, you’re not going to be able to do this. Therefore, how should you allocate money and resources?

First, make sure you are not putting time and attention into things that aren’t important to your customers and make sure you satisfy customers with the things that are important.

One way to do this is to create a survey that contains two parallel sets of questions that ask customers to indicate the importance of certain features / services with how satisfied they are with those products and services.  A snippet of what this might look like to a survey taker is shown in Figure 1.

Figure 1 -- How the importance vs. satisfaction questions might appear to the person taking the survey.

Figure 1 — How the importance vs. satisfaction questions might appear to the person taking the survey.

How to Visualize the Results

I’ve come up with a half dozen ways to show the results and will share three approaches in this blog post.  All three approaches use the concept of “Top 2 Boxes” where we compare the percentage of people who indicated Important or Very Important (the top two possible choices out of five for importance) and Satisfied or Very Satisfied (again, the top two choices for Satisfaction).

Bar-In-Bar Chart

Figure 2 shows a bar-in-bar chart, sorted by the items that are most important.

Figure 2 -- Bar-in-bar chart

Figure 2 — Bar-in-bar chart

This works fine, as would having a bar and a vertical reference line.

It’s easy to see that we are disappointing our customers in everything except the least important category and that the gap between importance and satisfaction is particular pronounced in Ability to Customer UI (we’re not doing so well in Response Time, 24-7 Support, and East of Use, either.)

Scatterplot with 45-degree line

Figure 3 shows a scatterplot that compares the percent top 2 boxes for Importance plotted against the percent top 2 boxes for Satisfaction where each mark is a different attribute in our study.

Figure 3 -- Scatterplot with 45-degree reference line

Figure 3 — Scatterplot with 45-degree reference line

The goal is to be as close to the 45-degree line as possible in that you want to match satisfaction with importance. That is, you don’t want to underserve customers (have marks below the line) but you probably don’t want to overserve, either, as marks above the line suggest you may be putting to many resources into things that are not that important to your customers.

As with the previous example it’s easy to see the one place where we are exceeding expectations and the three places where we’re quite a bit behind.

Dot Plot with Line

Of the half dozen or so approaches the one I like most is the connected dot plot, shown in Figure 4.

Figure 4 -- Connected dot plot. This is the viz I like the most.

Figure 4 — Connected dot plot. This is the viz I like the most.

(I placed “I like most” in italics because all the visualizations I’ve shown “work” and one of them might resonate more with your audience than this one.  Just because I like it doesn’t mean it will be the best for your organization so get feedback before deploying.)

In the connected dot plot the dots show the top 2 boxes for importance compared to the top 2 boxes for satisfaction.  The line between them underscores the gap.

I like this viz because it is sortable and easy to see where the gaps are most pronounced.

But what about a Divergent Stacked Bar Chart?

Yes, this is my “go to” viz for Likert-scale things and I do in fact incorporate such a view in the drill-down dashboard found at the end of this blog post. I did in fact experiment with the view but found that while it worked for comparing one feature at a time it was difficult to understand when comparing all 10 features (See Figure 5.)

Figure 5 -- Divergent stacked bar overload (too much of a good thing).

Figure 5 — Divergent stacked bar overload (too much of a good thing).

How to Build This — Make Sure the Data is Set Up Correctly

As with everything survey related, it’s critical that the data be set up properly. In this case for each Question ID we have something that maps that ID to a human readable question / feature and groups related questions together, as shown in Figure 6.

Figure 6 -- Mapping the question IDs to human readable form and grouping related questions

Figure 6 — Mapping the question IDs to human readable form and grouping related questions

Having the data set up “just so” allows us to quickly build a useful, albeit hard to parse, comparison of Importance vs. Satisfaction, as shown in Figure 7.

Figure 7 -- Quick and dirty comparison of importance vs. satisfaction.

Figure 7 — Quick and dirty comparison of importance vs. satisfaction.

Here we are just showing the questions that pertain to Importance and Satisfaction (1). Note that measure [Percentage Top 2 Boxes] that is on Columns (2) is defined as follows.

Figure 8 -- Calculated field for determining the percentage of people that selected the top 2 boxes.

Figure 8 — Calculated field for determining the percentage of people that selected the top 2 boxes.

Why >=3?  It turns out that the Likert scale for this data went from 0 to 4, so here we just want to add up everyone who selected a 3 or a 4.

Not Quite Ready to Rock and Roll

This calculated field will work for many of the visualizations we might want to create, but it won’t work for the scatterplot and it will give us some headaches when we attempt to add some discrete measures to the header that surrounds our chart (the % Diff text that appears to the left of the dot plot in Figure 4.) So, instead of having a single calculation I created two separate calculations to compute % top 2 boxes Importance and % top 2 boxes Satisfaction. The calculation for Importance is shown in Figure 9.

Figure 9 -- Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Figure 9 — Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Notice that we have all the rows associated with both the Importance questions and Satisfaction “in play”, as it were, but we’re only tabulating results for the Importance questions so we’re dividing by half of the total number of records.

We’ll need to create a similar calculated field for the Satisfaction questions.

Ready to Rock and Roll

Understanding the Dot Plot

Figure 10 shows what drives the Dot Plot (we’ll add the connecting line in a moment.)

Figure 10 -- Dissecting the Dot Plot.

Figure 10 — Dissecting the Dot Plot.

Here we see that we have a Shape chart (1) that will display two different Measure Values (2) and that Measure Names (3) is controlling Shape and Color.

Creating the Connecting Line Chart

Figure 11 shows how the Line chart that connects the shapes are built.

Figure 11 -- Dissecting the Line chart

Figure 11 — Dissecting the Line chart.

Notice that Measure Values is on Rows a second time (1) but the second instance the mark type is a Line (2) and that the end points are connected using the Measure Names on the Path (3).  Also notice that there is no longer anything controlling the Color as we want a line that is only one color.

Combining the Two Charts

The only thing we need to do now is combine the two charts into one by making a dual axis chart, to synchronize the secondary axis, and hide the secondary header (Figure 12.)

Figure 12 -- the Completed connected Dot Plot.

Figure 12 — the Completed connected Dot Plot.

What to Look for in the Dashboard

Any chart that answers a question usually fosters more questions. Consider the really big gap in Ability to Customize UI. Did all respondents indicate this, or only some?

And if one group was considerably more pronounced than others, what were the actual responses across the board (vs. just looking at the percent top 2 boxes)?

Figure 13 -- Getting the details on how one group responded

Figure 13 — Getting the details on how one group responded

The dashboard embedded below shows how you can answer these questions.

Got another approach that you think works better?  Let me know.

Apr 252017
 

April 25, 2017

Overview

I became a big fan of adding a marginal histogram to scatterplots when I first saw them applied in Tableau visualizations from Shine Pulikathara and Ben Jones.

For those not familiar with how these work, consider the scatterplot shown in Figure 1 that shows the relationship between salary and age.

Figure 1 -- Comparing Age and Salary on a scatterplot.

Figure 1 — Comparing Age and Salary on a scatterplot

Some interesting things here; for example,  we can see that salaries appear to be highest between ages 50 and 55 and lowest among the youngest and older workers.

But look what happens when we add marginal histograms to the x and y axes (Figure 2.)

Figure 2 -- Scatterplot with marginal histogram

Figure 2 — Scatterplot with marginal histogram

Whoa! The two bar charts to the right and below the main chart add a lot of insight into the data.  We don’t just see the correlations, but now we can also see age demographics and salary distribution in the organization.

Marginal Histograms and Jitterplots

The marginal histogram works with other visualizations as well. Consider the dot plot with jitter (jitterplot) example from Lean management tool innovator LeanKit in Figure 3.

Figure 3 -- Individual and aggregate vies of important data from LeanKit

Figure 3 — Individual and aggregate vies of important data from LeanKit

The combination of the individual data points (the jittered dots that represent Kanban cards) and the aggregated data (stacked bar charts) tells a more complete story than having only the aggregation or only the individual dots.

Marginal Histograms and Highlight Tables

Readers of this blog know I like highlight tables and often use them as a “visualization gateway drug” to move people from cross tabs to more insightful ways of looking at their data.

But as great as they are, they do not lend themselves to accurate comparisons of the data. Consider Figure 4 where we see the percentage of sales broken down by region.

Figure 4 -- Sorted highlight table showing percentage of sales by sub-category and region

Figure 4 — Sorted highlight table showing percentage of sales by sub-category and region

Yes, I can see that Phones in the East is a lot darker than Copiers in the West, but without the numbers there’s no way to could do an exact comparison as I don’t know of anyone that can look at just the color coding and exclaim “ah, that cell is twice as blue as that other cell.”

But look what happens when we add the marginal histogram to the visualization, as shown in Figure 5.

Figure 5 -- Sorted highlight table with marginal histograms. Here we see percentage of sales.

Figure 5 — Sorted highlight table with marginal histograms. Here we see percentage of sales.

So much added insight, and so little added screen real estate!

I’ll confess that the histograms don’t work quite as well if you have negative values. Here’s what it looks like if we look at percentage of profit broken down by sub-category and region.

Figure 6 -- Sorted highlight table with marginal histograms. Here we see percentage of profit.

Figure 6 — Sorted highlight table with marginal histograms. Here we see percentage of profit.

Because we have bars pointing in different directions for the histogram on the right the look isn’t quite as clean, but it certainly works.

See for Yourself

I’ve included an embedded dashboard below where you can experiment with different metrics and different sorting choices. Feel free to download and “look under the hood.”

Note that making this type of dashboard is not very difficult; the only tricky part is getting the three elements to align properly. Ben Jones gets into those particulars in his blog post.

 

Apr 052017
 

More thoughts on the Markimekko chart and in particular how to build one in Tableau.

April 4, 2017

Overview

Given my reluctance to embrace odd chart types and my conviction that I would find something better I was surprised to find myself last month writing about — and endorsing — the Marimekko chart.

If I was surprised then I’m absolutely gobsmacked to be writing about it again.

What precipitated all this was another very good example of the chart in the wild. After admiring it I couldn’t help but “look under the hood” (hey, we are talking about Tableau Public and people sharing this stuff freely) and I thought that the dashboard designer was working harder than he needed to build the visualization.

So, if people are going to use these things I thought I would share an alternative, and I think easier, technique for building them.

The Great Example from Neil Richards

Here’s the terrific Makeover Monday dashboard from Neil Richards where we see the likelihood of certain jobs being replaced by automation.

01_Neil

Neil does a great job highlighting some of the more interesting findings, but if you want to know more than what Neil highlights you’ll need to explore the dashboard on your own.

Notice that in both this case and in Emma Whyte’s we are dealing with only two data segments; e.g., male vs. female and at-risk vs. not at-risk jobs. Having only two colors is one of the main reasons why the chart works well.

Okay! Uncle! I agree that under the right conditions this is a useful chart and I can see what you may want to make one.

But is there an easier way to make one?

An Easier Way to Create a Markimekko Chart in Tableau

It turns out the same technique Joe Mako showed me six years ago for building a divergent stacked bar chart works great for fashioning a Markimekko.  Let’s see how to do this using Superstore data with fields similar to what was available in both Emma and Neil’s dashboards.

Let’s say I want to compare the magnitude of sales with the profitability of items by region.  Figure 2 shows the overall magnitude of sales but makes comparing profitability difficult.

Figure 2 -- Overall sales is easy to see but comparing profitability across regions is difficult.

Figure 2 — Overall sales is easy to see but comparing profitability across regions is difficult.

Here’s another attempt using a 100% stacked bar chart.

Figure 3 -- Showing profitability with a 100% stacked bar chart.

Figure 3 — Showing profitability with a 100% stacked bar chart.

Yes, this does a much better job allowing us to compare the profitability of each region, but there’s no way to easily glean that Sales in the West is almost double sales in the South (which is easy to do in Figure 2.)

So, how can we make the regions that have large sales be wide and the regions that have small sales be  narrow?

Understanding the Fields

Before going much further let’s make sure we understand the following three fields:

  • Percentage Profitable Sales
  • Percentage Unprofitable Sales
  • Sales Percentage of
[Percentage Profitable Sales]

This is defined as

SUM(IF [Profit]>=0 THEN [Sales] END)/SUM(Sales)

… and translates as “if the profit for an item within a partition is profitable, add it up, then divide by the total sales within the partition.”

This is the field that gives us the 90%, 77%, 76%, and 72% results shown in Figure 3.

[Percentage Unprofitable Sales]

This is defined as

1 - [Percentage of Profitable Sales]

… and gives us the 10%, 23%, 24%, ad 28% shown in Figure 3.

[Sales Percentage of]

This is defined as

SUM([Sales]) /TOTAL(SUM([Sales]))

… and we will use it to compute the percentage of sales across the four regions (i.e., show me the sales for one region divided by the sales for all the regions). Here’s how we might use it in a visualization.

Figure 4 -- Using the calculation to figure out how wide each region should be.

Figure 4 — Using the calculation to figure out how wide each region should be.

So, in Figure 4 we can see that the West segment is a lot thicker than the South segment.

How can we apply this additional depth to what we had in Figure 3?

Make it Easy to See if the Math is Correct

At this point it will be helpful to see the interplay of the various measures and dimensions using a cross tab like the one shown in Figure 5.

Figure 5 -- Cross tab showing the relationship among the different measures and dimensions.

Figure 5 — Cross tab showing the relationship among the different measures and dimensions.

The first four columns are easy to interpret:

“I see that sales in the West is $725,458 of which 10% is unprofitable and 90% is profitable.  That $725,458 represents 31.6% of the total sales.”

But how is the field called [Start at] defined and how are we going to use it?

Understanding [Start at]

[Start at] is defined as

PREVIOUS_VALUE(0)+ZN(LOOKUP([Sales Percentage of],-1))

This is the calculation that figures out where the mark should start while [Sales Percentage of] will later determine how thick the mark should be.  Let’s see how this all works together.

Figure 6 -- How [Start at] and [Sales Percentage of] will work together.  Note that “Compute Using” for the two table calculations is set to [Region].

Figure 6 — How [Start at] and [Sales Percentage of] will work together.  Note that “Compute Using” for the two table calculations is set to [Region].

For the West region we want to start at 0% and have a bar that is 31.6% units side. The function

PREVIOUS_VALUE(0)

Tells Tableau to look at whatever is the value for [Sales at] for the row above and if there is no row above make the value 0 (see Item 1 in Figure 6, above.)

Add to this the value for [Sales Percentage of] in the previous row (Item 2 which is also not present) and you get 0 + 0 (Item 3).

For the East region we want to start wherever West left off (Item 3 plus Item 4, which gives us item 5) and make the mark 29.5% wide (item 6).

For the Central region we want to start wherever the previous region left off (Item 5 plus item 6, which gives us item 7) and make the mark 21.8% wide (Item 8).

Let’s see how this all fits together into the Marimekko visualization in Figure 7.

Figure 7 -- Using [Start at ] and [Sales Percentage of] to make the Marimekko work.

Figure 7 — Using [Start at ] and [Sales Percentage of] to make the Marimekko work.

There are three things to keep in mind.

  1. [Start at] is on columns and determines the starting point (how far to the right) for each of the regions.
  2. [Sales Percentage of] is on Size and determines how thick the bars should be.
  3. Size is set to Fixed width, left aligned, where Fixed means the measure on the Size shelf is determining the thickness.
Figure 8 -- Size must be fixed and left-aligned.

Figure 8 — Size must be fixed and left-aligned.

Some Interesting Findings

I built a parameter-driven version of the Marimekko (embedded at the end of this blog post) that allows the viewer to select different dimensions and different ways to sort. Here’s what happens when we look at Sub-Category sorted by Profitability.

Figure 9 -- Profitability by Sub-Category.

Figure 9 — Profitability by Sub-Category.

Okay, not a big surprise here given how many visualizations we’ve all seen showing that Tables are problematic.

That said, I was in for a surprise when I broke this down by state and sorted by the magnitude of sales, as shown below.

Figure 10 -- Profitability by state, sorted by Sales.

Figure 10 — Profitability by state, sorted by Sales.

Wow, after 11 years of living with this data set I never realized that 60% of the items sold in Texas were unprofitable.  Who knew?

To be honest I’m not convinced we need a Marimekko to see this clearly.  A simple sorted bar chart will do the trick, as shown in Figure 11.

Figure 11 -- Sorted bar chart.

Figure 11 — Sorted bar chart.

Indeed, I think this very simple view is better than the Marimekko in many respects.

I guess it depends what you’re trying to get across.

See for Yourself

I’ve included an embedded workbook that has the Superstore example as well as versions of the visualizations Emma Whyte and Neil Richards built, but using this alternative technique.

I encourage you to think long and hard before deploying a Marimekko.  But if you do decide to build one I hope the techniques I explored here will prove useful.

 

Mar 202017
 

Or

How I stopped worrying and learned to love appreciate the Marimekko

March 19, 2017

Overview

Readers of my blog know that I suffer from what Maarten Lambrechts calls xenographphobia, the fear of unusual graphics.  I’ll encounter a chart type that I’ve not seen before, purse my lips, and think (smugly) that there is undoubtedly a better way to show the data than in this novel and, to me, unusual chart.

That was certainly my reaction to “Marimekko Mania” when Tableau 10.0 was first released. I didn’t see a solid use case for this chart. There were some wonderful blog posts from Jonathan Drummey and Bridget Cogley on the subject, but I just wasn’t buying the need for the chart type.

Note: It turns that for many situations you can make a perfectly fine Marimekko just using table calculations. I’ll weigh in on this later.

Enter Emma Whyte and Workout Wednesday

My “I’ll never need to use that” arrogance was disrupted a few weeks ago when I read this blog post from Emma Whyte.  The backstory is that Emma reviewed a Junk Charts makeover of a Wall Street Journal graphic, really liked the makeover, and decided to recreate it in Tableau.

Here’s the Wall Street Journal graphic.

Figure 1 -- Source of inspiration for Junk Charts  and Emma Whyte. From a 2016 survey by LeanIn.org and McKinsey & Co.

Figure 1 — Source of inspiration for Junk Charts  and Emma Whyte. From a 2016 survey by LeanIn.org and McKinsey & Co.

There are two important things the data is trying to tell us:

  1. The percentage of women decreases, a lot, the higher up you go in the corporate hierarchy; and,
  2. There are far more entry-level positions than there are managers than there are VPs, etc.

The chart does a good job on the first point but only uses text to covey the second point.

Contrast this with Emmy Whyte’s visualization:

Figure 2 -- Emma Whyte's makeover.

Figure 2 — Emma Whyte’s makeover.

Whoa.

I immediately “grokked” this.  There are way more men than women among VPs, Senior VPs, and in the C-Suite, but look how much narrower those bars are!  True, I cannot easily compare how much wider the Entry Level column is than the VP column, but is that really important?

Is the Marimekko in fact the “right” way to show this?

Being a little bit stubborn I was not ready to declare a Marimekko victory so I decided to see if I could build something that worked as well, if not better, using more common chart types.

Anything You Can Do, I Can Do…

I won’t go through all ten iterations I came up with but I will show some of my attempts to convey the data accurately and with the visceral wallop I get from Emma’s makeover.

100% Stacked Bar with Marginal Histogram

Putting a histogram in the margin has become a “go to” technique when I’m dealing with highlight tables and scatterplots so I thought that might work in this situation. Here’s a 100% stacked bar chart combined with a histogram.

Figure 3 -- 100% stacked bar with marginal histogram. 

Figure 3 — 100% stacked bar with marginal histogram.

I was so convinced this would just smoke the Marimekko. I mean just look how easy it is to make accurate comparisons!

That may be true, but I think the Marimekko in question does a better job.

Connected Dot Plot

Here’s another attempt using a connected dot plot.

Figure 4 -- Connected dot plot where the size of the circles reflects the percentage of the workforce.

Figure 4 — Connected dot plot where the size of the circles reflects the percentage of the workforce.

Here the lines separating the circles show the gender gap and the size of the circles reflects the percentage of the workforce.

OK, I think the gap is well represented but the spacing between job levels is a fixed width.  In my pursuit of accuracy I needed to find a way spread the circles based on percentage of the workforce.

Diverging Lines with Bands

Figure 5 shows two diverging lines with circles and bands that are proportionate to the percentage of the workforce (Entry level is 52 units wide, Manager is 28 units wide, and so on).

Figure 5 -- Diverging lines with dots and correctly-sized circles and bands

Figure 5 — Diverging lines with dots and correctly-sized circles and bands

But why are the lines sloping?  Shouldn’t the lines be flat for each job level?

Flat Lines

Here’s a similar approach but where the lines stay flat for each job level.

Figure 6 -- Flat lines and accurate circles and bands.

Figure 6 — Flat lines and accurate circles and bands.

More Approaches and the Graphic from the Actual Report

All told I made ten attempts.  The calculation I came up with for Figure 5 also made it possible to create a Markimekko just using a simple table calculation.

Note: I asked Jonathan Drummey to have a look at the Marimekko-with-table-calc approach and he points out that in both my example and Emma Whyte’s example the data isn’t “dense” so you can break the visualization simply by right-clicking a mark and selecting Exclude. That said, the technique is fine for static images and dashboards where you disable the Exclude functionality.

I also reviewed the full Women in the Workplace report and saw they used an interesting pipeline chart to relate the data.

Figure 7 -- "Pipeline" chart from Women in Workplace report (LeanIn.Org and McKinsey & Co.)

Figure 7 — “Pipeline” chart from Women in Workplace report (LeanIn.Org and McKinsey & Co.)

I applaud the creativity but have a lot of problems with the inaccurate proportions. Notice that this chart also has a sloping line suggesting a continuous decrease as you go from one level to another.

And The Winner is…

For me, Emma Whyte’s Marimekko does the best job of showing the data in a compelling and accurate format and I thank Emma for presenting such a worthwhile example.

Will I use this chart type in my practice?

It depends.

If the situation calls for it, I would try it along with other approaches and see what works best for the intended audience.

Here’s a link to the Tableau workbook that contains a copy of Emma Whyte’s original approach and many of my attempts to improve upon it. If you come up with an alternative approach that you think works well, please let me know.

Postscript

Big Book of Dashboards co-author Jeff Shaffer encouraged me to make some more attempts. Here’s a work in progress using jittering.

Jitter with bands

I think this looks promising.