Jan 03, 2018

January 3, 2018

In the first part we saw how to “hard code” the creation of a dimension from row-based survey data using a Level-of-Detail expression. In this second part we’ll see how we can turn any row-based survey data into a dimension “on the fly” using a parameter.

Where we left off

You may recall that the calculation that lets people cut / filter by the question “Do you plan to vote in the next election?” was this:

{FIXED [Resp ID]: MAX(IF [Question ID]="Q0" then [Labels] END)}

So, how do we take this “one-off” calculation and make it flexible?

Making a flexible, extensible solution

One approach is to create a parameter called [Question ID Param] and seed it with values from the Question ID field. The resulting calculation would look like this:

{FIXED [Resp ID]: MAX(IF [Question ID]=[Question ID Param]
then [Labels] END)}
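To make the logic concrete, here is a minimal Python sketch of what this parameter-driven LoD does, assuming the survey is stored in a “long” layout with one row per respondent per question. The sample rows and the function name `question_as_dimension` are hypothetical; the sketch illustrates the logic, not how Tableau evaluates it internally.

```python
# Hypothetical long-format survey rows: one row per respondent per question.
ROWS = [
    {"Resp ID": 1, "Question ID": "Q0", "Labels": "Yes"},
    {"Resp ID": 1, "Question ID": "Q1", "Labels": "Agree"},
    {"Resp ID": 2, "Question ID": "Q0", "Labels": "No"},
    {"Resp ID": 2, "Question ID": "Q1", "Labels": "Disagree"},
]


def question_as_dimension(rows, question_id_param):
    """Sketch of {FIXED [Resp ID]: MAX(IF [Question ID]=[Question ID Param]
    THEN [Labels] END)}.

    Step 1: for each respondent, keep the MAX label they gave to the selected
    question (rows for other questions contribute nothing, like the missing ELSE).
    Step 2: attach that per-respondent value to every row the respondent owns,
    so any other question can be cut by it.
    """
    answer = {}
    for row in rows:
        if row["Question ID"] == question_id_param:
            rid, label = row["Resp ID"], row["Labels"]
            answer[rid] = label if rid not in answer else max(answer[rid], label)
    # Respondents who skipped the question get None (Tableau would show Null).
    return [{**row, "Selected answer": answer.get(row["Resp ID"])} for row in rows]


cut_by_q0 = question_as_dimension(ROWS, "Q0")
print([row["Selected answer"] for row in cut_by_q0])  # ['Yes', 'Yes', 'No', 'No']
```

Passing "Q1" instead re-cuts the same rows by a different question without touching the calculation, which is the point of driving it from a parameter.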

The problem is that your “friendly” parameter list will look like this:

Figure 1 — Not-very-friendly question list.

Quick! What does “Q34-SAT” stand for?

Maybe we should instead populate the parameter list from the human-readable form of the Question ID, in this case the [Wording] field. Here’s what that will look like.

Figure 2 — Friendlier list, but still many problems.

This is certainly an improvement, but there are two big problems with it. The first is that there may be Question IDs that in fact have the same wording. Indeed, if your survey delves into comparing how important a list of features is with how satisfied people are with those features, the wording will need to be identical. For example, “Price” could refer to “How important is this to you?” or it could refer to “How satisfied are you with this?” (See this blog post for a discussion of visualizing importance vs. satisfaction.)

The second problem is that the questions are in alphabetical order, so you have three importance / satisfaction questions, followed by three check-all-that-apply questions, followed by a handful of Likert-scale questions, followed by another importance / satisfaction question that should be with the first group, etc.

Combining Grouping with Wording

Jonathan Drummey came up with a very easy way to both disambiguate the satisfaction question from the importance question and group the questions in the parameter list logically. The trick is to create a new field (we’ll call it [Question Parameter List]) that concatenates the question grouping and the question wording. We define it as follows:

[Question Grouping] + ' / ' + [Wording]

This creates a list that looks like this.

Figure 3 — Results of concatenating [Question Grouping] with [Wording].
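A small Python sketch shows why the concatenation helps. It assumes a hypothetical question table (the IDs, groupings, and wordings below are invented for illustration) and a parameter list shown in alphabetical order: the two “Price” questions become distinct entries, and sorting the concatenated strings keeps each group together.

```python
# Hypothetical question metadata: (Question ID, Question Grouping, Wording).
QUESTIONS = [
    ("Q34-SAT", "Satisfaction", "Price"),
    ("Q34-IMP", "Importance", "Price"),
    ("Q12", "Likert", "The staff were helpful"),
    ("Q35-IMP", "Importance", "Speed"),
]


def question_parameter_list(questions):
    """Sketch of [Question Grouping] + ' / ' + [Wording].

    Returns the entries in alphabetical order, each mapped back to its
    Question ID. Raises if two questions would still collapse into one entry.
    """
    entries = {f"{group} / {wording}": qid for qid, group, wording in questions}
    if len(entries) != len(questions):
        raise ValueError("two questions share the same grouping and wording")
    return dict(sorted(entries.items()))


for entry, qid in question_parameter_list(QUESTIONS).items():
    print(f"{entry:35} -> {qid}")
```

Using [Wording] alone would give only three distinct entries for these four questions, because both “Price” questions collapse into one; the concatenated field gives four, and the alphabetical order now follows Question Grouping.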

We’re almost done. Next we need to create a new parameter, which we’ll call [Question to compare], and populate it with the members of the [Question Parameter List] field, as shown below.

Figure 4 — Our friendly, disambiguated, logically-grouped list of questions.

Armed with this concatenated list, we can modify our hard-coded LoD expression so that it looks like this:

{FIXED [Resp ID]: MAX(IF [Question Parameter List]=
[Question to compare] then [Labels] END)}

Before you explore the downloadable dashboard at the end of this post, I want to dissuade you from inflicting this “filter any question by any other question” functionality on your audience; you would simply be hitting them with too much flexibility. While I’m sure there are some insights to be gleaned from some of the question combinations, there are probably dozens, if not hundreds, that won’t yield anything useful. Do you really want to make your audience find where the good stuff is?

Cole Nussbaumer Knaflic presents a wonderful one-day workshop around her book, Storytelling with Data. In the workshop she states that finding good insights buried in a mound of data is like having to shuck a lot of oysters to find a pearl. Don’t show your audience all the oysters you shucked (and certainly don’t make them shuck the oysters!); just show them the pearl.

Yes, you should use this technique to find insights that go beyond cutting the data by traditional demographic questions.  And if / when you find something useful, limit what you show your audience to just those filters / options that provide insight.

One last thought: if you are building a “this-has-to-be-slick” dashboard (perhaps one that is customer-facing), consider ditching the single concatenated parameter and instead building a parent / child pair of parameters using Tableau’s JavaScript API. The first parameter would show the question grouping, and the second would show the questions in human-readable form, based on what was selected in the first parameter.

Dec 18, 2017

December 18, 2017

Special thanks to Shaelyn McCole at Hootology who suggested the topic, Ryan Gensel at CPI who came up with a wonderful enhancement to the connected dot plot, and Jeffrey Shaffer at Data Plus Science who suggested some cosmetic improvements.

Overview

If you have filters in your dashboards you’ve probably had a thought similar to this: yes, it’s great that I can filter the results, but what did the dashboard look like before I applied the filter?  I can’t remember if the bars were smaller or larger, let alone by how much.

Consider the example below, which shows the results of a multi-select survey question from all respondents who answered a set of questions.

Figure 1 — Results from all survey respondents.

Now compare this with the results for women from North America.

Figure 2 — Results from female respondents who live in North America.

Unless you process information and memories differently than the vast majority of people, it’s very hard to compare the two populations, and harder still if you can only see one result at a time (here you can at least scroll up and down to compare).

So, is there an easy way in Tableau to display both filtered and unfiltered results in the same visualization?

The answer is a resounding yes, with a refreshingly straightforward Level-of-Detail (LoD) calculation.

Determining the calculations

Let’s first look at the calculation we need to determine the percentage of people that checked “Yes” to a question. The “pill arrangement” in Tableau is shown below.

Figure 3 — Pill arrangement for a check-all-that-apply survey question.

Notice that we have Wording on the rows shelf. The Question Grouping filter is limiting the results to only those rows that have responses for the survey questions that interest us. Notice also the filters for Gender, Generation, and Location.

Here’s the calculation that determines the percentage of people who selected an item.

SUM([Value]) / SUM([Number of Records])

As the survey has been coded so that people who selected an item have a Value of 1 and those who did not select it have a Value of 0, this calculation adds up all the ones and divides by the number of people who answered the question.

Okay, so we have the calculation that will adjust the bars as we change the settings for the demographic filters.

What is the calculation that will keep the bars the same length even if we change the demographic filters?

All we need to do is lock the results to the Wording field using the following LoD calculation:

{FIXED [Wording]: SUM([Value])/SUM([Number of Records])}

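This tells Tableau that, even though there may be filters, it should ignore them and compute the result from all the rows associated with each [Wording] value. (Strictly speaking, FIXED expressions are evaluated before ordinary dimension filters; context filters still affect them.) For those of you wondering, you could also fix this to the [Question ID] field and get the same results.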
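The difference between the two measures comes down to which rows each one sees. Here is a minimal Python sketch, assuming hypothetical rows coded 1/0 as described above, each counting as one record; `pct_checked` and `filtered_vs_entire` are illustrative names, not Tableau functions.

```python
from collections import defaultdict

# Hypothetical check-all-that-apply rows: Value is 1 if the item was selected,
# 0 if not, and each row counts as one record.
ROWS = [
    {"Wording": "Item A", "Gender": "Female", "Value": 1, "Number of Records": 1},
    {"Wording": "Item A", "Gender": "Female", "Value": 0, "Number of Records": 1},
    {"Wording": "Item A", "Gender": "Male", "Value": 1, "Number of Records": 1},
    {"Wording": "Item A", "Gender": "Male", "Value": 1, "Number of Records": 1},
    {"Wording": "Item B", "Gender": "Female", "Value": 1, "Number of Records": 1},
    {"Wording": "Item B", "Gender": "Male", "Value": 0, "Number of Records": 1},
]


def pct_checked(rows):
    """SUM([Value]) / SUM([Number of Records]) for each [Wording]."""
    checked, answered = defaultdict(int), defaultdict(int)
    for row in rows:
        checked[row["Wording"]] += row["Value"]
        answered[row["Wording"]] += row["Number of Records"]
    return {w: checked[w] / answered[w] for w in answered}


def filtered_vs_entire(rows, keep):
    """Return (filtered, entire) percentages per Wording.

    `keep` stands in for the demographic filters. The filtered measure sees only
    the rows that pass; the FIXED [Wording] measure is computed from all rows,
    mimicking a FIXED LoD being evaluated before dimension filters.
    """
    entire = pct_checked(rows)
    filtered = pct_checked([row for row in rows if keep(row)])
    return filtered, entire


filtered, entire = filtered_vs_entire(ROWS, lambda row: row["Gender"] == "Female")
print(filtered)  # {'Item A': 0.5, 'Item B': 1.0}
print(entire)    # {'Item A': 0.75, 'Item B': 0.5}
```

With every filter option selected, `keep` passes all rows and the two dictionaries are identical, which matches the equal-length bars described next.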

So, does this work?

Consider the pill configuration below, where we can compare, one below the other, the results for the Filtered population and the Entire population. When all options in the demographic filters are checked, the bars are the same length.

Figure 4 — Filtered vs. Unfiltered results.  Right now, with everything selected, the bars are the same length.

Now let’s see what happens when we apply some filters (keep your eyes on Adrenaline Production for the Entire Population, currently at 76%).

Figure 5 — One set of bars changes but the other remains fixed

Notice that one set of bars changes, but the bars for the Entire Population (the ones using the LoD calculation) don’t.

A gap chart… with a twist!

I experimented with several ways to visualize the differences and settled on a gap chart (also called a dumbbell chart, also called a connected dot plot). Let’s see how to build the chart (and how to improve it, using color coding that Ryan Gensel suggests).

Let’s start with the dots.

Figure 6 — Dot plot using Measure Names and Measure Values

This is very similar to the bar chart we had earlier but instead of bars we are using circles and we’ve placed Measure Names on Color so the circles are different colors.

We now need to create a line that connects the two dots on each row.

First, we will duplicate the Measure Values pill that is on columns and have two identical charts, side-by-side, as shown here.

Figure 7 — Duplicating Measure Values

Next, for the second instance of Measure Values we will change the Mark type to Line and move Measure Names from Color to Path, as shown below.

Figure 8 — Making the second instance of Measure Values a Line chart, connecting the two measures by placing Measure Names on Path

Now we’ll right-click the second Measure Values pill and select Dual Axis, then we’ll right-click the secondary axis and select Synchronize Axis, then we’ll right-click the secondary axis again and de-select Show Header, giving us a chart that looks like this.

Making the difference stand out

I was chatting with CPI’s Ryan Gensel, one of the authors of the Agency Utilization Dashboard that appears in The Big Book of Dashboards. Ryan told me about a technique he was using that colors the line based on which dot has the larger value.

Here I apply the technique by creating a field called Line Color which is defined as follows.

IF [Filtered -- % Check all that apply] >
MIN([Entire Population -- % Check all that apply]) THEN
"Filtered"
ELSE
"Entire Population"
END

We can then place this field on Color for the Line chart instance of Measure Values. We will also need to make the line a little thicker by clicking the Size button and moving the slider to the right. This yields a chart that looks like this.

Figure 10 — A better gap chart.
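Stripped of the Tableau mechanics, the color enhancement is one comparison per row. A minimal Python sketch, assuming the filtered and entire-population percentages have already been computed per Wording (the function names and numbers are hypothetical):

```python
def line_color(filtered_pct, entire_pct):
    """Sketch of the [Line Color] calculation: the line takes the name of
    whichever dot is larger; ties fall to the ELSE branch."""
    return "Filtered" if filtered_pct > entire_pct else "Entire Population"


def dumbbell_lines(filtered, entire):
    """For each Wording present in both inputs, return the two ends of the
    connecting line (left, right) and its color."""
    lines = {}
    for wording, entire_pct in entire.items():
        if wording in filtered:
            filtered_pct = filtered[wording]
            lines[wording] = (
                min(filtered_pct, entire_pct),
                max(filtered_pct, entire_pct),
                line_color(filtered_pct, entire_pct),
            )
    return lines


# Hypothetical percentages per Wording.
print(dumbbell_lines({"Item A": 0.5, "Item B": 1.0}, {"Item A": 0.75, "Item B": 0.5}))
# {'Item A': (0.5, 0.75, 'Entire Population'), 'Item B': (0.5, 1.0, 'Filtered')}
```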

I’ve embedded the working dashboard below. Please feel free to download and explore the dashboard as well as some alternative approaches saved in the workbook.


Nov 26, 2017

November 25, 2017

Special thanks to Jeffrey Shaffer, Andy Cotgreave, and Rody Zakovich for feedback that helped improve the dashboard that appears at the end of this post.

Overview

It seems I’m not the only person who has been thinking about stacked bar charts (see posts from Cole Nussbaumer Knaflic, Jonathan Schwabish, and Andy Cotgreave.)

My problem with these charts, and their first cousin, the area chart, is that many of the people who design them don’t understand the possible pitfalls and end up creating charts that are attractive but don’t convey much useful information.

In this blog post we’ll see examples of where stacked bar and area charts work, where they fail, and what you can do to add some functionality to your dashboards so that if you do use stacked bar and area charts they will work better.

The people who market data viz tools love these charts

Some of the chief culprits include the data visualization vendors themselves who sometimes fashion “screaming cat” visualizations like these in their marketing materials and promotions.

Figure 1 — Sample dashboard Tableau uses to showcase its extensions API.

Figure 2 — Microsoft PowerBI dashboard.

Figure 3 — Area chart from Tableau’s home page

I’ll admit the last one looks particularly cool, but do you have any inkling what it’s trying to show you?

Before we get into exactly what’s wrong with the charts (and how to fix them) let’s look at a couple of examples that work very well.

Some good examples

Here’s an example from The Big Book of Dashboards’ Complaints dashboard.

Figure 4 — A portion of the Complaints Dashboard showing open, closed, and overall complaints (Dashboard by Jeffrey Shaffer).

With this chart it’s very easy for me to see the total number of complaints (overall length of blue bars plus red bars) as well as compare the number of open complaints (red bars) because there are only two colors and the items I want to compare are open complaints (red) and total complaints (red plus blue), both of which have a common baseline.

Another example comes from Matt Chambers’ Mayweather vs. McGregor fight analysis dashboard.

Figure 5 — Stacked bar chart comparing overall punches and punches that landed between Mayweather and McGregor

You should check out the complete dashboard, but this stacked bar chart gets to the heart of why Mayweather won the fight: McGregor exerted more effort in launching 430 punches vs. Mayweather’s 320, but far fewer of McGregor’s punches landed (111 to 170).

As you consider why this chart is so effective notice that we only care about two things — the punches that landed and the total number of punches.

So, why do I like these two examples, but cite the earlier dashboards as “screaming cats”? It has to do with how many segments there are, and which segment is along the baseline.

Let’s explore a bit.

Understanding the strengths and weaknesses of stacked bar charts

Consider the chart shown below.

Figure 6 — Typical stacked bar chart. We can make accurate comparisons of overall and of the first category (Central), but nothing else.

I can see that Phones has the most sales overall (1), that Chairs is the biggest seller in the Central region (2), and that Bookcases is the lowest seller in the Central region (3). If that’s all that is important, then we may be all done here (although it is hard to see that Bookcases is in fact less wide than Machines… more on that in a moment.)

But suppose I want to know what were the three lowest selling categories in the Central region, or if I wanted to easily compare sales in the East or West? In these cases this visualization isn’t much help and *that’s* the biggest problem with stacked bar and area charts: You can only accurately compare overall values and the one region that hugs the baseline.

Adding functionality — sorting and focus

Let’s address the “what were the three lowest sellers in the Central region?” question first. One way to do this would be to have a widget on your dashboard that allows you to sort by both total sales and by sales for a particular region. Here’s what the sort would look like for the Central region.

Figure 7 — Bars sorted by Central region. Now it’s easy to see which were the top and bottom sellers in that region.

Ah, now we can easily answer the question “what were the bottom three sellers in the Central region?” They are Accessories, Machines, and Bookcases.

This is great if all you care about is the Central region, but suppose you wanted to compare sales in the South? With the chart configured as above this is very difficult, but if you add a “widget” that allows your audience to select a region to focus on, the chart can easily answer the question.

Figure 8 — Adding some functionality to the visualization so the audience can move a selected region to the baseline and sort by that region.

The “Focus on” parameter allows the user to select which region gets placed along the baseline, and the “Sort Bar Chart by” parameter allows the user to sort either by the selected region or by overall sales.
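The two widgets boil down to reordering each stack and re-sorting the bars. A minimal Python sketch under stated assumptions: the sales figures are made up, and `focus_and_sort` and its `sort_by` values are hypothetical stand-ins for the “Focus on” and sort parameters, not the workbook’s actual calculations.

```python
# Hypothetical sales by sub-category and region.
SALES = {
    "Chairs": {"Central": 5, "East": 3, "South": 2, "West": 4},
    "Phones": {"Central": 3, "East": 6, "South": 4, "West": 5},
    "Binders": {"Central": 2, "East": 2, "South": 6, "West": 1},
}


def focus_and_sort(sales, focus_region, sort_by="Selected"):
    """Stack `focus_region` on the baseline and order the bars, largest first.

    sort_by="Selected" sorts by the focused region's value; anything else sorts
    by the bar's overall total. Returns (category, [(region, start, end), ...]).
    """
    def sort_key(item):
        _, by_region = item
        if sort_by == "Selected":
            return by_region.get(focus_region, 0)
        return sum(by_region.values())

    bars = []
    for category, by_region in sorted(sales.items(), key=sort_key, reverse=True):
        # Focused region first, so it starts at 0 on every bar.
        order = [focus_region] + [r for r in by_region if r != focus_region]
        start, segments = 0, []
        for region in order:
            value = by_region.get(region, 0)
            segments.append((region, start, start + value))
            start += value
        bars.append((category, segments))
    return bars


for category, segments in focus_and_sort(SALES, "South"):
    print(category, segments)
```

Because the focused region always starts at 0, its segments can be compared as accurately as the bar totals, whichever region the user picks.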

But, if what we’re interested in is showing how one region compares with itself and overall, why bother to have the other regions as different colored bars?  That is, why not make the two things we care about — overall and the region in question — stand out more?

Highlighting the selected region

My fellow author Jeff Shaffer suggested I add this functionality to the visualization and I think it’s a terrific addition. Let’s see how much easier it is to focus on the two main questions (overall and the Selected region) when we mute the colors that aren’t stacked along the baseline.

Here are the results when we sort by the selected region.

Figure 9 — Stacked bar chart with muted colors sorted by Selected region.

And here are the results when we sort by overall sales.

Figure 10 — Stacked bar chart with muted colors sorted by overall sales.

What about Area charts?

You’ll need to address the same issues with area charts as you can only make accurate comparisons for totals and for segments that hug the baseline, as shown below.

Figure 11 — Area chart showing sales over time. Note that we can compare overall sales and sales in the West as there is a common baseline.

Note that because we are not including the product sub-categories the sorting feature is not needed.

100% stacked bar charts

Whereas the regular stacked bar chart allows you to make accurate comparisons of overall sales and one region at a time, a 100% stacked bar chart will allow you to accurately compare the two outer regions, as we can see below.

Figure 12 — 100% stacked bar chart. We can compare the outer regions (Central and West) because there is a common baseline

But we can’t accurately compare the inner regions:

Figure 13 — 100% stacked bar chart. We cannot compare the inner regions (East and South) because the elements are floating; there isn’t a common baseline.
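A minimal Python sketch, with made-up numbers, shows why normalizing each bar to 100% gives the two outer segments a common baseline but leaves the inner ones floating; `percent_segments` is a hypothetical helper, not part of the workbook.

```python
ORDER = ["Central", "East", "South", "West"]


def percent_segments(by_region, order=ORDER):
    """Normalize one bar to 100% and return (region, start, end) as fractions."""
    total = sum(by_region[region] for region in order)
    start, segments = 0.0, []
    for region in order:
        share = by_region[region] / total
        segments.append((region, start, start + share))
        start += share
    return segments


# Two hypothetical bars with different regional mixes.
bar_1 = percent_segments({"Central": 1, "East": 1, "South": 1, "West": 1})
bar_2 = percent_segments({"Central": 4, "East": 2, "South": 1, "West": 1})

# Outer segments share a baseline: the first starts at 0, the last ends at 1.
print(bar_1[0][1], bar_2[0][1])    # 0.0 0.0
print(bar_1[-1][2], bar_2[-1][2])  # 1.0 1.0
# Inner segments float: East starts at a different place on each bar.
print(bar_1[1][1], bar_2[1][1])    # 0.25 0.5
```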

Give the dashboard a try

The dashboard below allows you to explore the functionality discussed in this post. Please note that I’m not suggesting you should include all the widgets in the dashboard. Indeed, maybe this is something you use on your own to help curate interesting findings in your data that you then highlight in a presentation or using Story Points.

As for how to build all this functionality into Tableau, if you download the workbook and look under the hood you’ll see there’s nothing terribly complicated going on (indeed, there isn’t a single LOD calc). That said, my solution is not very robust: it’s hard-coded to show only the four regions that are currently in the data set. I’m sure with a bit more effort one could fashion something extensible, but for this blog post I wanted to prototype the functionality, not craft a robust solution.

Parting thoughts: Do make sure to check out this post where Rody Zakovich applies a different approach to looking at overall and segmented sales for individual customers.


Aug 02, 2017

August 3, 2017

In my last blog post I pointed out that I wish I had put BANs (big-ass numbers) in the Churn dashboard featured in chapter 24 of the book (see http://www.datarevelations.com/iterate.html.)

I had a similar experience this week when I revisited the Net Promoter Score dashboard from Chapter 17.  I’ve been reading Don Norman’s book The Design of Everyday Things and have been thinking about how to apply many of its principles to dashboard design.

One thing you can do to help users decode your work is to ditch the legend and add a color key to your dashboard title.

Here’s the Net Promoter Score dashboard as we present it in the book.  Notice the color legend towards the bottom right corner.

Figure 1 — Net Promoter Score dashboard from The Big Book of Dashboards.

Why did I place the legend out of the natural “flow” of how people would look at the dashboard? Why not just make the color coding part of the dashboard title, as shown below?

Figure 2 — Making the color legend part of the title.

I’m not losing sleep over this, as this is probably a dashboard that people will look at on a regular basis; that is, once they know what “blue” means they won’t need to look at the legend.

But…

Every user will have his / her “first time” with a dashboard, so I recommend that, wherever possible, you make the legend part of the “flow.” For example, instead of the legend being an appendage off to the side of the dashboard…

Figure 3 — Color legend as an appendage.

Consider making the color legend part of the title, as shown here.

Figure 4 — Color coding integrated into the title.


Apr 25, 2017

April 25, 2017

Overview

I became a big fan of adding a marginal histogram to scatterplots when I first saw them applied in Tableau visualizations from Shine Pulikathara and Ben Jones.

For those not familiar with how these work, consider the scatterplot shown in Figure 1 that shows the relationship between salary and age.

Figure 1 — Comparing Age and Salary on a scatterplot

There are some interesting things here; for example, we can see that salaries appear to be highest between ages 50 and 55 and lowest among the youngest and older workers.

But look what happens when we add marginal histograms to the x and y axes (Figure 2.)

Figure 2 — Scatterplot with marginal histogram

Whoa! The two bar charts to the right and below the main chart add a lot of insight into the data.  We don’t just see the correlations, but now we can also see age demographics and salary distribution in the organization.

Marginal Histograms and Jitterplots

The marginal histogram works with other visualizations as well. Consider the dot plot with jitter (jitterplot) example from Lean management tool innovator LeanKit in Figure 3.

Figure 3 — Individual and aggregate views of important data from LeanKit

The combination of the individual data points (the jittered dots that represent Kanban cards) and the aggregated data (stacked bar charts) tells a more complete story than having only the aggregation or only the individual dots.

Marginal Histograms and Highlight Tables

Readers of this blog know I like highlight tables and often use them as a “visualization gateway drug” to move people from cross tabs to more insightful ways of looking at their data.

But as great as they are, they do not lend themselves to accurate comparisons of the data. Consider Figure 4, where we see the percentage of sales broken down by sub-category and region.

Figure 4 — Sorted highlight table showing percentage of sales by sub-category and region

Yes, I can see that Phones in the East is a lot darker than Copiers in the West, but without the numbers there’s no way to make an exact comparison; I don’t know of anyone who can look at just the color coding and exclaim “ah, that cell is twice as blue as that other cell.”

But look what happens when we add the marginal histogram to the visualization, as shown in Figure 5.

Figure 5 — Sorted highlight table with marginal histograms. Here we see percentage of sales.

So much added insight, and so little added screen real estate!

I’ll confess that the histograms don’t work quite as well if you have negative values. Here’s what it looks like if we look at percentage of profit broken down by sub-category and region.

Figure 6 — Sorted highlight table with marginal histograms. Here we see percentage of profit.

Because we have bars pointing in different directions for the histogram on the right the look isn’t quite as clean, but it certainly works.
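The marginal bars are simply row and column totals of the values in the highlight table. A minimal Python sketch, assuming the right-hand histogram shows each sub-category’s total across regions and the bottom one shows each region’s total across sub-categories (the numbers are made up, and `marginal_totals` is a hypothetical helper): a negative row total, like the one for Tables here, is what produces a bar pointing the other way.

```python
# Hypothetical percentage-of-profit values by sub-category (rows) and region (columns).
TABLE = {
    "Phones": {"East": 6, "West": 4},
    "Copiers": {"East": 3, "West": 5},
    "Tables": {"East": -2, "West": 1},
}


def marginal_totals(table):
    """Return (row_totals, column_totals): one bar per row for the histogram on
    the right and one bar per column for the histogram along the bottom."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    column_totals = {}
    for cols in table.values():
        for col, value in cols.items():
            column_totals[col] = column_totals.get(col, 0) + value
    return row_totals, column_totals


rows, columns = marginal_totals(TABLE)
print(rows)     # {'Phones': 10, 'Copiers': 8, 'Tables': -1}
print(columns)  # {'East': 7, 'West': 10}
```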

See for Yourself

I’ve included an embedded dashboard below where you can experiment with different metrics and different sorting choices. Feel free to download and “look under the hood.”

Note that making this type of dashboard is not very difficult; the only tricky part is getting the three elements to align properly. Ben Jones gets into those particulars in his blog post.


Apr 05, 2017

More thoughts on the Marimekko chart, and in particular how to build one in Tableau.

April 4, 2017

Overview

Given my reluctance to embrace odd chart types, and my conviction that I would find something better, I was surprised to find myself last month writing about (and endorsing) the Marimekko chart.

If I was surprised then I’m absolutely gobsmacked to be writing about it again.

What precipitated all this was another very good example of the chart in the wild. After admiring it I couldn’t help but “look under the hood” (hey, we are talking about Tableau Public and people sharing this stuff freely) and I thought that the dashboard designer was working harder than he needed to build the visualization.

So, if people are going to use these things I thought I would share an alternative, and I think easier, technique for building them.

The Great Example from Neil Richards

Here’s the terrific Makeover Monday dashboard from Neil Richards where we see the likelihood of certain jobs being replaced by automation.

Neil does a great job highlighting some of the more interesting findings, but if you want to know more than what Neil highlights you’ll need to explore the dashboard on your own.

Notice that in both this case and in Emma Whyte’s we are dealing with only two data segments; e.g., male vs. female and at-risk vs. not at-risk jobs. Having only two colors is one of the main reasons why the chart works well.

Okay! Uncle! I agree that under the right conditions this is a useful chart, and I can see why you may want to make one.

But is there an easier way to make one?

An Easier Way to Create a Marimekko Chart in Tableau

It turns out the same technique Joe Mako showed me six years ago for building a divergent stacked bar chart works great for fashioning a Marimekko. Let’s see how to do this using Superstore data with fields similar to what was available in both Emma’s and Neil’s dashboards.

Let’s say I want to compare the magnitude of sales with the profitability of items by region.  Figure 2 shows the overall magnitude of sales but makes comparing profitability difficult.

Figure 2 — Overall sales is easy to see but comparing profitability across regions is difficult.

Here’s another attempt using a 100% stacked bar chart.

Figure 3 — Showing profitability with a 100% stacked bar chart.

Yes, this does a much better job allowing us to compare the profitability of each region, but there’s no way to easily glean that Sales in the West is almost double sales in the South (which is easy to do in Figure 2.)

So, how can we make the regions that have large sales wide and the regions that have small sales narrow?

Understanding the Fields

Before going much further let’s make sure we understand the following three fields:

  • Percentage Profitable Sales
  • Percentage Unprofitable Sales
  • Sales Percentage of
[Percentage Profitable Sales]

This is defined as

SUM(IF [Profit]>=0 THEN [Sales] END)/SUM([Sales])

… and translates as “for each item within a partition whose profit is zero or positive, add up its sales, then divide by the total sales within the partition.”

This is the field that gives us the 90%, 77%, 76%, and 72% results shown in Figure 3.

[Percentage Unprofitable Sales]

This is defined as

1 - [Percentage Profitable Sales]

… and gives us the 10%, 23%, 24%, and 28% shown in Figure 3.
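Both fields are computed once per partition. Here is a minimal Python sketch for a single partition, using made-up items; `profitability_split` is a hypothetical name. The two results always add up to 100%, which is what lets them stack into a 100% bar.

```python
def profitability_split(items):
    """Return (Percentage Profitable Sales, Percentage Unprofitable Sales) for
    one partition, such as one region.

    Sketch of SUM(IF [Profit]>=0 THEN [Sales] END)/SUM([Sales]) and
    1 - [Percentage Profitable Sales]; zero-profit items count as profitable.
    """
    total_sales = sum(item["Sales"] for item in items)
    profitable_sales = sum(item["Sales"] for item in items if item["Profit"] >= 0)
    pct_profitable = profitable_sales / total_sales
    return pct_profitable, 1 - pct_profitable


# Hypothetical items for one region.
region_items = [
    {"Sales": 50, "Profit": 10},
    {"Sales": 25, "Profit": 0},
    {"Sales": 25, "Profit": -5},
]
print(profitability_split(region_items))  # (0.75, 0.25)
```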

[Sales Percentage of]

This is defined as

SUM([Sales]) /TOTAL(SUM([Sales]))

… and we will use it to compute the percentage of sales across the four regions (i.e., show me the sales for one region divided by the sales for all the regions). Here’s how we might use it in a visualization.

Figure 4 — Using the calculation to figure out how wide each region should be.

So, in Figure 4 we can see that the West segment is a lot thicker than the South segment.

How can we apply this additional depth to what we had in Figure 3?

Make it Easy to See if the Math is Correct

At this point it will be helpful to see the interplay of the various measures and dimensions using a cross tab like the one shown in Figure 5.

Figure 5 — Cross tab showing the relationship among the different measures and dimensions.

The first four columns are easy to interpret:

“I see that sales in the West is $725,458 of which 10% is unprofitable and 90% is profitable.  That $725,458 represents 31.6% of the total sales.”

But how is the field called [Start at] defined and how are we going to use it?

Understanding [Start at]

[Start at] is defined as

PREVIOUS_VALUE(0)+ZN(LOOKUP([Sales Percentage of],-1))

This is the calculation that figures out where the mark should start while [Sales Percentage of] will later determine how thick the mark should be.  Let’s see how this all works together.

Figure 6 — How [Start at] and [Sales Percentage of] will work together.  Note that “Compute Using” for the two table calculations is set to [Region].

For the West region we want to start at 0% and have a bar that is 31.6% wide. The function

PREVIOUS_VALUE(0)

Tells Tableau to look at whatever is the value for [Sales at] for the row above and if there is no row above make the value 0 (see Item 1 in Figure 6, above.)

Add to this the value of [Sales Percentage of] in the previous row (Item 2, which is also not present) and you get 0 + 0 (Item 3).

For the East region we want to start wherever West left off (Item 3 plus Item 4, which gives us Item 5) and make the mark 29.5% wide (Item 6).

For the Central region we want to start wherever the previous region left off (Item 5 plus Item 6, which gives us Item 7) and make the mark 21.8% wide (Item 8).
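Outside Tableau, [Start at] boils down to a running sum that lags one row behind. The Python sketch below mimics PREVIOUS_VALUE(0) and ZN(LOOKUP(..., -1)) with two plain variables; it assumes the rows are already in the order shown in Figure 6 and uses the three percentages quoted above. The function name start_positions is illustrative only, not part of Tableau.

```python
from typing import List


def start_positions(widths: List[float]) -> List[float]:
    """Mimic [Start at] = PREVIOUS_VALUE(0) + ZN(LOOKUP([Sales Percentage of], -1)).

    `widths` holds [Sales Percentage of] for each row, in "Compute Using" order.
    """
    starts: List[float] = []
    previous_start = 0.0   # PREVIOUS_VALUE(0) returns 0 on the first row
    previous_width = 0.0   # ZN(LOOKUP(..., -1)) turns the missing prior row into 0
    for width in widths:
        start = previous_start + previous_width
        starts.append(start)
        previous_start, previous_width = start, width
    return starts


regions = ["West", "East", "Central"]
widths = [31.6, 29.5, 21.8]  # percentages quoted in the walkthrough
for region, start, width in zip(regions, start_positions(widths), widths):
    print(f"{region}: starts at {start:.1f}%, {width:.1f}% wide")
```

Each mark then covers the interval from its start to start + width, so the fourth region (South) would begin at 31.6 + 29.5 + 21.8 = 82.9%.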

Let’s see how this all fits together into the Marimekko visualization in Figure 7.

Figure 7 — Using [Start at] and [Sales Percentage of] to make the Marimekko work.

There are three things to keep in mind.

  1. [Start at] is on columns and determines the starting point (how far to the right) for each of the regions.
  2. [Sales Percentage of] is on Size and determines how thick the bars should be.
  3. Size is set to Fixed width, left aligned, where Fixed means the measure on the Size shelf is determining the thickness.

Figure 8 — Size must be fixed and left-aligned.

Some Interesting Findings

I built a parameter-driven version of the Marimekko (embedded at the end of this blog post) that allows the viewer to select different dimensions and different ways to sort. Here’s what happens when we look at Sub-Category sorted by Profitability.

Figure 9 — Profitability by Sub-Category.

Okay, not a big surprise here given how many visualizations we’ve all seen showing that Tables are problematic.

That said, I was in for a surprise when I broke this down by state and sorted by the magnitude of sales, as shown below.

Figure 10 — Profitability by state, sorted by Sales.

Wow, after 11 years of living with this data set I never realized that 60% of the items sold in Texas were unprofitable.  Who knew?

To be honest I’m not convinced we need a Marimekko to see this clearly.  A simple sorted bar chart will do the trick, as shown in Figure 11.

Figure 11 — Sorted bar chart.

Indeed, I think this very simple view is better than the Marimekko in many respects.

I guess it depends on what you’re trying to get across.

See for Yourself

I’ve included an embedded workbook that has the Superstore example as well as versions of the visualizations Emma Whyte and Neil Richards built, but using this alternative technique.

I encourage you to think long and hard before deploying a Marimekko.  But if you do decide to build one I hope the techniques I explored here will prove useful.

 

Mar 20, 2017
 

Or

How I stopped worrying and learned to love (well, appreciate) the Marimekko

March 19, 2017

Overview

Readers of my blog know that I suffer from what Maarten Lambrechts calls xenographphobia, the fear of unusual graphics.  I’ll encounter a chart type that I’ve not seen before, purse my lips, and think (smugly) that there is undoubtedly a better way to show the data than in this novel and, to me, unusual chart.

That was certainly my reaction to “Marimekko Mania” when Tableau 10.0 was first released. I didn’t see a solid use case for this chart. There were some wonderful blog posts from Jonathan Drummey and Bridget Cogley on the subject, but I just wasn’t buying the need for the chart type.

Note: It turns out that in many situations you can make a perfectly fine Marimekko using just table calculations. I’ll weigh in on this later.

Enter Emma Whyte and Workout Wednesday

My “I’ll never need to use that” arrogance was disrupted a few weeks ago when I read this blog post from Emma Whyte.  The backstory is that Emma reviewed a Junk Charts makeover of a Wall Street Journal graphic, really liked the makeover, and decided to recreate it in Tableau.

Here’s the Wall Street Journal graphic.

Figure 1 — Source of inspiration for Junk Charts  and Emma Whyte. From a 2016 survey by LeanIn.org and McKinsey & Co.

There are two important things the data is trying to tell us:

  1. The percentage of women decreases, a lot, the higher up you go in the corporate hierarchy; and,
  2. There are far more entry-level positions than manager positions, far more manager positions than VP positions, and so on.

The chart does a good job on the first point but only uses text to convey the second point.

Contrast this with Emma Whyte’s visualization:

Figure 2 — Emma Whyte’s makeover.

Whoa.

I immediately “grokked” this.  There are way more men than women among VPs, Senior VPs, and in the C-Suite, but look how much narrower those bars are!  True, I cannot easily compare how much wider the Entry Level column is than the VP column, but is that really important?

Is the Marimekko in fact the “right” way to show this?

Being a little bit stubborn, I was not ready to declare a Marimekko victory, so I decided to see if I could build something that worked as well, if not better, using more common chart types.

Anything You Can Do, I Can Do…

I won’t go through all ten iterations I came up with but I will show some of my attempts to convey the data accurately and with the visceral wallop I get from Emma’s makeover.

100% Stacked Bar with Marginal Histogram

Putting a histogram in the margin has become a “go to” technique when I’m dealing with highlight tables and scatterplots so I thought that might work in this situation. Here’s a 100% stacked bar chart combined with a histogram.

Figure 3 — 100% stacked bar with marginal histogram.

I was so convinced this would just smoke the Marimekko. I mean just look how easy it is to make accurate comparisons!

That may be true, but I think the Marimekko in question does a better job.

Connected Dot Plot

Here’s another attempt using a connected dot plot.

Figure 4 — Connected dot plot where the size of the circles reflects the percentage of the workforce.

Here the lines separating the circles show the gender gap and the size of the circles reflects the percentage of the workforce.

OK, I think the gap is well represented, but the spacing between job levels is a fixed width. In my pursuit of accuracy I needed to find a way to spread the circles based on percentage of the workforce.

Diverging Lines with Bands

Figure 5 shows two diverging lines with circles and bands that are proportionate to the percentage of the workforce (Entry level is 52 units wide, Manager is 28 units wide, and so on).

Figure 5 — Diverging lines with dots and correctly-sized circles and bands

But why are the lines sloping?  Shouldn’t the lines be flat for each job level?

Flat Lines

Here’s a similar approach but where the lines stay flat for each job level.

Figure 6 — Flat lines and accurate circles and bands.
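For the flat-line version, one way to keep each segment level is to give every job level two points at the same height: one at the left edge of its band and one at the right edge. The Python sketch below builds those points from the band widths; it is one possible approach, not necessarily how the workbook does it. The widths 52 and 28 come from the text, while the y-values are hypothetical placeholders rather than figures from the report, and flat_step_points is an illustrative name.

```python
from typing import List, Tuple


def flat_step_points(widths: List[float], values: List[float]) -> List[Tuple[float, float]]:
    """Return (x, y) points for a line that stays flat across each band.

    Each band starts where the previous one ended (a running sum of widths)
    and contributes two points at the same height: its left and right edge.
    """
    if len(widths) != len(values):
        raise ValueError("widths and values must have the same length")
    points: List[Tuple[float, float]] = []
    left = 0.0
    for width, value in zip(widths, values):
        right = left + width
        points.append((left, value))
        points.append((right, value))
        left = right
    return points


# Band widths from the text (Entry level = 52, Manager = 28).
# The y-values are HYPOTHETICAL placeholders, not data from the report.
print(flat_step_points([52, 28], [0.45, 0.37]))
```

Drawn in order, the two points that share x = 52 produce a vertical step between job levels instead of a slope.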

More Approaches and the Graphic from the Actual Report

All told, I made ten attempts. The calculation I came up with for Figure 5 also made it possible to create a Marimekko using just a simple table calculation.

Note: I asked Jonathan Drummey to have a look at the Marimekko-with-table-calc approach and he points out that in both my example and Emma Whyte’s example the data isn’t “dense” so you can break the visualization simply by right-clicking a mark and selecting Exclude. That said, the technique is fine for static images and dashboards where you disable the Exclude functionality.

I also reviewed the full Women in the Workplace report and saw they used an interesting pipeline chart to relate the data.

Figure 7 — “Pipeline” chart from Women in Workplace report (LeanIn.Org and McKinsey & Co.)

I applaud the creativity but have a lot of problems with the inaccurate proportions. Notice that this chart also has a sloping line suggesting a continuous decrease as you go from one level to another.

And The Winner is…

For me, Emma Whyte’s Marimekko does the best job of showing the data in a compelling and accurate format and I thank Emma for presenting such a worthwhile example.

Will I use this chart type in my practice?

It depends.

If the situation calls for it, I would try it along with other approaches and see what works best for the intended audience.

Here’s a link to the Tableau workbook that contains a copy of Emma Whyte’s original approach and many of my attempts to improve upon it. If you come up with an alternative approach that you think works well, please let me know.

Postscript

Big Book of Dashboards co-author Jeff Shaffer encouraged me to make some more attempts. Here’s a work in progress using jittering.

Jitter with bands

I think this looks promising.

Feb 22, 2017
 

February 22, 2017

Overview

Earlier this week Gartner, Inc. published its “Magic Quadrant” report on Business Intelligence and Analytics (congratulations to Tableau for being cited as a leader for the fifth year in a row).

Coincidentally, this report came on the heels of one of my clients needing to create a scatterplot where there were four equally-sized quadrants even though the data did not lend itself to sitting in four equally-sized quadrants.

In this blog post we’ll look at the differences between a regular scatterplot and a balanced quadrant scatterplot, and show how to create a self-adjusting balanced quadrant scatterplot in Tableau using level-of-detail calculations and hidden reference lines.

The Gartner Magic Quadrant

Let’s start by looking at an example of a balanced quadrant chart.

Here’s the 2017 Gartner Magic Quadrant chart for Business Intelligence and Analytics.

Figure 1 — 2017 Gartner Magic Quadrant for Business Intelligence and Analytics

Notice that there are no values along the x-axis and y-axis, so we don’t know the values for each dot. Nor do we know how high or low a vendor’s “Completeness of Vision” and “Ability to Execute” scores need to be to land in one of the four quadrants. We just know that anything above the horizontal line means a higher “Ability to Execute” and anything to the right of the vertical line means a higher “Completeness of Vision.” That is, we see how the dots are positioned with respect to each other rather than how far from 0 they are. Indeed, you could argue that the origin (0, 0) could be the dead center of the graph as opposed to the bottom left corner.

This balanced quadrant is attractive and easy to understand. Unfortunately, such a well-balanced scatterplot rarely occurs naturally, as real data is seldom distributed evenly around a KPI reference line.

A Typical Scatterplot with Quadrants

Consider Figure 2 below, where we compare the sum of Sales on the x-axis with the sum of Quantity on the y-axis. Each dot represents a different customer.

Figure 2 — Scatterplot comparing sales with quantity where each dot represents a customer.

Now let’s see what happens if we add Average reference lines and color the dots relative to these reference lines.

 

Figure 3 — Scatterplot with Average reference lines.

I think this looks fine, as it’s useful to see how scattered the upper right quadrant is and how tightly clustered the bottom left quadrant is. That said, if the values become more skewed it will be harder to see how they fall into four separate quadrants, and this is where balancing the quadrants can be very useful.

Note: The quadrant doesn’t have to be based on Average. You can use Median or any calculated KPI.

“Eyeballing” what the axes should be

We’ll get to calculating the balanced axes values in a moment but for now let’s just “eyeball” the visualization and hard code minimum values for the x and y axes.

Let’s first deal with the x-axis.  The maximum value looks to be around $3,000 and the average is at around $500 so the difference between the average line and maximum is around $2,500.

We need the difference between the average line and minimum value to also be $2,500 so we need to change the x-axis so that it starts at -$2,000 instead of 0.

Applying the same approach to the y-axis, we see that the maximum value is around 34 and the average is around 11, yielding a difference of 23 (34 - 11). We need the y-axis to start 23 units below the average, which would be -12 (11 - 23).

Here’s what the chart looks like with these hard-coded axes.

Figure 4 — Balanced quadrants using hard-coded axes values.

If we ditch the zero lines we’ll get a pretty good taste of what the final version will look like.

Figure 5 — Balanced quadrants with zero lines removed.

So, this works… in this one case. But what happens if we apply different filters?

We need to come up with a way to dynamically adjust the axes and we can in fact do this by adding hidden reference lines that are driven by level-of-detail calculations.

Adding Reference Lines

We need to calculate what the floor value should be for the x-axis and the y-axis. The pseudocode for this is:

Figure out what the maximum value is and subtract the average line value, then, starting from the average line, subtract the difference we just computed.

Applying a little math, we end up with this:

-(Max Value) + (2*Average Value)

Let’s see if that passes the “smell” test for the y-axis.

-34 + (2*11) = -12
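The “little math” step can be written out explicitly. Writing A for the average (where the quadrant line sits) and M for the maximum, the floor F has to sit as far below A as M sits above it:

```latex
\begin{aligned}
A - F &= M - A \\
F &= 2A - M = -M + 2A
\end{aligned}
```

Plugging in the eyeballed values gives F = 2(11) - 34 = -12 for the y-axis and F = 2(500) - 3000 = -2000 for the x-axis, which matches the hard-coded axes above.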

Now we need to translate this into a Tableau calculation.  Here’s the calculation to figure out the y-axis reference line.

Figure 6 — Formula for determining the y-axis reference line.

And here’s the same thing for the x-axis:

Figure 7 — Formula for determining the x-axis reference line.
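Since the calculations in Figures 6 and 7 are shown only as images, the Python sketch below is a stand-in for the idea rather than a transcription of them. It first aggregates to one value per customer (roughly what a FIXED [Customer Name] expression does), then takes the maximum and average across customers and applies -(Max) + 2*Average. The sample rows and the names customer_totals and axis_floor are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def customer_totals(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum a measure per customer, like {FIXED [Customer Name]: SUM([Sales])}."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


def axis_floor(per_customer: Dict[str, float]) -> float:
    """Floor value for the axis: -(Max Value) + (2 * Average Value)."""
    values = list(per_customer.values())
    return -max(values) + 2 * mean(values)


# Hypothetical order lines: (customer, sales)
orders = [("A", 100.0), ("A", 200.0), ("B", 50.0), ("C", 1150.0)]
sales = customer_totals(orders)   # {'A': 300.0, 'B': 50.0, 'C': 1150.0}
print(axis_floor(sales))          # -150.0: average 500, maximum 1150
```

The average here is taken over customer-level totals because each dot in the scatterplot is a customer, so the reference line and the floor work at the same level of detail.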

Now we need to add both calculations onto Detail and then add reference lines as shown below.

Figure 8 — Adding the x-axis reference line. Notice that the line is currently visible. Further note that we could be using Max or Min instead of average as the value will stay the same no matter what.

Here’s what the resulting chart looks like with the zero lines and reference lines showing.

Figure 9 — Auto-adjusting balanced quadrant chart with visible reference lines and zero lines. The reference lines force the “floor” value Tableau uses to determine where the axes should start.

Hiding the lines, ditching the tick marks, and changing the axes labels

Now all we need to do is attend to some cosmetics; specifically, we need to format the reference lines so there are no visible lines and no labels, as shown in Figure 10.

Figure 10 — Hiding lines and labels

Then we need to edit the axes labels and hide the tick marks as shown in Figure 11.

Figure 11 — Editing the axes labels and removing tick marks.

This will yield the auto-adjusting, balanced quadrant chart we see in Figure 12.

Figure 12 — The completed, auto-adjusting balanced quadrant chart.

Other Considerations

What happens if instead of the values spreading out in the upper right we get values that spread out in the bottom left?  In this case we would need to create a second set of hidden reference lines that force Tableau to draw axes that extend further up and to the right.
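The same arithmetic, mirrored, gives that second set of hidden lines: a ceiling at 2*Average - Min sits as far above the average as the minimum sits below it. The minimal Python sketch below assumes a single measure and one pair of lines per direction; because the axis must cover the data and both hidden lines, the resulting range is always centred on the average. balanced_range is a hypothetical helper name.

```python
from statistics import mean
from typing import Sequence, Tuple


def balanced_range(values: Sequence[float]) -> Tuple[float, float]:
    """Axis range centred on the average, wide enough for every value.

    floor_line   = 2*avg - max  (pulls the axis down when data spreads up/right)
    ceiling_line = 2*avg - min  (pushes the axis up when data spreads down/left)
    """
    avg = mean(values)
    lo, hi = min(values), max(values)
    floor_line = 2 * avg - hi
    ceiling_line = 2 * avg - lo
    # The axis has to include the marks and both hidden reference lines.
    return min(lo, floor_line), max(hi, ceiling_line)


# Data that spreads out toward the bottom left: the ceiling line does the work.
print(balanced_range([-6.0, 7.0, 8.0, 9.0]))  # (-6.0, 15.0), centred on 4.5
```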

Also note that since we are using FIXED in our level-of-detail calculations we need to make sure any filters have been added to context so Tableau processes these first before performing the level-of-detail calculations.
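To see why the filter order matters, here is a simplified Python model of Tableau's order of operations: context filters are applied before FIXED level-of-detail expressions, while ordinary dimension filters are applied after them. The customers, regions, and helper names are hypothetical; the point is only that without a context filter the floor is still computed from customers the viewer can no longer see.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical customer-level totals: (customer, region, sales)
CUSTOMERS: List[Tuple[str, str, float]] = [
    ("A", "East", 300.0),
    ("B", "East", 100.0),
    ("C", "West", 2000.0),
]


def floor_of(values: List[float]) -> float:
    """-(Max Value) + (2 * Average Value)."""
    return -max(values) + 2 * mean(values)


def floor_after_filter(region: str, as_context: bool) -> float:
    """Floor the view would get when filtering to one region."""
    if as_context:
        # Context filter: rows are removed BEFORE the FIXED calculation runs
        visible = [c for c in CUSTOMERS if c[1] == region]
    else:
        # Ordinary dimension filter: the FIXED calculation has already seen
        # every customer, so filtered-out rows still shape the result
        visible = CUSTOMERS
    return floor_of([sales for _, _, sales in visible])


print(floor_after_filter("East", as_context=True))   # 100.0
print(floor_after_filter("East", as_context=False))  # -400.0
```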

Could I have used a table calculation instead of an LoD calc? I first tried a table calculation but ran into trouble specifying an average for one part of the calculation and a maximum for another in the reference line dialog box. I may have given up too early, but I got tired of fighting to make it work.

Note: Jonathan Drummey points out that we can in fact use INCLUDE instead of FIXED here so we would not have to use context filters. If you go this route make sure to edit the feeder calcs for the KPI Dots field ([Quantity  — Windows Average LoD] and [Sales  — Windows Average LoD]) so these use INCLUDE as well.

Give it a try

Here’s a dashboard that allows you to compare a traditional scatterplot with reference lines with the auto-adjusting balanced quadrant chart.  Feel free to download and explore.

Posted on February 22, 2017 in General Discussions, Blog.
Feb 15, 2017
 

Overview

Prior to working for the last two years with Jeffrey Shaffer and Andy Cotgreave on the upcoming The Big Book of Dashboards, I tended to look at BANs (large, occasionally overstuffed Key Performance Indicators, or KPIs) as ornamental rather than informational. I thought they just took up space on a dashboard without adding much analysis.

I’ve changed my mind and now often recommend their use to my clients.

In this blog post we’ll see what BANs are and why they can be so useful.

Examples of BANs

Here are several examples of dashboards featured in The Big Book of Dashboards, all of which use BANs.

Figure 1 — Complaints dashboard by Jeffrey Shaffer

Figure 2 — Agency Utilization dashboard by Vanessa Edwards

Figure 3 — Telecom Operator Executive dashboard by Mark Wevers / Dundas BI

Why these BANs work

The BANs in these three dashboards are useful in that they provide key takeaways, context, and clarification. Let’s see how they do these things.

Key takeaways

If you had to summarize the first dashboard in just a few words, how would you do it? The BANs shown in Figure 4 get right to the point.

Figure 4 — Concise complaints summary (we’ll discuss the colors in a moment).

The same can be said of the BANs in the Agency Utilization dashboard. By looking at the first two BANs in Figure 5, we can see that the agency made $3.8 million but could make $3.4 million more if it were to meet its billable goals. That is the most important takeaway, and it’s presented in big, bold numbers right at the top of the dashboard.

Figure 5 — If we had to distill this entire dashboard down to one key point, it would be that the current sales are $3.8M but they could be $3.4M more.

Context

The three BANs in the Telecom Operator Executive dashboard (Figure 3) not only provide key takeaways but also provide context for the charts that appear to the right of each BAN. Consider the strip shown in Figure 6, which starts with the proclamation that ARPU (Average Revenue Per User) is $68.

Figure 6 — What contributes to Postpaid ARPU being $68?

The charts to the right explain everything that goes into making the $68 (a comparison of Postpaid to Prepaid, a breakdown of Voice, Data, and Addons, etc.).

Note that the dashboard designer packs a lot of very useful information into the box that surrounds the BAN; specifically, ARPU is up $6 YTD, but is down in Q4 compared to Q3 (that’s how to interpret the line atop the shaded bars).

Clarification

The BANs in Figures 1 and 2 aren’t just conversation starters / key takeaways; they are also color legends that clarify the color coding throughout the dashboard.

Consider the Complaints dashboard; the BANs indicate that Closed is teal and Open is red. Armed with this knowledge we know exactly what to make of the chart in Figure 7.

Figure 7 — Everything prior to November 2016 is closed and only a handful of things in November are open.

The same goes for the Agency Utilization dashboard. The BANs inform us that blue represents Fees and green represents Potential so I know exactly how to interpret bars that are those colors when I look at a chart like the one shown in Figure 8.

Figure 8 — Because the BANs told me how to interpret color, we can see that for Technology the company billed $883K but could bill an additional $1,762K if it were to hit its targets.

Conclusion

BANs can do a lot to help people understand key components of your dashboard: they can be conversation starters (and finishers), provide context to adjacent charts, and serve as a universal color legend.

Note: While I’ve tried to show how effective BANs can be, I did not address how a particular font can help or hurt your BAN initiative.

The Big Book of Dashboards co-author Jeff Shaffer has been studying font use and has a fascinating take on the new fonts Tableau added to their product this past year. You can read about it here.

Posted on February 15, 2017 in General Discussions, Blog.
Feb 10, 2017
 

February 10, 2017

I was greatly saddened earlier this week when I read that Hans Rosling had died.

Rosling was a Swedish professor of Global Health who, using data, stunning visualizations, and incredible charm, changed the way people understand the world.

I first became aware of his work six years ago when a friend showed me Rosling’s TED Talk from 2006.  If you’ve not seen it I encourage you to watch it now.

I’ve seen this video hundreds of times and it never fails to make me want to be better at what I do. It’s also a master class in how to give a truly great presentation.  I defy you not to be completely won over when he compares the knowledge chimpanzees have of the world with the knowledge the committee that awards the Nobel Prize in medicine has of the world.

Rosling was a true pioneer in using data (and in particular, visualizing data) to help correct people’s misperceptions about the world. He didn’t set out to be a visualization visionary; he just realized that he needed to create new techniques so people would be able to better see and understand the world. As he states in another of his videos, “having the data is not enough. I have to show it in ways people both enjoy and understand.”

I had the pleasure of meeting Rosling at the 2014 Tableau Conference where Rosling was a keynote speaker.  I was invited to a special breakfast and was fortunate to be able to sit next to him.

He could not have been a more engaging and delightful dining companion. I told him that my daughter was majoring in Global Health and asked if I could take a picture with him. He gladly consented and suggested we pretend to be engaged in lively banter.

Me with Hans Rosling in 2014.

One thing I want to underscore about Rosling was his unbridled optimism for humankind. If somebody ignorantly claimed that things are worse now than they were 50 years ago, he would counter with facts to show just how much better things are now. With fervor, he would cite amazing progress in eradicating malaria, educating young girls, lifting people out of poverty and decreasing the number of children born while increasing the average lifespan of people living in poor countries.

And he was steadfastly convinced that if we can fight ignorance and implement policies based on facts, the world will be a much better place in 50 years than it is now.

Let’s do what we can to prove him right.

Steve Wexler
February 10, 2017

Learn about Gapminder, an organization that Rosling co-founded with his wife and son.

Watch more of Rosling’s TED Talks.

Note: At the Tableau breakfast, when we went to sit down I offered my “prime” seat to Jock Mackinlay, Tableau’s VP of Research and Design. Jock told me he had sat next to Rosling at the London Conference earlier that year and that I should keep my seat.

Thank you, Jock.