April 25, 2017

## Overview

I became a big fan of adding a marginal histogram to scatterplots when I first saw them applied in Tableau visualizations from Shine Pulikathara and Ben Jones.

For those not familiar with how these work, consider the scatterplot shown in Figure 1 that shows the relationship between salary and age.

Figure 1 — Comparing Age and Salary on a scatterplot

There are some interesting things here; for example, we can see that salaries appear to be highest between ages 50 and 55 and lowest among the youngest and oldest workers.

But look what happens when we add marginal histograms to the x and y axes (Figure 2.)

Figure 2 — Scatterplot with marginal histogram

Whoa! The two bar charts to the right and below the main chart add a lot of insight into the data.  We don’t just see the correlations, but now we can also see age demographics and salary distribution in the organization.

## Marginal Histograms and Jitterplots

The marginal histogram works with other visualizations as well. Consider the dot plot with jitter (jitterplot) example from Lean management tool innovator LeanKit in Figure 3.

Figure 3 — Individual and aggregate views of important data from LeanKit

The combination of the individual data points (the jittered dots that represent Kanban cards) and the aggregated data (stacked bar charts) tells a more complete story than having only the aggregation or only the individual dots.

## Marginal Histograms and Highlight Tables

Readers of this blog know I like highlight tables and often use them as a “visualization gateway drug” to move people from cross tabs to more insightful ways of looking at their data.

But as great as they are, they do not lend themselves to accurate comparisons of the data. Consider Figure 4 where we see the percentage of sales broken down by region.

Figure 4 — Sorted highlight table showing percentage of sales by sub-category and region

Yes, I can see that Phones in the East is a lot darker than Copiers in the West, but without the numbers there’s no way to do an exact comparison, as I don’t know of anyone who can look at just the color coding and exclaim “ah, that cell is twice as blue as that other cell.”

But look what happens when we add the marginal histogram to the visualization, as shown in Figure 5.

Figure 5 — Sorted highlight table with marginal histograms. Here we see percentage of sales.

I’ll confess that the histograms don’t work quite as well if you have negative values. Here’s what it looks like if we look at percentage of profit broken down by sub-category and region.

Figure 6 — Sorted highlight table with marginal histograms. Here we see percentage of profit.

Because we have bars pointing in different directions for the histogram on the right, the look isn’t quite as clean, but it certainly works.

## See for Yourself

I’ve included an embedded dashboard below where you can experiment with different metrics and different sorting choices. Feel free to download and “look under the hood.”

Note that making this type of dashboard is not very difficult; the only tricky part is getting the three elements to align properly. Ben Jones gets into those particulars in his blog post.


More thoughts on the Marimekko chart and in particular how to build one in Tableau.

April 4, 2017

## Overview

Given my reluctance to embrace odd chart types and my conviction that I would find something better I was surprised to find myself last month writing about — and endorsing — the Marimekko chart.

If I was surprised then I’m absolutely gobsmacked to be writing about it again.

What precipitated all this was another very good example of the chart in the wild. After admiring it I couldn’t help but “look under the hood” (hey, we are talking about Tableau Public and people sharing this stuff freely) and I thought that the dashboard designer was working harder than he needed to in order to build the visualization.

So, if people are going to use these things I thought I would share an alternative, and I think easier, technique for building them.

## The Great Example from Neil Richards

Here’s the terrific Makeover Monday dashboard from Neil Richards where we see the likelihood of certain jobs being replaced by automation.

Neil does a great job highlighting some of the more interesting findings, but if you want to know more than what Neil highlights you’ll need to explore the dashboard on your own.

Notice that in both this case and in Emma Whyte’s we are dealing with only two data segments; e.g., male vs. female and at-risk vs. not at-risk jobs. Having only two colors is one of the main reasons why the chart works well.

Okay! Uncle! I agree that under the right conditions this is a useful chart and I can see why you may want to make one.

But is there an easier way to make one?

## An Easier Way to Create a Marimekko Chart in Tableau

It turns out the same technique Joe Mako showed me six years ago for building a divergent stacked bar chart works great for fashioning a Marimekko.  Let’s see how to do this using Superstore data with fields similar to what was available in both Emma’s and Neil’s dashboards.

Let’s say I want to compare the magnitude of sales with the profitability of items by region.  Figure 2 shows the overall magnitude of sales but makes comparing profitability difficult.

Figure 2 — Overall sales is easy to see but comparing profitability across regions is difficult.

Here’s another attempt using a 100% stacked bar chart.

Figure 3 — Showing profitability with a 100% stacked bar chart.

Yes, this does a much better job allowing us to compare the profitability of each region, but there’s no way to easily glean that Sales in the West is almost double sales in the South (which is easy to do in Figure 2.)

So, how can we make the regions that have large sales be wide and the regions that have small sales be narrow?

## Understanding the Fields

Before going much further let’s make sure we understand the following three fields:

• Percentage Profitable Sales
• Percentage Unprofitable Sales
• Sales Percentage of

##### [Percentage Profitable Sales]

This is defined as

`SUM(IF [Profit] >= 0 THEN [Sales] END) / SUM([Sales])`

… and translates as “if an item within the partition is profitable, add up its sales, then divide by the total sales within the partition.”

This is the field that gives us the 90%, 77%, 76%, and 72% results shown in Figure 3.

##### [Percentage Unprofitable Sales]

This is defined as

`1 - [Percentage Profitable Sales]`

… and gives us the 10%, 23%, 24%, and 28% shown in Figure 3.

##### [Sales Percentage of]

This is defined as

`SUM([Sales]) / TOTAL(SUM([Sales]))`

… and we will use it to compute the percentage of sales across the four regions (i.e., show me the sales for one region divided by the sales for all the regions). Here’s how we might use it in a visualization.
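Outside Tableau, the same math is easy to check. Here’s a minimal Python sketch of the three calculations; the item-level numbers are made up for illustration and are not actual Superstore values:

```python
# Toy item-level rows: (region, sales, profit). Made-up numbers for
# illustration only; these are not actual Superstore values.
items = [
    ("West", 100, 10), ("West", 200, -5), ("West", 300, 30), ("West", 400, 12),
    ("South", 250, 25), ("South", 250, -10),
]

def pct_profitable_sales(rows):
    """[Percentage Profitable Sales]: sales from profitable items / all sales."""
    total = sum(sales for _, sales, _ in rows)
    profitable = sum(sales for _, sales, profit in rows if profit >= 0)
    return profitable / total

west = [row for row in items if row[0] == "West"]
print(pct_profitable_sales(west))                # 0.8
print(round(1 - pct_profitable_sales(west), 2))  # 0.2 -> [Percentage Unprofitable Sales]

# [Sales Percentage of]: one region's sales / sales across all regions
total_sales = sum(sales for _, sales, _ in items)
west_share = sum(sales for region, sales, _ in items if region == "West") / total_sales
print(round(west_share, 3))                      # 0.667
```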

Figure 4 — Using the calculation to figure out how wide each region should be.

So, in Figure 4 we can see that the West segment is a lot thicker than the South segment.

How can we apply this additional depth to what we had in Figure 3?

## Make it Easy to See if the Math is Correct

At this point it will be helpful to see the interplay of the various measures and dimensions using a cross tab like the one shown in Figure 5.

Figure 5 — Cross tab showing the relationship among the different measures and dimensions.

The first four columns are easy to interpret:

“I see that sales in the West is \$725,458 of which 10% is unprofitable and 90% is profitable.  That \$725,458 represents 31.6% of the total sales.”

But how is the field called [Start at] defined and how are we going to use it?

### Understanding [Start at]

[Start at] is defined as

`PREVIOUS_VALUE(0)+ZN(LOOKUP([Sales Percentage of],-1))`

This is the calculation that figures out where the mark should start while [Sales Percentage of] will later determine how thick the mark should be.  Let’s see how this all works together.

Figure 6 — How [Start at] and [Sales Percentage of] will work together.  Note that “Compute Using” for the two table calculations is set to [Region].

For the West region we want to start at 0% and have a bar that is 31.6% wide. The function

`PREVIOUS_VALUE(0)`

tells Tableau to look at the value of [Start at] in the row above and, if there is no row above, make the value 0 (see Item 1 in Figure 6, above.)

Add to this the value for [Sales Percentage of] in the previous row (Item 2 which is also not present) and you get 0 + 0 (Item 3).

For the East region we want to start wherever West left off (Item 3 plus Item 4, which gives us Item 5) and make the mark 29.5% wide (Item 6).

For the Central region we want to start wherever the previous region left off (Item 5 plus Item 6, which gives us Item 7) and make the mark 21.8% wide (Item 8).
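The walkthrough above amounts to a running offset: each region’s bar starts where the previous one ended. Here’s a small Python sketch of that logic (the West, East, and Central widths come from the walkthrough; South’s width is an assumption chosen so the shares sum to 1):

```python
# Widths from [Sales Percentage of], in table order. West, East, and Central
# match the walkthrough above; South is assumed so the shares sum to 1.
widths = [("West", 0.316), ("East", 0.295), ("Central", 0.218), ("South", 0.171)]

# [Start at] = PREVIOUS_VALUE(0) + ZN(LOOKUP([Sales Percentage of], -1)):
# each region's bar starts where the previous region's bar ended.
start_at = {}
running = 0.0
for region, width in widths:
    start_at[region] = running  # previous start + previous width
    running += width

for region, start in start_at.items():
    print(region, round(start, 3))
# West 0.0, East 0.316, Central 0.611, South 0.829
```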

Let’s see how this all fits together into the Marimekko visualization in Figure 7.

Figure 7 — Using [Start at] and [Sales Percentage of] to make the Marimekko work.

There are three things to keep in mind.

1. [Start at] is on columns and determines the starting point (how far to the right) for each of the regions.
2. [Sales Percentage of] is on Size and determines how thick the bars should be.
3. Size is set to Fixed width, left aligned, where Fixed means the measure on the Size shelf is determining the thickness.

Figure 8 — Size must be fixed and left-aligned.

## Some Interesting Findings

I built a parameter-driven version of the Marimekko (embedded at the end of this blog post) that allows the viewer to select different dimensions and different ways to sort. Here’s what happens when we look at Sub-Category sorted by Profitability.

Figure 9 — Profitability by Sub-Category.

Okay, not a big surprise here given how many visualizations we’ve all seen showing that Tables are problematic.

That said, I was in for a surprise when I broke this down by state and sorted by the magnitude of sales, as shown below.

Figure 10 — Profitability by state, sorted by Sales.

Wow, after 11 years of living with this data set I never realized that 60% of the items sold in Texas were unprofitable.  Who knew?

To be honest I’m not convinced we need a Marimekko to see this clearly.  A simple sorted bar chart will do the trick, as shown in Figure 11.

Figure 11 — Sorted bar chart.

Indeed, I think this very simple view is better than the Marimekko in many respects.

I guess it depends what you’re trying to get across.

## See for Yourself

I’ve included an embedded workbook that has the Superstore example as well as versions of the visualizations Emma Whyte and Neil Richards built, but using this alternative technique.

I encourage you to think long and hard before deploying a Marimekko.  But if you do decide to build one I hope the techniques I explored here will prove useful.


Or

How I stopped worrying and learned to ~~love~~ appreciate the Marimekko

March 19, 2017

## Overview

Readers of my blog know that I suffer from what Maarten Lambrechts calls xenographphobia, the fear of unusual graphics.  I’ll encounter a chart type that I’ve not seen before, purse my lips, and think (smugly) that there is undoubtedly a better way to show the data than in this novel and, to me, unusual chart.

That was certainly my reaction to “Marimekko Mania” when Tableau 10.0 was first released. I didn’t see a solid use case for this chart. There were some wonderful blog posts from Jonathan Drummey and Bridget Cogley on the subject, but I just wasn’t buying the need for the chart type.

Note: It turns out that for many situations you can make a perfectly fine Marimekko just using table calculations. I’ll weigh in on this later.

## Enter Emma Whyte and Workout Wednesday

My “I’ll never need to use that” arrogance was disrupted a few weeks ago when I read this blog post from Emma Whyte.  The backstory is that Emma reviewed a Junk Charts makeover of a Wall Street Journal graphic, really liked the makeover, and decided to recreate it in Tableau.

Here’s the Wall Street Journal graphic.

Figure 1 — Source of inspiration for Junk Charts and Emma Whyte. From a 2016 survey by LeanIn.org and McKinsey & Co.

There are two important things the data is trying to tell us:

1. The percentage of women decreases, a lot, the higher up you go in the corporate hierarchy; and,
2. There are far more entry-level positions than there are managers than there are VPs, etc.

The chart does a good job on the first point but only uses text to convey the second point.

Contrast this with Emma Whyte’s visualization:

Figure 2 — Emma Whyte’s makeover.

Whoa.

I immediately “grokked” this.  There are way more men than women among VPs, Senior VPs, and in the C-Suite, but look how much narrower those bars are!  True, I cannot easily compare how much wider the Entry Level column is than the VP column, but is that really important?

Is the Marimekko in fact the “right” way to show this?

Being a little bit stubborn I was not ready to declare a Marimekko victory so I decided to see if I could build something that worked as well, if not better, using more common chart types.

## Anything You Can Do, I Can Do…

I won’t go through all ten iterations I came up with but I will show some of my attempts to convey the data accurately and with the visceral wallop I get from Emma’s makeover.

##### 100% Stacked Bar with Marginal Histogram

Putting a histogram in the margin has become a “go to” technique when I’m dealing with highlight tables and scatterplots so I thought that might work in this situation. Here’s a 100% stacked bar chart combined with a histogram.

Figure 3 — 100% stacked bar with marginal histogram.

I was so convinced this would just smoke the Marimekko. I mean just look how easy it is to make accurate comparisons!

That may be true, but I think the Marimekko in question does a better job.

##### Connected Dot Plot

Here’s another attempt using a connected dot plot.

Figure 4 — Connected dot plot where the size of the circles reflects the percentage of the workforce.

Here the lines separating the circles show the gender gap and the size of the circles reflects the percentage of the workforce.

OK, I think the gap is well represented but the spacing between job levels is a fixed width.  In my pursuit of accuracy I needed to find a way to spread the circles based on percentage of the workforce.

##### Diverging Lines with Bands

Figure 5 shows two diverging lines with circles and bands that are proportionate to the percentage of the workforce (Entry level is 52 units wide, Manager is 28 units wide, and so on).

Figure 5 — Diverging lines with dots and correctly-sized circles and bands

But why are the lines sloping?  Shouldn’t the lines be flat for each job level?

##### Flat Lines

Here’s a similar approach but where the lines stay flat for each job level.

Figure 6 — Flat lines and accurate circles and bands.

## More Approaches and the Graphic from the Actual Report

All told I made ten attempts.  The calculation I came up with for Figure 5 also made it possible to create a Marimekko just using a simple table calculation.

Note: I asked Jonathan Drummey to have a look at the Marimekko-with-table-calc approach and he points out that in both my example and Emma Whyte’s example the data isn’t “dense” so you can break the visualization simply by right-clicking a mark and selecting Exclude. That said, the technique is fine for static images and dashboards where you disable the Exclude functionality.

I also reviewed the full Women in the Workplace report and saw they used an interesting pipeline chart to relate the data.

Figure 7 — “Pipeline” chart from the Women in the Workplace report (LeanIn.Org and McKinsey & Co.)

I applaud the creativity but have a lot of problems with the inaccurate proportions. Notice that this chart also has a sloping line suggesting a continuous decrease as you go from one level to another.

## And The Winner is…

For me, Emma Whyte’s Marimekko does the best job of showing the data in a compelling and accurate format and I thank Emma for presenting such a worthwhile example.

Will I use this chart type in my practice?

It depends.

If the situation calls for it, I would try it along with other approaches and see what works best for the intended audience.

Here’s a link to the Tableau workbook that contains a copy of Emma Whyte’s original approach and many of my attempts to improve upon it. If you come up with an alternative approach that you think works well, please let me know.

## Postscript

Big Book of Dashboards co-author Jeff Shaffer encouraged me to make some more attempts. Here’s a work in progress using jittering.

I think this looks promising.


And… what did they choose?

March 8, 2017

## Overview

I’ve discussed how to visualize check-all-that-apply questions in Tableau. Assuming your survey is coded as Yes = 1 and No = 0, you can fashion a sorted bar chart like the one shown in Figure 1 using the following calculation.

`SUM([Value]) / SUM([Number of Records])`

The field [Value] would be 0 or 1 for each respondent that answered the question.
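As a sanity check, here’s a Python sketch of that percentage calculation on a toy data set; the respondent IDs, question names, and values are all hypothetical:

```python
from collections import defaultdict

# Toy rows: (Resp ID, question, value) with Yes = 1 and No = 0.
# Respondent IDs, question names, and values are all hypothetical.
rows = [
    (1, "Metabolism", 1), (1, "Blood Pressure", 1),
    (2, "Metabolism", 0), (2, "Blood Pressure", 1),
    (3, "Metabolism", 1), (3, "Blood Pressure", 0),
    (4, "Metabolism", 1), (4, "Blood Pressure", 0),
]

# SUM([Value]) / SUM([Number of Records]) per question
checked = defaultdict(int)
records = defaultdict(int)
for _, question, value in rows:
    checked[question] += value
    records[question] += 1  # [Number of Records] contributes 1 per row

pct = {question: checked[question] / records[question] for question in records}
print(pct)  # {'Metabolism': 0.75, 'Blood Pressure': 0.5}
```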

Figure 1 — Visualizing a check-all-that-apply question

I’ve also discussed how we can break this down by various demographics (Gender, Location, Generation, etc.)

What I’ve not blogged about (until now) is how to answer the following questions:

• How many people selected one item?
• Two items?
• Five items?
• Of the people that only selected one item, what did they select?
• Of the people that selected four items, what did they select?

Prior to the advent of LoD calculations this was doable, but a pretty big pain in the ass.

Fortunately, using examples that are “out in the wild” we can cobble together a compelling way to show the answers to these questions.

## Visualizing How Many People Selected 1, 2, 3, N Items?

One of the best blog posts on Level-of-Detail expressions is Bethany Lyons’ Top 15 LoD Expressions.

It turns out the very first example discusses how to figure out how many customers placed one order, how many placed two orders, etc.  This will give us exactly what we need to figure out how many people selected 1, 2, 3, N items in a check-all-that-apply question.

Here’s the calculation that will do the job.

Figure 2 — The LoD calculation we’ll need.

This translates as “for the questions you are focusing on (and you better have your context filters happening so you are only looking at just the check-all-that-apply stuff), for each Resp ID, add up the values for all the questions people answered.”

Remember, the responses are 0s and 1s, so if somebody selected six things the SUM([Value]) would equal 6.

So, how do we use this?

The beautiful thing about using FIXED as our LoD keyword is that it allows us to turn the results into a dimension.  This means we can put How Many Selected on columns and CNTD(Resp ID) on rows and get a really useful histogram like the one shown in Figure 3.
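The effect of the LoD calculation and the resulting histogram is easy to mimic outside Tableau. Here’s a Python sketch using toy data (the Resp IDs and values are illustrative):

```python
from collections import Counter, defaultdict

# Toy rows for the check-all-that-apply questions: (Resp ID, value),
# where 1 means the respondent checked the item. All values are illustrative.
rows = [(1, 1), (1, 1), (1, 0),
        (2, 1), (2, 0), (2, 0),
        (3, 0), (3, 0), (3, 0),
        (4, 1), (4, 1), (4, 0)]

# Mimic the FIXED LoD: per Resp ID, SUM([Value]) = how many items selected
how_many_selected = defaultdict(int)
for resp, value in rows:
    how_many_selected[resp] += value

# Because the result behaves like a dimension, we can count respondents
# per selection count -- the histogram in Figure 3
histogram = Counter(how_many_selected.values())
print(dict(histogram))  # {2: 2, 1: 1, 0: 1}
```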

Figure 3 — Histogram showing number of respondents that selected 0 items, 1 item, 2 items, etc.

Notice the filter settings indicating that we only want responses to the check-all-that-apply questions. Further note that this filter has been added to the context which means we want Tableau to filter the results before computing the FIXED LoD calculation.

## So, what did these People Select?

Okay, now we know how many people selected one item, two items, etc.

Just what did they select?

Because we set [How Many Selected] using the FIXED keyword we can use it like any other dimension.  That is, it will behave just like [Gender], [Location], and so on.

Borrowing from an existing technique (the visual ranking by category that I cited earlier) we can fashion a very useful dashboard that allows us to see some interesting nuances in the data.  For example, while Metabolism is ranked second overall with 70% of people selecting it, it ranked seventh among those that only selected one item (with only 4%), while 84% of people that selected four items selected it (Figure 4.)

Figure 4 — Metabolism is ranked second overall with 70%, but only 4% of folks that chose one item selected it.

Similarly, check out the breakdown for Blood Pressure which is ranked third with 60% overall but is ranked first among folks that only measure one thing (Figure 5.)

Figure 5 — Blood Pressure is ranked third overall with 60%, but was ranked first among those that only selected one item.

## Other Useful Features of the Dashboard

##### The Marginal Histogram

The marginal histogram along the bottom of the chart shows you the breakdown of responses.

Figure 6 — Marginal histogram shows distribution of responses

##### Tool Tips Help Interpret the Findings

The ordinal numbers can be confusing as sometimes the number 2 means the number of items selected and other times it is the ranking.  Hovering over a bar explains how to interpret the results.

##### Swap Among Different Dimensions

While this is first and foremost a blog post about showing how many people selected a certain number of items (and what they selected) it was very easy to add a parameter that allows you to swap among different dimensions.  In Figure 8 we see the breakdown by Location.

Figure 8 — Use the Break Down by parameter to see rank and magnitude for the selected item among different dimensions, in this case Location.


February 22, 2017

## Overview

Earlier this week Gartner, Inc. published its “Magic Quadrant” report on Business Intelligence and Analytics (congratulations to Tableau for being cited as a leader for the fifth year in a row).

Coincidentally, this report came on the heels of one of my clients needing to create a scatterplot where there were four equally-sized quadrants even though the data did not lend itself to sitting in four equally-sized quadrants.

In this blog post we’ll look at the differences between a regular scatterplot  and a balanced quadrant scatterplot, and show how to create a self-adjusting balanced quadrant scatterplot  in Tableau using level-of-detail calculations and hidden reference lines.

Let’s start by looking at an example of a balanced quadrant chart.

Here’s the 2017 Gartner Magic Quadrant chart for Business Intelligence and Analytics.

Figure 1 — 2017 Gartner Magic Quadrant for Business Intelligence and Analytics

Notice that there are no measure values along the x-axis and y-axis, so we don’t know what the values are for each dot.  Indeed, we don’t know how high or low your “Vision” and “Ability to Execute” scores need to be to fit into one of the four quadrants. We just know that anything above the horizontal line means a higher “Ability to Execute” and anything to the right of the vertical line means a higher “Completeness of Vision.”  That is, we see how the dots are positioned with respect to each other versus how far from 0 they are. Indeed, you could argue that the origin (0, 0) could be the dead center of the graph as opposed to the bottom left corner.

This balanced quadrant is attractive and easy to understand. Unfortunately, such a well-balanced scatterplot rarely occurs naturally as you will rarely have data that is equally distributed with respect to a KPI reference line.

## A Typical Scatterplot with Quadrants

Consider Figure 2 below where we compare the sum of Sales on the x-axis with the sum of Quantity on the Y-Axis. Each dot represents a different customer.

Figure 2 — Scatterplot comparing sales with quantity where each dot represents a customer.

Now let’s see what happens if we add Average reference lines and color the dots relative to these reference lines.

Figure 3 — Scatterplot with Average reference lines.

I think this looks just fine as it’s useful to see just how scattered the upper right quadrant is and just how tightly clustered the bottom left quadrant is. That said, if the values become more skewed it will become harder to see how the values fall into four separate quadrants and this is where balancing the quadrants can become very useful.

Note: The quadrant doesn’t have to be based on Average. You can use Median or any calculated KPI.

## “Eyeballing” what the axes should be

We’ll get to calculating the balanced axes values in a moment but for now let’s just “eyeball” the visualization and hard code minimum values for the x and y axes.

Let’s first deal with the x-axis.  The maximum value looks to be around \$3,000 and the average is at around \$500 so the difference between the average line and maximum is around \$2,500.

We need the difference between the average line and minimum value to also be \$2,500 so we need to change the x-axis so that it starts at -\$2,000 instead of 0.

Applying the same approach to the y-axis we see that the maximum value is around 34 and the average is around 11, yielding a difference of 23 (34 - 11).  We need the y-axis to start at 23 units less than the average, which would be -12 (11 - 23).

Here’s what the chart looks like with these hard-coded axes.

Figure 4 — Balanced quadrants using hard-coded axes values.

If we ditch the zero lines we’ll get a pretty good taste of what the final version will look like.

Figure 5 — Balanced quadrants with zero lines removed.

So, this works… in this one case. But what happens if we apply different filters?

We need to come up with a way to dynamically adjust the axes and we can in fact do this by adding hidden reference lines that are driven by level-of-detail calculations.

We need to come up with a way to calculate what the floor value should be for the x-axis and the y-axis.  The pseudocode for this is:

Figure out what the maximum value is and subtract the average line value, then, starting from the average line, subtract the difference we just computed.

Applying a little math, we end up with this:

-(Max Value) + (2*Average Value)

Let’s see if that passes the “smell” test for the y-axis.

-34 + (2*11) = -12
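Here’s that arithmetic as a tiny Python sketch, using the eyeballed values from above:

```python
def balanced_floor(max_value, avg_value):
    """Axis minimum that puts the average line dead center:
    average - (max - average), i.e. -(max) + 2 * average."""
    return -max_value + 2 * avg_value

print(balanced_floor(34, 11))     # y-axis: -12
print(balanced_floor(3000, 500))  # x-axis: -2000
```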

Now we need to translate this into a Tableau calculation.  Here’s the calculation to figure out the y-axis reference line.

Figure 6 — Formula for determining the y-axis reference line.

And here’s the same thing for the x-axis:

Figure 7 — Formula for determining the x-axis reference line.

Now we need to add both calculations onto Detail and then add reference lines as shown below.

Figure 8 — Adding the x-axis reference line. Notice that the line is currently visible. Further note that we could be using Max or Min instead of average as the value will stay the same no matter what.

Here’s what the resulting chart looks like with the zero lines and reference lines showing.

Figure 9 — Auto-adjusting balanced quadrant chart with visible reference lines and zero lines. The reference lines force the “floor” value Tableau uses to determine where the axes should start.

## Hiding the lines, ditching the tick marks, and changing the axes labels

Now all we need to do is attend to some cosmetics; specifically, we need to format the reference lines so there are no visible lines and no labels, as shown in Figure 10.

Figure 10 — Hiding lines and labels

Then we need to edit the axes labels and hide the tick marks as shown in Figure 11.

Figure 11 — Editing the axes labels and removing tick marks.

This will yield the auto-adjusting, balanced quadrant chart we see in Figure 12.

## Other Considerations

What happens if instead of the values spreading out in the upper right we get values that spread out in the bottom left?  In this case we would need to create a second set of hidden reference lines that force Tableau to draw axes that extend further up and to the right.

Also note that since we are using FIXED in our level-of-detail calculations we need to make sure any filters have been added to context so Tableau processes these first before performing the level-of-detail calculations.

Could I have used a table calculation instead of an LoD calc? I first tried a table calculation and ran into troubles with trying to specify an average for one aspect of the calculation and a maximum for another aspect using the reference line dialog box. I may have given up too early but got tired of fighting to make it work.

Note: Jonathan Drummey points out that we can in fact use INCLUDE instead of FIXED here so we would not have to use context filters. If you go this route make sure to edit the feeder calcs for the KPI Dots field ([Quantity  — Windows Average LoD] and [Sales  — Windows Average LoD]) so these use INCLUDE as well.

## Give it a try


## Overview

Prior to working the last two years with Jeffrey Shaffer and Andy Cotgreave on the upcoming The Big Book of Dashboards I tended to look at BANs — large, occasionally overstuffed Key Performance Indicators (KPIs) —  as ornamental rather than informational.  I thought they just took up space on a dashboard without adding much analysis.

I’ve changed my mind and now often recommend their use to my clients.

In this blog post we’ll see what BANs are and why they can be so useful.

## Examples of BANs

Here are several examples of dashboards featured in The Big Book of Dashboards, all of which use BANs.

Figure 1 — Complaints dashboard by Jeffrey Shaffer

Figure 2 — Agency Utilization dashboard by Vanessa Edwards

Figure 3 — Telecom Operator Executive dashboard by Mark Wevers / Dundas BI

## Why these BANs work

The BANs in these three dashboards are useful in that they provide key takeaways, context, and clarification. Let’s see how they do these things.

##### Key takeaways

If you had to summarize the first dashboard in just a few words, how would you do that? The BANs shown in Figure 4 get right to the point.

Figure 4 — Concise complaints summary (we’ll discuss the colors in a moment).

The same can be said of the BANs in the Agency Utilization dashboards. By looking at the first two BANs in Figure 5, we can see that the agency made \$3.8 million but could make \$3.4 million more if it were to meet its billable goals.  That is the most important takeaway and it’s presented in big, bold numbers right at the top of the dashboard.

Figure 5 — If we had to distill this entire dashboard down to one key point, it would be that the current sales are \$3.8M but they could be \$3.4M more.

##### Context

The three BANs in the Telecom Operator Executive dashboard (Figure 3) not only provide key takeaways but also provide context for the charts that appear to the right of each BAN.  Consider the strip shown in Figure 6, which starts with the proclamation that ARPU (Average Revenue Per User) is \$68.

Figure 6 — What contributes to Postpaid ARPU being \$68?

The images to the right explain everything that goes into making the \$68 (Comparison of Postpaid to Prepaid, Voice, Data, Addons breakdown, etc.)

Note that the dashboard designer packs a lot of very useful information into the box that surrounds the BAN; specifically, ARPU is up \$6 YTD, but is down in Q4 compared to Q3 (that’s how to interpret the line atop the shaded bars).

##### Clarification

The BANs in Figures 1 and 3 aren’t just conversation starters / key takeaways, they are also color legends that clarify the color coding throughout the dashboard.

Consider the Complaints dashboard; the BANs indicate that Closed is teal and Open is red. Armed with this knowledge we know exactly what to make of the chart in Figure 7.

Figure 7 — Everything prior to November 2016 is closed and only a handful of things in November are open.

The same goes for the Agency Utilization dashboard. The BANs inform us that blue represents Fees and green represents Potential so I know exactly how to interpret bars that are those colors when I look at a chart like the one shown in Figure 8.

Figure 8 — Because the BANs told me how to interpret color, we can see that for Technology the company billed \$883K but could bill an additional \$1,762K if it were to hit its targets.

## Conclusion

BANs can do a lot to help people understand key components of your dashboard: they can be conversation starters (and finishers), provide context to adjacent charts, and serve as a universal color legend.

Note: While I’ve tried to show how effective BANs can be, I did not address how a particular font can help or hurt your BAN initiative.

The Big Book of Dashboards co-author Jeff Shaffer has been studying font use and has a fascinating take on the new fonts Tableau added to their product this past year. You can read about it here.


February 10, 2017

Rosling was a Swedish professor of Global Health who, using data, stunning visualizations, and incredible charm, changed the way people understand the world.

I first became aware of his work six years ago when a friend showed me Rosling’s TED Talk from 2006.  If you’ve not seen it I encourage you to watch it now.

I’ve seen this video hundreds of times and it never fails to make me want to be better at what I do. It’s also a master class in how to give a truly great presentation.  I defy you not to be completely won over when he compares the knowledge chimpanzees have of the world with the knowledge the committee that awards the Nobel Prize in medicine has of the world.

Rosling was a true pioneer in using data — and in particular, visualizing data — to help correct people’s misperceptions about the world.  He didn’t set out to be a visualization visionary; he just realized that he needed to create new techniques so people would be able to better see and understand the world.  As he states in another of his videos, “having the data is not enough.  I have to show it in ways people both enjoy and understand.”

I had the pleasure of meeting Rosling at the 2014 Tableau Conference where Rosling was a keynote speaker.  I was invited to a special breakfast and was fortunate to be able to sit next to him.

He could not have been a more engaging and delightful dining companion. I told him that my daughter was majoring in Global Health and asked if I could take a picture with him. He gladly consented and suggested we pretend to be engaged in lively banter.

Me with Hans Rosling in 2014.

One thing I want to underscore about Rosling was his unbridled optimism for humankind. If somebody ignorantly claimed that things are worse now than they were 50 years ago, he would counter with facts showing just how much better things are today. With fervor, he would cite amazing progress in eradicating malaria, educating young girls, lifting people out of poverty, and decreasing the number of children born while increasing the average lifespan of people living in poor countries.

And he was steadfastly convinced that if we can fight ignorance and implement policies based on facts, the world will be a much better place in 50 years than it is now.

Let’s do what we can to prove him right.

Steve Wexler
February 10, 2017

Learn about Gapminder, an organization that Rosling co-founded with his wife and son.

Watch more of Rosling’s TED Talks.

Note: At the Tableau breakfast, when we went to sit down I offered my “prime” seat to Jock Mackinlay, Tableau’s VP of Research and Design.  Jock told me he had sat next to Rosling at the London Conference earlier that year and I should keep my seat.

Thank you, Jock.


By Steve Wexler

January 17, 2017

This is a follow-up to the post Jeff Shaffer and I wrote about what can happen when people fail to question sources and inadvertently amplify baseless findings.

## Overview

There’s been great feedback on things the community can do to maintain all the good things about Makeover Monday (MM) and at the same time reduce the occurrence of bad things.  Before I go any further I want to reiterate that I both like and value MM.  I’ve seen some incredibly good dashboards that inform my consulting practice and I plan to publish a blog on some of the design and analytic masterpieces I’ve seen.

But first…

## Guilty as charged

Before presenting some recommendations, I want to cite an e-mail I received from Ben Jones pointing out some hypocrisy on my part. Ben writes:

“…did you know that the Axalta coating systems Global Automotive color popularity report that you used to make this viz doesn’t even mention the sample sizes, sampling plan, or methods of their surveys anywhere in the report?”

Ben is completely correct, and I immediately followed my own recommendations for the 100+ errant visualizations from week 1 of Makeover Monday 2017 and added this disclaimer to my dashboard:

Figure 1 — The disclaimer / warning that now appears on my dashboard.

So, why did I get in such a huff about folks not vetting their sources for the Australian pay gap but didn’t bother to vet the data for my makeover example?

It comes down to three things:

1. The number of people participating
2. The magnitude of the mistake
3. The importance of the data

For the Australian pay gap makeover there were

ONE — Over 100 people making

TWO — Big mistakes with

THREE — “High-stakes” data

## Recommendations

Avoid “high-stakes” data sets. Some folks will make the argument that iPhone sales could be considered “high stakes” as someone might make a stock buying or selling decision based on this. Fair enough, but for MM I suggest Andy and Eva avoid data that deals with gender, race, guns, shootings, and anything else that is politically charged.

Get ideas from reputable data sources. Realize that source data from the Australian government is, in fact, good data. The problem is that it isn’t good for making meaningful gender comparisons. So, where did the assertion of giant wage gaps come from? The MM folks cited an article from Women’s Agenda.  I spoke with data journalist Chad Skelton and he suggested that for high-stakes subjects MM should stick with ideas from the likes of The Guardian, The Wall Street Journal, and other well-known entities where reporters confer with statisticians and economists before publishing conclusions.

Note: Jeff Shaffer did some more digging and it appears that Women’s Agenda had in fact republished findings from Business Insider Australia. I have sent e-mails to editors at both publications.

Place a prominent Makeover Monday logo and disclaimer on every dashboard. Since Makeover Monday is an exercise in data visualization redesign, why not have participants trumpet MM’s purpose on every dashboard?  By having a small but prominent logo along with a disclaimer we can both raise awareness of the project and caution people that the purpose of the dashboard is to practice data visualization techniques. My only concern is that people will think this gives them a free pass to present a bogus headline and/or specious findings.

While I think the logo below is spoken for, maybe something like it would help viewers know that for the dashboard in question, it’s not about the data, it’s about the design.

Figure 2 — Possible logo for Makeover Monday?

## Vetting the data — the birth of “Find the Flaw Friday”?

In the previous post, I indicated that the one hour MM recommends people spend on a makeover is rarely enough time to properly vet the data, let alone craft a visualization.  And asking Eva and Andy to vet all the data is completely impractical, as I’m sure it’s hard enough to curate the project as it stands now.

Many of the people I corresponded with have said that exploring and really understanding the data is as important as, if not more important than, designing a cool dashboard (this gets us back to the substance-over-style discussion). A lot of people indicated they find exploring and vetting data more interesting than fashioning visualizations.

So, how can we help people practice analysis and not just design and presentation?

At least five people indicated they would be up for what I will call “Find the Flaw Friday” (FFF), where people are tasked with exploring a data set and determining what analysis would be sound and what would likely be flawed. I’m not sure how easy this will be to manage or how much time people will need to spend on each project, but I will put the folks that expressed an interest in touch with each other and we’ll see what materializes.

If this works perhaps the data sets from FFF could feed MM?

## Final thought — Please fix (or take down) your erroneous dashboards

I consider wage inequality a big deal, and if you are going to publish anything about a big deal, you need to get it right.

So, now that people know that their week 1 MM dashboards are wrong, what should they do?

FIX THEM!

When I conferred with Chad, he wondered how people would deal with something like this in their jobs.  For example, suppose you create a beautiful visualization for the CEO and realize later that the central point of the viz is deeply flawed. Even if the mistake weren’t your fault, would you rush to correct the record to keep your reputation intact?

Everyone — and I would argue that this should come from Andy and Eva as well as the folks that manage Tableau Public — should tweet about the errors to make sure people don’t continue to perpetuate the misinformation.

It’s one thing to get something wrong. It’s another thing to know something is wrong and not fix it.

## Postscript

Tableau Zen Master Brit Cava recommends a fascinating Freakonomics podcast with economist and gender gap expert Claudia Goldin.  See http://freakonomics.com/podcast/the-true-story-of-the-gender-pay-gap-a-new-freakonomics-radio-podcast/.  VERY worthwhile.


By Steve Wexler, January 15, 2017

## Overview

Before going any further, note that this post assumes you’ve gotten your data “just so”; that is, you’ve reshaped your data and have the responses in both text and numeric form.

If you’re not sure what this means, please review this post.

## Taking inventory by finding the universe of all questions and all responses

Anyone who has attended one of my survey data classes or has watched my 2014 Tableau Conference presentation knows the first thing I like to do is assemble a demographics dashboard so I know who took the survey.

The second thing I do arose a few months ago, when I had a “why didn’t I do this before?” moment: I wanted a good handle on all the questions and responses, and a way to see if anything was poorly coded.

Here’s how it works.

Note that I’m using the same sample data set that I use for my classes.

Figure 1 — Screen shot after just having connected to the data.

1. Drag Question Grouping onto rows, followed by Wording, and then Question ID, as shown below.

Figure 2 — About half-way through taking inventory.

2. Right-click the measure called Value and select Duplicate.
3. Rename the newly-created field Value (discrete).
4. Drag the measure into the Dimensions area. This tells Tableau to treat the field as a dimension, which is discrete by default.

Figure 3 — Turning the measure into a discrete dimension

5. Drag the newly-created dimension to Rows.
6. Drag Labels to Rows. Your screen should look like the one shown below.

Figure 4 — All the questions and all the responses, all on one sheet.
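If you prefer to take the same inventory outside Tableau, here’s a minimal pandas sketch of the idea. The column names and sample rows are hypothetical, chosen to mirror the fields used above:

```python
import pandas as pd

# Hypothetical reshaped survey data: one row per respondent per question,
# with each response stored in both numeric (Value) and text (Label) form.
df = pd.DataFrame({
    "Question Grouping": ["Importance", "Importance", "Importance", "Agreement"],
    "Wording": ["Salary", "Salary", "Benefits", "I feel valued"],
    "Question ID": ["Q1", "Q1", "Q2", "Q10"],
    "Value": [0, 1, 4, 2],
    "Label": ["Not at all important", "Of Little Importance",
              "Extremely important", "Moderate degree"],
})

# The "inventory": every distinct combination of question and response,
# analogous to placing these fields on Rows in Tableau.
inventory = (df.drop_duplicates()
               .sort_values(["Question Grouping", "Question ID", "Value"])
               .reset_index(drop=True))
print(inventory)
```

Scanning the resulting table for stray values or labels accomplishes the same thing as eyeballing the Tableau sheet.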

## So, just what do we have here?

You can see from this portion of the screen that there are a bunch of questions about “Importance” and that the possible values go from 0 to 4, where 0 maps to “Not at all important”, 1 maps to “Of Little Importance”, etc.

At this point you should be looking for any stray values, say a value of 5.

If you scroll down a little bit (Figure 5) you’ll see a question grouping called “Indicate the degree to which you agree” where you again have values of 0 through 4 but this time 0 maps to “Not at all”, 1 maps to “Small degree”, etc.

We should be pleased as it appears that our Likert questions consistently go from 0 through 4. This means we won’t have to craft multiple sets of calculated fields to deal with different numeric scales (not that having to do that would be a big deal).

Figure 5 — Next set of questions.
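This consistency check is also easy to automate. Here’s a short sketch (with made-up data and column names) that flags any question grouping whose numeric scale deviates from the expected 0 through 4:

```python
import pandas as pd

# Hypothetical long-form responses for two Likert question groupings.
df = pd.DataFrame({
    "Question Grouping": ["Importance"] * 5 + ["Agreement"] * 5,
    "Value": [0, 1, 2, 3, 4] * 2,
})

# For each grouping, find the range of numeric values actually used.
scales = df.groupby("Question Grouping")["Value"].agg(["min", "max"])
print(scales)

# Flag any grouping whose scale deviates from the expected 0 through 4.
stray = scales[(scales["min"] != 0) | (scales["max"] != 4)]
print(stray)  # empty here: both groupings use the same 0-4 scale
```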

At this point it might be useful to add a filter so you can focus on only certain question groups. You can do this by filtering by Question Grouping as shown below.

Figure 6 — Adding a filter makes it easier to focus on specific question groupings

## Spotting questions that have coding errors

In case you’re wondering what a coding error looks like, see what happens if we just focus on the “What do you measure” questions, as shown below.

Figure 7 — An example of a mis-coded check-all-that-apply question

So, for all of the check-all-that-apply questions the universe of possible values is 0 and 1. And with the exception of Question Q6 (Breathing), 0 maps to “No” and 1 maps to “Yes.”

The mis-coding of “Ni” instead of “No” will only present a problem if our calculated field for determining the percentage of people that checked an item were to use Labels instead of Values. My preferred formula for this type of calculation is this:

SUM([Value]) / SUM([Number of Records])

Because we’re using [Value] instead of [Label], the miscoding for this example won’t cause a problem.
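Here’s a small pandas sketch of why the Value-based formula is robust (the rows and column names are invented to mirror the example above):

```python
import pandas as pd

# Hypothetical check-all-that-apply responses: Value 1 = checked, 0 = not.
# Q6 contains the "Ni"/"No" mis-coding described above.
df = pd.DataFrame({
    "Question ID": ["Q6", "Q6", "Q6", "Q7", "Q7", "Q7"],
    "Value": [1, 0, 1, 1, 1, 0],
    "Label": ["Yes", "Ni", "Yes", "Yes", "Yes", "No"],
})

# Equivalent of SUM([Value]) / SUM([Number of Records]): the mean of a
# 0/1 field is the fraction of respondents who checked the item.
pct_checked = df.groupby("Question ID")["Value"].mean()
print(pct_checked)  # Q6 and Q7 both come out to 2/3

# A Label-based count of "No" answers, by contrast, silently drops the
# mis-coded "Ni" row:
no_counts = df[df["Label"] == "No"].groupby("Question ID").size()
print(no_counts)  # only Q7 appears; Q6's "Ni" response is missed
```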

## Conclusion

Creating a giant text table that maps all Question Groupings, Question IDs, Labels, and Values on a single sheet allows us to quickly take an inventory of all questions and possible responses.  This in turn allows us to see if questions are coded consistently (e.g., do all the Likert Scale questions use the same scale) and to see if there are any coding errors.

I just wish I had started doing this years ago.


January 9, 2017

## Overview

Makeover Monday, started by Andy Kriebel in 2009 and turned into a weekly social data project by Kriebel and Andy Cotgreave in 2016, is now one of the biggest community endeavors in data visualization. By the end of 2016 there were over 3,000 submissions and 2017 began with record-breaking numbers, with over 100 makeovers in the first week. We are big fans of this project and it’s because of the project’s tremendous success and our love and respect for the two Andys (and now Eva Murray) that we feel compelled to write this post.

Unfortunately, 2017 started off with a truly grand fiasco, as over 100 people published findings that cannot be substantiated. In just a few days the MM community has done a lot of damage (and if it doesn’t act quickly it will do even more).

## What happened

Whoa!  That’s quite an indictment. What happened, exactly?

Here’s the article that inspired the Makeover Monday assignment.

So, what’s the problem?

The claims in the article are wrong.  Really, really wrong.

And now, thanks to over 100 well-meaning people, instead of one website that got it really, really wrong there are over 100 tweets, blog posts, and web pages that got it really, really wrong.

It appears that Makeover Monday participants assumed the following about the data and the headline:

• The data is cited by Makeover Monday so it must be good data.
• The data comes from the Australian Government so it must be good data that is appropriate for the analysis in question.
• The headline comes from what appears to be a reputable source, so it must be true.

## Some Caveats

Before continuing we want to acknowledge that there is a wage gap in Australia; it just isn’t nearly as pronounced as this article and the makeovers suggest.

The data also looks highly reputable; it’s just not appropriate data for making a useful comparison on wages.

Also, we did not look at all 100+ makeovers. But of the 40 that we did review, all of them parroted the findings of the source article.

## Some makeover examples

Here are some examples from the 100+ people that created dashboards.

Figure 2 — A beautiful viz that almost certainly makes bogus claims. Source: https://public.tableau.com/profile/publish/Australias50highestpayingjobsarepayingmensignificantlymore

Figure 3 — Another beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/profile/publish/MM12017/Dashboard1#!/publish-confirm

Figure 4 — A third beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/profile/publish/AustraliaPayGap_0/Dashboard1#!/publish-confirm

Figure 5 — Yet another beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/views/GenderDisparityinAustralia/GenderInequality?:embed=y&:display_count=yes&:showVizHome=no#1

Goodness! These dashboards (and the dozens of others that we’ve reviewed) are highlighting a horrible injustice!

[we’re being sarcastic]

Let’s hold off before joining a protest march.

## Why these makeovers are wrong

Step back and think for a minute. Over 100 people created a visualization on the gender wage gap and of the dashboards we reviewed, they all visualized, in some form, the difference between male Ophthalmologists earning \$552,947 and females who earned only \$217,242 (this is the largest gap in the data set).

Did any of these people ask “Can this be right?”

This should be setting off alarm bells!

There are two BIG factors that make the data we have unusable.

One — The data is based on averages, and without knowing the distributions there’s no way to determine if the data provides an accurate representation.

Here’s a tongue-in-cheek graphic that underscores why averages may not be suited for our comparison.

Figure 6 — The danger of using averages.  From Why Not to Trust Statistics.

Here’s another real-world graphic from Ben Jones that compares the salaries of Seattle Seahawks football players.

Figure 7 — Seattle Seahawks salary distributions. Source: Ben Jones.

Ben points out

The “average” Seahawks salary this year is \$2.8M. If you asked the players on the team whether it’s typical for one of them to make around \$3M, they’d say “Hell No!”

Two — The data doesn’t consider part time vs. full time work. The data is from tax returns and doesn’t take into account the number of hours worked.

Let’s see how these two factors work with a “for instance” from the source data.

Figure 8 — A snippet of the source data in question.

So, there are 143 women Ophthalmologists making an average of \$217K and 423 males making an average of \$552K.

Are the women in fact being paid way less?  On average, yes, but suppose the following were the case:

Of the 143 women, 51 work only 25 hours per week

And of those 423 men, 14 of them are making crazy high wages (e.g., one of them is on retainer with the Sultan of Brunei).

Could the 51 part-time workers and the 14 insanely-paid workers exaggerate the gap?

Absolutely.

Is this scenario likely?

Very likely.

We did some digging and discovered that as of 2010, 17% of the male workforce in Australia was working part time while 46% of the female workforce was working part time.

This single factor could explain the gap in its entirety.
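To see how much work the hours-worked factor alone can do, here’s a toy calculation in which every worker earns the same hourly rate, yet a large average-income gap still appears. All of the numbers (the rate, the hours, the part-time split) are hypothetical, chosen to echo the “for instance” above:

```python
# Everyone earns the SAME hourly rate (hypothetical).
HOURLY_RATE = 250
FULL_TIME = 40 * 48   # hours per year
PART_TIME = 25 * 48

# Mirror the scenario above: 423 men, all full time;
# 143 women, 51 of whom work 25 hours per week.
men_hours   = [FULL_TIME] * 423
women_hours = [FULL_TIME] * 92 + [PART_TIME] * 51

def avg_income(hours):
    # Average annual income across the group.
    return sum(h * HOURLY_RATE for h in hours) / len(hours)

gap = avg_income(men_hours) - avg_income(women_hours)
print(f"average gap: ${gap:,.0f}")  # a five-figure gap from hours alone
```

Even with identical hourly pay, averaging annual incomes produces a gap of over \$60K; a data set that records only annual income can’t distinguish this scenario from genuine pay discrimination.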

Note: Not knowing the number of hours worked is only one problem. The data also doesn’t address years of experience, tenure, location, or education, all of which may contribute to the gap.

## Findings from other surveys

We did some more digging…

Data from the Workplace Gender Equality Agency (an Australian Government statutory agency) shows that in the Health Care field, 85% of the part-time workers in 2016 were female. This same report shows a 15% pay gap for full-time Health Care employees and only a 1% gap for part-time employees.

Finally, a comprehensive study titled Differences in practice and personal profiles between male and female ophthalmologists, was published in 2005. Key findings from this survey of 254 respondents show:

• 41% of females worked 40 hours per week compared with 70% for males.
• 57.5% of females worked part-time compared with 13.6% for males.
• The average income for females was AUS\$ 38,000 less than males, not \$335,000 less.
(Yes, that’s still a big gap, but it’s almost one-tenth the size of the gap the article claims.)

## Why this causes so much damage

It would keep me up at night to think that something I did would lead to somebody saying this:

“Wait!  You think the wage gap here is bad; you should see what it’s like in Australia.  Just the other day I was looking at this really cool infographic…”

So, here we are spreading misinformation. And it appears we did it over 100 times! The visualizations have now been favorited over 500 times, retweeted, and one was featured as the first Tableau Viz of the Day for 2017.

We’re supposed to be the good guys, people that cry foul when we see things like this:

Figure 9 — Notorious Fox News Misleading Graphic.

Publishing bogus findings undermines our credibility. It suggests we value style over substance, that we don’t know enough to relentlessly question our data sources, and that we don’t understand when averages work and when they don’t.

It may also make people question everything we publish from now on.

And it desensitizes us to the actual numbers.

Let us explain. There is clearly a gender wage gap in Australia. The Australian government reports the gender wage gap based on total compensation to be around 26% for all industries, 23% for full-time and 15% for full-time health care (base pay is a smaller gap). While we can’t calculate the exact difference for full-time or part-time ophthalmologists (because we only have survey data from 2005), it appears to be less than 15%.

Whatever the number is, it’s far less than the 150% wage gap shown on all the makeovers we reviewed.

And because we’ve reported crazy large amounts, when we see the actual amount — say 15% — instead of protesting a legitimate injustice, people will just shrug because 15% now seems so small.

## How to fix this

This is not the first time in MM’s history that questionable data and a lack of proper interrogation have produced erroneous results (see here and here). The difference is that this time we have more than 100 people publishing what is in fact really, really wrong.

So, how do we, the community, fix this?

• If you published a dashboard, you should seriously consider publishing a retraction. Many of you have lots of followers, and that’s great. Now tell these followers about this so they don’t spread the misinformation. We suggest adding a prominent disclaimer on your visualization.
• The good folks at MM recommend that participants should spend no more than one hour working on makeovers. While this is a practical recommendation, you must realize that good work, accurate work, work you can trust, can take much more than one hour. One hour is rarely enough time to vet the data, let alone craft an accurate analysis.
• Don’t assume that just because Andy and Eva published the data (and shared a headline that too many people mimicked without thinking) that everything about the data and headline is fine and dandy. Specifically:
• Never trust the data! You should question it ruthlessly:
• What is the source?
• Do you trust the source? The source probably isn’t trying to deceive you, but the data presented may not be right for the analysis you wish to conduct.
• What does the data look like? Is it raw data or aggregations? Is it normalized?
• If it’s survey data, or a data sample, is it representative of the population? Is the sample size large enough?
• Does the data pass a reasonableness test?
• Do not trust somebody else’s conclusions without analyzing their argument.

Remember, the responsibility for data integrity does not rest solely with the creator or provider of the data. The person performing the analysis needs to take great care with whatever he or she presents.

Alberto Cairo may have expressed it best:

Unfortunately, it is very easy just to get the data and visualize it. I have fallen victim of that drive myself, many times. What is the solution? Avoid designing the graphic. Think about the data first. That’s it.

We realize that the primary purpose of the Makeover Monday project is for the community to learn, and we acknowledge that this can be done without verified data. As an example, people are learning Tableau every day using the Superstore data, data that serves no real-world purpose. However, the community must realize that the MM data sets are real-world data sets, not fake data. If you build stories using incorrect data and faulty assumptions, then you contribute to the spread of misinformation.

Jeffrey A. Shaffer

Steve Wexler

Why Not to Trust Statistics. Read this to see why the wrong statistic applied the wrong way makes you just plain wrong (thank you, Troy Magennis).

Simpson’s Paradox and UC Berkeley Gender Bias

The Truthful Art by Alberto Cairo.  If everyone would just read this we wouldn’t have to issue mass retractions (you are going to publish a retraction, aren’t you?)

Avoiding Data Pitfalls by Ben Jones. Not yet available, but this looks like a “must read” when it comes out.

## Sources:

1. Trend in Hours worked from Australian Labour Market Statistics, Oct 2010.

http://www.abs.gov.au/ausstats/abs@.nsf/featurearticlesbytitle/67AB5016DD143FA6CA2578680014A9D9?OpenDocument

2. Workplace Gender Equality Agency Data Explorer

http://data.wgea.gov.au/industries/1

3. Differences in practice and personal profiles between male and female ophthalmologists, Danesh-Meyer HV, Deva NC, Ku JY, Carroll SC, Tan YW, Gamble G, 2007.

https://www.ncbi.nlm.nih.gov/pubmed/17539782?dopt=Citation

4. Gender Equity Insights 2016: Inside Australia’s Gender Pay Gap, WGEA Gender Equity Series, 2016.