May 10, 2017

Overview

Most organizations want to wildly exceed customer expectations for all facets of all their products and services, but if your organization is like most, you’re not going to be able to do this. Therefore, how should you allocate money and resources?

First, make sure you are not putting time and attention into things that aren’t important to your customers and make sure you satisfy customers with the things that are important.

One way to do this is to create a survey that contains two parallel sets of questions: one asks customers how important certain features and services are, and the other asks how satisfied they are with those same features and services. A snippet of what this might look like to a survey taker is shown in Figure 1.

Figure 1 -- How the importance vs. satisfaction questions might appear to the person taking the survey.

How to Visualize the Results

I’ve come up with a half dozen ways to show the results and will share three approaches in this blog post.  All three approaches use the concept of “Top 2 Boxes” where we compare the percentage of people who indicated Important or Very Important (the top two possible choices out of five for importance) and Satisfied or Very Satisfied (again, the top two choices for Satisfaction).

Bar-In-Bar Chart

Figure 2 shows a bar-in-bar chart, sorted by the items that are most important.

Figure 2 -- Bar-in-bar chart

This works fine, as would having a bar and a vertical reference line.

It’s easy to see that we are disappointing our customers in everything except the least important category and that the gap between importance and satisfaction is particularly pronounced in Ability to Customize UI (we’re not doing so well in Response Time, 24-7 Support, and Ease of Use, either.)

Scatterplot with 45-degree line

Figure 3 shows a scatterplot that compares the percent top 2 boxes for Importance plotted against the percent top 2 boxes for Satisfaction where each mark is a different attribute in our study.

Figure 3 -- Scatterplot with 45-degree reference line

The goal is to be as close to the 45-degree line as possible, because you want satisfaction to match importance. That is, you don’t want to underserve customers (have marks below the line), but you probably don’t want to overserve them, either, as marks above the line suggest you may be putting too many resources into things that are not that important to your customers.
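To make the 45-degree rule concrete, here is a minimal Python sketch that classifies each feature by where it falls relative to the line satisfaction = importance. The feature names, the percentages, and the five-point `tolerance` band around the line are all hypothetical; they are not the survey results shown in the figures.

```python
# Hypothetical top-2-box shares (importance, satisfaction) for four features.
# These values are illustrative only; they are not the survey results above.
features = {
    "Feature A": (0.80, 0.35),
    "Feature B": (0.85, 0.60),
    "Feature C": (0.75, 0.72),
    "Feature D": (0.40, 0.55),
}


def classify(importance: float, satisfaction: float, tolerance: float = 0.05) -> str:
    """Place a point relative to the 45-degree line where satisfaction == importance.

    `tolerance` is an assumed band around the line that still counts as a match.
    """
    gap = satisfaction - importance
    if gap < -tolerance:
        return "underserved"  # mark falls below the line
    if gap > tolerance:
        return "overserved"  # mark falls above the line
    return "on target"  # mark sits close to the line


for name, (imp, sat) in features.items():
    print(f"{name}: {classify(imp, sat)} (gap {sat - imp:+.0%})")
```

Sorting features by the same `gap` value, most negative first, is one way to decide which attributes deserve attention first.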

As with the previous example it’s easy to see the one place where we are exceeding expectations and the three places where we’re quite a bit behind.

Dot Plot with Line

Of the half dozen or so approaches the one I like most is the connected dot plot, shown in Figure 4.

Figure 4 -- Connected dot plot. This is the viz I like the most.

(I placed “I like most” in italics because all the visualizations I’ve shown “work” and one of them might resonate more with your audience than this one.  Just because I like it doesn’t mean it will be the best for your organization so get feedback before deploying.)

In the connected dot plot the dots show the top 2 boxes for importance compared to the top 2 boxes for satisfaction.  The line between them underscores the gap.

I like this viz because it is sortable and makes it easy to see where the gaps are most pronounced.

But what about a Divergent Stacked Bar Chart?

Yes, this is my “go to” viz for Likert-scale data, and I do incorporate such a view in the drill-down dashboard found at the end of this blog post. I experimented with it here, too, but found that while it worked for comparing one feature at a time, it was difficult to understand when comparing all 10 features (see Figure 5.)

Figure 5 -- Divergent stacked bar overload (too much of a good thing).

How to Build This — Make Sure the Data is Set Up Correctly

As with everything survey related, it’s critical that the data be set up properly. In this case, we have a mapping that ties each Question ID to a human-readable question / feature and groups related questions together, as shown in Figure 6.

Figure 6 -- Mapping the question IDs to human readable form and grouping related questions

Having the data set up “just so” allows us to quickly build a useful, albeit hard to parse, comparison of Importance vs. Satisfaction, as shown in Figure 7.

Figure 7 -- Quick and dirty comparison of importance vs. satisfaction.

Here we are showing just the questions that pertain to Importance and Satisfaction (1). Note that the measure [Percentage Top 2 Boxes] on Columns (2) is defined as follows.

Figure 8 -- Calculated field for determining the percentage of people that selected the top 2 boxes.

Why >=3?  It turns out that the Likert scale for this data went from 0 to 4, so here we just want to add up everyone who selected a 3 or a 4.
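The calculated field itself appears only as an image, so here is a minimal Python sketch of the logic the text describes rather than a copy of the field: count the responses of 3 or 4 on the 0-to-4 scale and divide by the number of responses. The function name `pct_top2` and the response list are hypothetical.

```python
def pct_top2(values: list[int], threshold: int = 3) -> float:
    """Share of responses in the top two boxes of a 0-4 scale (values 3 and 4)."""
    if not values:
        raise ValueError("no responses to tabulate")
    top = sum(1 for v in values if v >= threshold)
    return top / len(values)


# Hypothetical responses for one question (0 = lowest box, 4 = highest box).
responses = [4, 3, 2, 4, 1, 0, 3, 3, 2, 4]
print(f"{pct_top2(responses):.0%}")  # 6 of the 10 responses are a 3 or a 4
```

On a 1-to-5 scale the same idea would use a threshold of 4 instead of 3.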

Not Quite Ready to Rock and Roll

This calculated field will work for many of the visualizations we might want to create, but it won’t work for the scatterplot and it will give us some headaches when we attempt to add some discrete measures to the header that surrounds our chart (the % Diff text that appears to the left of the dot plot in Figure 4.) So, instead of having a single calculation I created two separate calculations to compute % top 2 boxes Importance and % top 2 boxes Satisfaction. The calculation for Importance is shown in Figure 9.

Figure 9 -- Calculated field for determining the percentage of folks that selected the top two boxes for Importance.

Notice that we have all the rows associated with both the Importance and the Satisfaction questions “in play”, as it were, but we’re only tabulating results for the Importance questions, so we divide by half of the total number of records.
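Here is a minimal Python sketch of why dividing by half the records works. It assumes long-format data in which every feature contributes one Importance row and one Satisfaction row per respondent; the rows, the `question_type` labels, and the function name are hypothetical. The guard makes that assumption explicit, because if the two question sets were unbalanced, half the records would be the wrong denominator.

```python
# Hypothetical long-format rows: (respondent, question_type, feature, value on 0-4).
rows = [
    ("r1", "Importance", "F1", 4), ("r1", "Satisfaction", "F1", 2),
    ("r1", "Importance", "F2", 3), ("r1", "Satisfaction", "F2", 3),
    ("r2", "Importance", "F1", 1), ("r2", "Satisfaction", "F1", 4),
    ("r2", "Importance", "F2", 4), ("r2", "Satisfaction", "F2", 0),
]


def pct_top2_for(rows, question_type: str) -> float:
    """% top 2 boxes for one question type while all rows stay in play."""
    n_importance = sum(1 for _, qtype, _, _ in rows if qtype == "Importance")
    if 2 * n_importance != len(rows):
        raise ValueError("Importance and Satisfaction rows are not balanced")
    # Numerator: only rows of the requested type that are a 3 or a 4.
    top = sum(1 for _, qtype, _, v in rows if qtype == question_type and v >= 3)
    # Denominator: half of all records, i.e., the number of rows of one type.
    return top / (len(rows) / 2)


print(pct_top2_for(rows, "Importance"), pct_top2_for(rows, "Satisfaction"))
```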

We’ll need to create a similar calculated field for the Satisfaction questions.

Ready to Rock and Roll

Understanding the Dot Plot

Figure 10 shows what drives the Dot Plot (we’ll add the connecting line in a moment.)

Figure 10 -- Dissecting the Dot Plot.

Here we see that we have a Shape chart (1) that will display two different Measure Values (2) and that Measure Names (3) is controlling Shape and Color.

Creating the Connecting Line Chart

Figure 11 shows how the Line chart that connects the shapes is built.

Figure 11 -- Dissecting the Line chart

Notice that Measure Values is on Rows a second time (1), but for this second instance the mark type is Line (2) and the end points are connected by placing Measure Names on Path (3). Also notice that nothing controls Color anymore, as we want a line that is only one color.

Combining the Two Charts

All that remains is to combine the two charts into one by making a dual axis chart, synchronizing the secondary axis, and hiding the secondary header (Figure 12.)

Figure 12 -- The completed connected dot plot.

What to Look for in the Dashboard

Any chart that answers a question usually fosters more questions. Consider the really big gap in Ability to Customize UI. Did all respondents indicate this, or only some?

And if the gap was considerably more pronounced for one group than for the others, what were the actual responses across the board (vs. just looking at the percent top 2 boxes)?

Figure 13 -- Getting the details on how one group responded

The dashboard embedded below shows how you can answer these questions.

Got another approach that you think works better?  Let me know.

April 25, 2017

Overview

I became a big fan of adding a marginal histogram to scatterplots when I first saw them applied in Tableau visualizations from Shine Pulikathara and Ben Jones.

For those not familiar with how these work, consider the scatterplot shown in Figure 1 that shows the relationship between salary and age.

Figure 1 -- Comparing Age and Salary on a scatterplot.

There are some interesting things here; for example, we can see that salaries appear to be highest between ages 50 and 55 and lowest among the youngest and oldest workers.

But look what happens when we add marginal histograms to the x and y axes (Figure 2.)

Figure 2 -- Scatterplot with marginal histogram

Whoa! The two bar charts to the right of and below the main chart add a lot of insight into the data. We don’t just see the correlation; now we can also see the age demographics and the salary distribution in the organization.

Marginal Histograms and Jitterplots

The marginal histogram works with other visualizations as well. Consider the dot plot with jitter (jitterplot) example from Lean management tool innovator LeanKit in Figure 3.

Figure 3 -- Individual and aggregate views of important data from LeanKit

The combination of the individual data points (the jittered dots that represent Kanban cards) and the aggregated data (stacked bar charts) tells a more complete story than having only the aggregation or only the individual dots.

Marginal Histograms and Highlight Tables

Readers of this blog know I like highlight tables and often use them as a “visualization gateway drug” to move people from cross tabs to more insightful ways of looking at their data.

But as great as they are, they do not lend themselves to accurate comparisons of the data. Consider Figure 4, where we see the percentage of sales broken down by sub-category and region.

Figure 4 -- Sorted highlight table showing percentage of sales by sub-category and region

Yes, I can see that Phones in the East is a lot darker than Copiers in the West, but without the numbers there’s no way to make an exact comparison, as I don’t know of anyone who can look at just the color coding and exclaim “ah, that cell is twice as blue as that other cell.”

But look what happens when we add the marginal histogram to the visualization, as shown in Figure 5.

Figure 5 -- Sorted highlight table with marginal histograms. Here we see percentage of sales.

So much added insight, and so little added screen real estate!

I’ll confess that the histograms don’t work quite as well if you have negative values. Here’s what happens when we look at percentage of profit broken down by sub-category and region.

Figure 6 -- Sorted highlight table with marginal histograms. Here we see percentage of profit.

Because the bars in the histogram on the right point in different directions, the look isn’t quite as clean, but it certainly works.
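A minimal Python sketch of the numbers behind the marginal bars, assuming each bar shows the total for its row (right side) or column (bottom) of the highlight table. The sub-category labels and values are hypothetical. A negative row total is what produces a bar pointing the other way.

```python
# Hypothetical percent-of-total values: rows are sub-categories, columns are regions.
table = {
    "Sub-category 1": {"East": 0.05, "West": 0.04},
    "Sub-category 2": {"East": 0.02, "West": 0.01},
    "Sub-category 3": {"East": -0.03, "West": 0.01},
}


def marginal_totals(table: dict[str, dict[str, float]]):
    """Row totals (bars on the right) and column totals (bars along the bottom)."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals: dict[str, float] = {}
    for cells in table.values():
        for col, value in cells.items():
            col_totals[col] = col_totals.get(col, 0.0) + value
    return row_totals, col_totals


row_totals, col_totals = marginal_totals(table)
# Sort rows by total, largest first, as in a sorted highlight table.
ordered = sorted(row_totals, key=row_totals.get, reverse=True)
for name in ordered:
    direction = "left" if row_totals[name] < 0 else "right"
    print(f"{name}: {row_totals[name]:+.0%} (bar points {direction})")
print(col_totals)
```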

See for Yourself

I’ve included an embedded dashboard below where you can experiment with different metrics and different sorting choices. Feel free to download and “look under the hood.”

Note that making this type of dashboard is not very difficult; the only tricky part is getting the three elements to align properly. Ben Jones gets into those particulars in his blog post.

 

April 5, 2017
 

More thoughts on the Marimekko chart and in particular how to build one in Tableau.

April 4, 2017

Overview

Given my reluctance to embrace odd chart types and my conviction that I would find something better I was surprised to find myself last month writing about — and endorsing — the Marimekko chart.

If I was surprised then I’m absolutely gobsmacked to be writing about it again.

What precipitated all this was another very good example of the chart in the wild. After admiring it I couldn’t help but “look under the hood” (hey, we are talking about Tableau Public and people sharing this stuff freely) and I thought that the dashboard designer was working harder than he needed to build the visualization.

So, if people are going to use these things I thought I would share an alternative, and I think easier, technique for building them.

The Great Example from Neil Richards

Here’s the terrific Makeover Monday dashboard from Neil Richards where we see the likelihood of certain jobs being replaced by automation.

Figure 1 -- Neil Richards’ Makeover Monday dashboard.

Neil does a great job highlighting some of the more interesting findings, but if you want to know more than what Neil highlights you’ll need to explore the dashboard on your own.

Notice that in both this case and in Emma Whyte’s we are dealing with only two data segments; e.g., male vs. female and at-risk vs. not at-risk jobs. Having only two colors is one of the main reasons why the chart works well.

Okay! Uncle! I agree that under the right conditions this is a useful chart and I can see why you may want to make one.

But is there an easier way to make one?

An Easier Way to Create a Marimekko Chart in Tableau

It turns out the same technique Joe Mako showed me six years ago for building a divergent stacked bar chart works great for fashioning a Marimekko. Let’s see how to do this using Superstore data with fields similar to those in both Emma’s and Neil’s dashboards.

Let’s say I want to compare the magnitude of sales with the profitability of items by region.  Figure 2 shows the overall magnitude of sales but makes comparing profitability difficult.

Figure 2 -- Overall sales is easy to see but comparing profitability across regions is difficult.

Here’s another attempt using a 100% stacked bar chart.

Figure 3 -- Showing profitability with a 100% stacked bar chart.

Yes, this does a much better job allowing us to compare the profitability of each region, but there’s no way to easily glean that Sales in the West is almost double sales in the South (which is easy to do in Figure 2.)

So, how can we make the regions that have large sales wide and the regions that have small sales narrow?

Understanding the Fields

Before going much further let’s make sure we understand the following three fields:

  • Percentage Profitable Sales
  • Percentage Unprofitable Sales
  • Sales Percentage of

[Percentage Profitable Sales]

This is defined as

SUM(IF [Profit]>=0 THEN [Sales] END)/SUM([Sales])

… and translates as “if an item within a partition is profitable, add up its sales, then divide by the total sales within the partition.”

This is the field that gives us the 90%, 77%, 76%, and 72% results shown in Figure 3.

[Percentage Unprofitable Sales]

This is defined as

1 - [Percentage Profitable Sales]

… and gives us the 10%, 23%, 24%, and 28% shown in Figure 3.
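As a cross-check on the two definitions above, here is a minimal Python sketch of the same arithmetic for a single partition. The line items are hypothetical. As with the `>=0` test in the calculated field, an item with zero profit counts as profitable, and the row-level test happens before anything is summed.

```python
# Hypothetical line items for one partition (e.g., one region): (sales, profit).
items = [(100.0, 20.0), (50.0, -5.0), (200.0, 0.0), (150.0, 30.0)]


def pct_profitable_sales(items) -> float:
    """Mirrors SUM(IF [Profit]>=0 THEN [Sales] END) / SUM([Sales]) for one partition."""
    # Row-level test first (zero profit counts as profitable), then aggregate.
    profitable = sum(sales for sales, profit in items if profit >= 0)
    total = sum(sales for sales, _ in items)
    return profitable / total


profitable_share = pct_profitable_sales(items)
unprofitable_share = 1 - profitable_share  # [Percentage Unprofitable Sales]
print(f"profitable {profitable_share:.0%}, unprofitable {unprofitable_share:.0%}")
```

Because both fields are weighted by sales, one large unprofitable order moves the result more than several small ones.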

[Sales Percentage of]

This is defined as

SUM([Sales]) / TOTAL(SUM([Sales]))

… and we will use it to compute the percentage of sales across the four regions (i.e., show me the sales for one region divided by the sales for all the regions). Here’s how we might use it in a visualization.

Figure 4 -- Using the calculation to figure out how wide each region should be.

So, in Figure 4 we can see that the West segment is a lot thicker than the South segment.
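The width calculation is simple division, and here is a minimal Python sketch of it. The region labels and sales figures are hypothetical (they are not the Superstore numbers), and the dictionary stands in for the partition that Tableau’s TOTAL() ranges over when the calculation is computed along Region.

```python
def sales_percentage_of(sales_by_region: dict[str, float]) -> dict[str, float]:
    """Each region's sales divided by the sales of all regions in the partition,
    the same ratio as SUM([Sales]) / TOTAL(SUM([Sales])) computed along Region."""
    total = sum(sales_by_region.values())
    return {region: value / total for region, value in sales_by_region.items()}


# Hypothetical sales figures; they are not the Superstore numbers.
shares = sales_percentage_of({"Region A": 400.0, "Region B": 300.0,
                              "Region C": 200.0, "Region D": 100.0})
for region, share in shares.items():
    print(f"{region}: {share:.0%} of total width")
```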

How can we apply this additional depth to what we had in Figure 3?

Make it Easy to See if the Math is Correct

At this point it will be helpful to see the interplay of the various measures and dimensions using a cross tab like the one shown in Figure 5.

Figure 5 -- Cross tab showing the relationship among the different measures and dimensions.

The first four columns are easy to interpret:

“I see that sales in the West is $725,458 of which 10% is unprofitable and 90% is profitable.  That $725,458 represents 31.6% of the total sales.”

But how is the field called [Start at] defined and how are we going to use it?

Understanding [Start at]

[Start at] is defined as

PREVIOUS_VALUE(0)+ZN(LOOKUP([Sales Percentage of],-1))

This is the calculation that figures out where the mark should start while [Sales Percentage of] will later determine how thick the mark should be.  Let’s see how this all works together.

Figure 6 -- How [Start at] and [Sales Percentage of] will work together.  Note that “Compute Using” for the two table calculations is set to [Region].

For the West region we want to start at 0% and have a bar that is 31.6% wide. The function

PREVIOUS_VALUE(0)

tells Tableau to look at the value of [Start at] in the row above and, if there is no row above, to use 0 (see Item 1 in Figure 6, above.)

Add to this the value of [Sales Percentage of] in the previous row (Item 2, which is also not present, so ZN() turns it into 0) and you get 0 + 0 (Item 3).

For the East region we want to start wherever West left off (Item 3 plus Item 4, which gives us Item 5) and make the mark 29.5% wide (Item 6).

For the Central region we want to start wherever the previous region left off (Item 5 plus Item 6, which gives us Item 7) and make the mark 21.8% wide (Item 8).
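The walk-through above is a running total of the previous rows’ widths. Here is a minimal Python sketch of the same recurrence, offered as an illustration rather than a reproduction of Tableau’s table-calculation engine. The West, East, and Central shares come from the text; South’s 17.1% is an assumption (the remainder, so the four shares sum to 100%). The first start is 0 (Item 3), the second is 0 + 31.6 (Item 5), and so on.

```python
def start_positions(widths: list[float]) -> list[float]:
    """Sketch of PREVIOUS_VALUE(0) + ZN(LOOKUP([Sales Percentage of], -1)) along Region."""
    starts: list[float] = []
    previous_start = 0.0  # PREVIOUS_VALUE(0): 0 when there is no row above
    previous_width = None  # LOOKUP(..., -1) returns nothing for the first row
    for width in widths:
        # ZN() turns the missing previous width into 0.
        lookup = previous_width if previous_width is not None else 0.0
        start = previous_start + lookup
        starts.append(start)
        previous_start, previous_width = start, width
    return starts


# West, East, Central from the text; South assumed to be the remainder.
widths = [31.6, 29.5, 21.8, 17.1]
starts = start_positions(widths)
print([round(s, 1) for s in starts])
```

Each start plus its own width gives the next start, which is why the bars tile the axis without gaps or overlaps.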

Let’s see how this all fits together into the Marimekko visualization in Figure 7.

Figure 7 -- Using [Start at] and [Sales Percentage of] to make the Marimekko work.

There are three things to keep in mind.

  1. [Start at] is on columns and determines the starting point (how far to the right) for each of the regions.
  2. [Sales Percentage of] is on Size and determines how thick the bars should be.
  3. Size is set to Fixed width, left aligned, where Fixed means the measure on the Size shelf determines the thickness.

Figure 8 -- Size must be fixed and left-aligned.
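Taken together, the three settings in the list above place one rectangle per segment: its left edge is the running start, its width is the column’s share of sales, and its height is the segment’s share within the column. Here is a minimal Python sketch of that geometry, assuming two segments (profitable and unprofitable) per column with profitable sales stacked at the bottom. The `Rect` class, the input tuples, and the stacking order are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    x: float  # left edge: the running [Start at] value
    width: float  # [Sales Percentage of], drawn left-aligned
    y: float  # bottom of the segment within its column
    height: float  # segment's share of the column (0-1)


def marimekko(columns: list[tuple[str, float, float]]) -> list[Rect]:
    """columns: (name, sales, profitable_sales) in display order."""
    total = sum(sales for _, sales, _ in columns)
    rects: list[Rect] = []
    x = 0.0
    for name, sales, profitable in columns:
        width = sales / total
        p = profitable / sales  # [Percentage Profitable Sales]
        rects.append(Rect(f"{name} profitable", x, width, 0.0, p))
        rects.append(Rect(f"{name} unprofitable", x, width, p, 1.0 - p))
        x += width  # the next column starts where this one ends
    return rects


rects = marimekko([("Region A", 400.0, 360.0), ("Region B", 100.0, 50.0)])
for r in rects:
    print(r)
```

The areas of all rectangles sum to 1, which is what lets a Marimekko show both each column’s size and its internal split.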

Some Interesting Findings

I built a parameter-driven version of the Marimekko (embedded at the end of this blog post) that allows the viewer to select different dimensions and different ways to sort. Here’s what happens when we look at Sub-Category sorted by Profitability.

Figure 9 -- Profitability by Sub-Category.

Okay, not a big surprise here given how many visualizations we’ve all seen showing that Tables are problematic.

That said, I was in for a surprise when I broke this down by state and sorted by the magnitude of sales, as shown below.

Figure 10 -- Profitability by state, sorted by Sales.

Wow, after 11 years of living with this data set I never realized that 60% of the sales in Texas were unprofitable. Who knew?

To be honest, I’m not convinced we need a Marimekko to see this clearly. A simple sorted bar chart will do the trick, as shown in Figure 11.

Figure 11 -- Sorted bar chart.

Indeed, I think this very simple view is better than the Marimekko in many respects.

I guess it depends on what you’re trying to get across.

See for Yourself

I’ve included an embedded workbook that has the Superstore example as well as versions of the visualizations Emma Whyte and Neil Richards built, but using this alternative technique.

I encourage you to think long and hard before deploying a Marimekko.  But if you do decide to build one I hope the techniques I explored here will prove useful.

 

March 20, 2017
 

Or

How I stopped worrying and learned to love (well, appreciate) the Marimekko

March 19, 2017

Overview

Readers of my blog know that I suffer from what Maarten Lambrechts calls xenographphobia, the fear of unusual graphics.  I’ll encounter a chart type that I’ve not seen before, purse my lips, and think (smugly) that there is undoubtedly a better way to show the data than in this novel and, to me, unusual chart.

That was certainly my reaction to “Marimekko Mania” when Tableau 10.0 was first released. I didn’t see a solid use case for this chart. There were some wonderful blog posts from Jonathan Drummey and Bridget Cogley on the subject, but I just wasn’t buying the need for the chart type.

Note: It turns out that for many situations you can make a perfectly fine Marimekko just using table calculations. I’ll weigh in on this later.

Enter Emma Whyte and Workout Wednesday

My “I’ll never need to use that” arrogance was disrupted a few weeks ago when I read this blog post from Emma Whyte.  The backstory is that Emma reviewed a Junk Charts makeover of a Wall Street Journal graphic, really liked the makeover, and decided to recreate it in Tableau.

Here’s the Wall Street Journal graphic.

Figure 1 -- Source of inspiration for Junk Charts and Emma Whyte. From a 2016 survey by LeanIn.org and McKinsey & Co.

There are two important things the data is trying to tell us:

  1. The percentage of women decreases, a lot, the higher up you go in the corporate hierarchy; and,
  2. There are far more entry-level positions than there are managers than there are VPs, etc.

The chart does a good job on the first point but only uses text to convey the second point.

Contrast this with Emma Whyte’s visualization:

Figure 2 -- Emma Whyte's makeover.

Whoa.

I immediately “grokked” this.  There are way more men than women among VPs, Senior VPs, and in the C-Suite, but look how much narrower those bars are!  True, I cannot easily compare how much wider the Entry Level column is than the VP column, but is that really important?

Is the Marimekko in fact the “right” way to show this?

Being a little bit stubborn I was not ready to declare a Marimekko victory so I decided to see if I could build something that worked as well, if not better, using more common chart types.

Anything You Can Do, I Can Do…

I won’t go through all ten iterations I came up with but I will show some of my attempts to convey the data accurately and with the visceral wallop I get from Emma’s makeover.

100% Stacked Bar with Marginal Histogram

Putting a histogram in the margin has become a “go to” technique when I’m dealing with highlight tables and scatterplots so I thought that might work in this situation. Here’s a 100% stacked bar chart combined with a histogram.

Figure 3 -- 100% stacked bar with marginal histogram. 

I was so convinced this would just smoke the Marimekko. I mean just look how easy it is to make accurate comparisons!

That may be true, but I think the Marimekko in question does a better job.

Connected Dot Plot

Here’s another attempt using a connected dot plot.

Figure 4 -- Connected dot plot where the size of the circles reflects the percentage of the workforce.

Here the lines separating the circles show the gender gap and the size of the circles reflects the percentage of the workforce.

OK, I think the gap is well represented, but the spacing between job levels is a fixed width. In my pursuit of accuracy I needed to find a way to spread the circles based on percentage of the workforce.

Diverging Lines with Bands

Figure 5 shows two diverging lines with circles and bands that are proportionate to the percentage of the workforce (Entry level is 52 units wide, Manager is 28 units wide, and so on).

Figure 5 -- Diverging lines with dots and correctly-sized circles and bands

But why are the lines sloping?  Shouldn’t the lines be flat for each job level?

Flat Lines

Here’s a similar approach but where the lines stay flat for each job level.

Figure 6 -- Flat lines and accurate circles and bands.

More Approaches and the Graphic from the Actual Report

All told I made ten attempts. The calculation I came up with for Figure 5 also made it possible to create a Marimekko using just a simple table calculation.

Note: I asked Jonathan Drummey to have a look at the Marimekko-with-table-calc approach and he points out that in both my example and Emma Whyte’s example the data isn’t “dense” so you can break the visualization simply by right-clicking a mark and selecting Exclude. That said, the technique is fine for static images and dashboards where you disable the Exclude functionality.

I also reviewed the full Women in the Workplace report and saw they used an interesting pipeline chart to relate the data.

Figure 7 -- "Pipeline" chart from the Women in the Workplace report (LeanIn.Org and McKinsey & Co.)

I applaud the creativity but have a lot of problems with the inaccurate proportions. Notice that this chart also has a sloping line suggesting a continuous decrease as you go from one level to another.

And The Winner is…

For me, Emma Whyte’s Marimekko does the best job of showing the data in a compelling and accurate format and I thank Emma for presenting such a worthwhile example.

Will I use this chart type in my practice?

It depends.

If the situation calls for it, I would try it along with other approaches and see what works best for the intended audience.

Here’s a link to the Tableau workbook that contains a copy of Emma Whyte’s original approach and many of my attempts to improve upon it. If you come up with an alternative approach that you think works well, please let me know.

Postscript

Big Book of Dashboards co-author Jeff Shaffer encouraged me to make some more attempts. Here’s a work in progress using jittering.

Jitter with bands

I think this looks promising.

September 7, 2016
 

Overview

Imagine a terrific introductory college course presented by a terrific professor.

That’s the feeling I had in reading The Truthful Art, Alberto Cairo’s follow up to his first book The Functional Art.

Whereas his first book took a “look at what you can and should do” approach to help people see and understand data, The Truthful Art is more of a “here’s what you need to know” book for anyone who wants to be a data journalist, and there are a lot of things you need to know if you are going to do a proper job.

I’m reluctant to use the term “data journalism” as Cairo’s book is for anyone who is tasked with helping people make sense of data. The difference is that the data journalist’s work is likely to be public and yours may only be seen by people working in your organization. But while you may not have to make a dashboard that is as polished as an infographic from the New York Times, both you and the data journalist need to adhere to a particular doctrine and have sufficient skills across a wide range of topics if you are going to build functional, truthful, and meaningful visualizations.

First, Be Truthful

If the credo for doctors is to “first, do no harm” Cairo might argue that the credo for data journalists is to “first, be truthful.” Cairo makes the case that a good visualization must be

  • Truthful
  • Functional
  • Beautiful
  • Insightful
  • Enlightening

And it must be these things in this order of priority. That is, the visualization must first be “relevant, factual, and accurate” and only then should it be “accessible and engaging.” Cairo further states that “honesty, clarity, and depth come first.” Indeed, this is why he bristles with outrage over deceitful graphics like this one.

So, how, exactly, does one create something that is truthful, functional, beautiful, insightful, and enlightening?

By achieving a sufficient level of competence in a LOT of different areas.

And just what are those areas?

The Data Journalism Landscape

In reading The Truthful Art you may feel like you are in a helicopter several thousand feet above the data visualization landscape. In each section Cairo, as expert guide, will gently descend to several hundred feet above a particular area and allow you to examine varied topics including design, statistics, color, storytelling, psychology, and ethics. While the book never gets deep into any of these subjects Cairo does provide excellent resources for anyone interested in exploring a particular topic in depth as every chapter of the book ends with a section titled “To Learn More.”

While Cairo’s writing is disarmingly warm and engaging he takes the responsibility of data storytelling very seriously. By the end of the book you will have an excellent understanding of the investment needed to make a worthwhile contribution to your company, society, or both.

Conclusion

Whether you are new to the field or have been practicing for years, I’m confident you’ll find The Truthful Art, like its predecessor, to be fun, elucidating, and inspiring.

The Truthful Art

Paperback: 400 pages

Publisher: New Riders; 1 edition (February 28, 2016)

August 11, 2016
 

Overview

As readers of this blog know, I have my problems with donut charts.

That said, I acknowledge that they can be cool and, under certain circumstances, enormously useful.

On a recent flight I was struck by how much I liked the animated “estimated time to arrival” donut chart that appeared on my personal TV screen. An example of such a chart is shown in Figure 1.

Figure 1 -- Donut chart showing progress towards completion of a flight.

I find this image very attractive and very easy to understand — I can see that I’m almost three-quarters of the way to my destination and that there are only 49 minutes left to the flight.

So, given how clear and cool this is, why not use them on a dashboard?  And if one is good, why not use lots of them?

It’s the “more than one” situation that may lead to problems.

Trying to make comparisons with donut charts

The flight status chart works because it shows only one thing: a single item’s progress towards a goal.

Let’s see what happens when we want to compare more than one item.

Consider the chart in Figure 2 that shows the placement rates for Fremontia Academy.

Figure 2 -- Donut chart showing placement percentage.

A 95% placement percentage is really impressive.  Is that better than other institutions?  If so, how much better is it?

Figure 3 shows a comparison among three different institutions using three different donut charts.

Figure 3 -- Three donut charts displaying placement percentages for three different institutions.

Before digging deeper, let’s replace the three separate donuts with a donut-within-a-donut-within-a-donut chart (Figure 4.)

Figure 4 -- A concentric donut chart (also called a “radial bar chart” or a “pie gauge.”)

“What’s the problem?” you may ask, “these comparisons are easy.”

While you may be able to make the comparisons, you are in fact working considerably harder than you need to be.

Really.  Let me prove it to you.

Let’s suppose you wanted to compare the heights of three famous buildings: One World Trade Center, The Empire State Building, and The Chrysler Building (Figure 5).

Figure 5  — Comparing the size (in feet) of three large buildings.

Now that’s an easy comparison. With virtually no effort we can see that One World Trade Center (blue) is almost twice as tall as The Chrysler Building (red).

Now let’s see how easy the comparison is with donuts (Figure 6).

Figure 6  — Three large buildings twisted into semi-circles.

Here are the same buildings rendered using a concentric donut chart (Figure 7).

Figure 7  — Three skyscrapers spooning.

Yikes.

So, with this somewhat contrived but hopefully memorable example, we took something that was simple to compare (the silhouettes of buildings) and contorted it into difficult-to-compare semi-circles.
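The geometry behind this is easy to sketch. In a concentric donut (radial bar chart) like Figure 4, each value is drawn as an angle, so the visible length of a segment also depends on the radius of its ring. The minimal Python sketch below uses hypothetical radii and values, not the data from the figures, and the function name is illustrative:

```python
import math


def arc_length(fraction: float, radius: float) -> float:
    """Visible length of a ring segment that represents `fraction` of a full turn.

    A donut or radial bar maps the value to an angle; the length the eye sees
    is that angle (in radians) multiplied by the ring's radius.
    """
    angle_radians = fraction * 2 * math.pi
    return angle_radians * radius


# Hypothetical rings: the outer ring has three times the radius of the inner one.
inner = arc_length(0.60, radius=1.0)  # the LARGER value, drawn on the inner ring
outer = arc_length(0.50, radius=3.0)  # the SMALLER value, drawn on the outer ring

print(f"60% on the inner ring: arc length {inner:.2f}")
print(f"50% on the outer ring: arc length {outer:.2f}")
```

The smaller value ends up with the longer arc, so a reader comparing lengths is misled unless they mentally correct for each ring's radius. With bars on a common baseline, length depends only on the value.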

With this in mind, let’s revisit the Placement example we saw in Figure 3.

Here is the same data rendered using a bar chart.

Figure 8 — Placement percentage comparison using a bar chart.

The comparison is much easier with the bars than with the donuts / semi-circles. You can tell with practically no effort that the blue bar is approximately twice as long as the red bar, even without looking at the numbers.

Indeed, that’s a really good test of how clear your visualization is: can you compare magnitude if the numbers are hidden?

Pop quiz — how much larger is the orange segment compared to the red segment?

Figure 9 — Trying to compare the length of donut segments is difficult.

Now try to answer the same question with a “boring” bar chart.

Figure 10 — Comparing the length of bars is easy.

With the circle segments you are squinting and guessing, while with the bars you know immediately: the orange bar is twice as large as the red bar.

More downsides for donuts

In addition to comparisons being difficult, how would you handle a situation where you exceeded a goal?  For example, how do you show a salesperson beating his / her quota?  With a bar chart you can show the bar going beyond the goal line (Figure 11).

Figure 11 — With a bar chart it’s easy to show more than 100% of goal.

How do you show this with a donut chart?

Rhetorical question.  You can’t.
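A quick way to see the problem, sketched in Python under the assumption that one full ring represents 100% of goal (the function names are hypothetical): a donut maps progress to an angle, and a circle runs out of degrees at 360, so a naive mapping of 120% of goal lands on the same angle as 20%. A bar on a linear axis simply keeps going past the goal line.

```python
def donut_angle(percent_of_goal: float) -> float:
    """Degrees swept by a donut segment when a full ring means 100% of goal.

    A circle has only 360 degrees, so anything past 100% wraps back around.
    """
    return (percent_of_goal * 360) % 360


def bar_length(percent_of_goal: float, goal_length: float = 100.0) -> float:
    """Length of a bar on a linear axis; it extends past the goal line as needed."""
    return percent_of_goal * goal_length


for pct in (0.20, 1.20):
    print(f"{pct:.0%} of goal -> donut angle {donut_angle(pct):.0f} deg, "
          f"bar length {bar_length(pct):.0f}")
```

Both rows report the same donut angle but very different bar lengths, which is exactly the ambiguity the bar chart in Figure 11 avoids.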

Conclusion

If you only have to show progress towards a single goal and don’t need to make a comparison, then it’s fine to use a donut chart. If you need anything more complex, use a bar chart; it will be much easier for you and your users to understand the data.

Special thanks to Eric Kim for creating the building images.

 

Jun 22, 2016

Overview

My obsession with finding the best way to visualize data will often infiltrate my dreams. In my slumbers I find myself dragging Tableau pills in an ongoing pursuit to come up with the ideal dashboard that shines light on whatever data set has invaded my psyche.

But is the pursuit of the perfect dashboard folly?

Probably, as I’ll explain in a minute, but I don’t want to discourage anyone from at least trying for the clearest, most insightful, and most enlightening way to display information.

Is this way the best way?

This pursuit of the ideal chart preoccupies a lot of people in the data visualization community. Consider this open discussion between Stephen Few and Cole Nussbaumer Knaflic that transpired earlier this year.

As you will read, Few weighs in on Knaflic’s book Storytelling with Data and her use of 100% stacked bar charts. He cites this particular example.

Figure 1 — Knaflic’s 100% stacked bar

Few argued that a better approach would be a line chart with a separate line for each goal state.

Figure 2 — Few’s line chart

Having written about visualizing sentiment and proclivities, I chimed in suggesting that a divergent stacked bar chart would be better (see Figure 3). I think this presents a clearer and more flexible approach, especially if you have more than three categories to compare, as the 100% stacked bar chart and the line chart can become difficult to read.

Figure 3 — My divergent stacked bar chart
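One way to lay out a divergent stacked bar is to shift each bar left by the total of its "negative" categories, so that every bar shares a common zero baseline. Here is a minimal Python sketch, assuming categories are ordered from most negative to most positive; the category names and percentages are hypothetical, not the data behind Figure 3.

```python
def diverging_segments(shares, negative_count):
    """Return (label, start, end) for each segment of one diverging stacked bar.

    `shares` is an ordered list of (label, percent) pairs, most negative first.
    The first `negative_count` categories are drawn to the left of zero and the
    rest to the right, so all bars line up on the same zero baseline.
    """
    left_total = sum(pct for _, pct in shares[:negative_count])
    segments = []
    start = -left_total  # the bar begins this far to the left of zero
    for label, pct in shares:
        segments.append((label, start, start + pct))
        start += pct
    return segments


# Hypothetical quarter: labels and percentages are illustrative only.
quarter = [("Missed", 30), ("Met", 50), ("Exceeded", 20)]
for label, start, end in diverging_segments(quarter, negative_count=1):
    print(f"{label:>8}: {start:+} to {end:+}")
```

Some designers split a neutral middle category evenly across zero; this sketch does not, and keeps every category whole on one side of the baseline.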

The ongoing public discussion was engaging and congenial, but I’ve seen similar cases where one or more of the parties advocating a solution became so certain that his / her approach was, without a shadow of a doubt, the only right way to present the data that tempers flared. Indeed, I’ve seen instances where some well-respected authors have declared a type of “Sharia Law” of data visualization and have banned so-called heretics and dilettantes from leaving comments on blogs and even from following them on Twitter!

My take? While I prefer the divergent stacked bar, the real question is whether the intended audience can see and understand the data. In this case, if management cannot tell from any of the three charts that a problem started in Q3 2014 and persisted in every quarter after that, then that company has some serious issues.

In other words, if the people who need to “get” it can in fact make comparisons, see what is important, and make good decisions based on their new-found understanding of the data, all without having to work unnecessarily hard to decode the chart, then you have succeeded.

I’m not saying don’t strive to be as efficient, clear, and engaging as possible; it’s just that the goal shouldn’t be to make the perfect chart. It should be to inform and enlighten.

And in this case I think all three approaches will more than suffice.  So stop arguing.

Understanding and educating your audience

Earlier this year I got a big kick out of something that Alberto Cairo retweeted:

Figure 4 — Avoid Xenographphobia: The fear of unusual graphics / foreign chart types.

Xenographphobia! What a wonderful neologism meaning “fear of unusual graphics.”

So, why do I bring this up? While it’s critical to know your audience and not overwhelm them with unnecessary complexity, you should not be afraid to educate them as well. Far too often I’ve heard people proclaim “oh, our executive team will never understand that chart.”

Really? Is the chart so complex, or the executives so closed-minded, that they won’t invest a little time getting up to speed with an approach that may be new but very worthwhile?

I remember the first time I saw a bullet chart (a Stephen Few creation) and thought “what is this nonsense?”  It turns out it wasn’t, and isn’t, nonsense.  It took all of 60 seconds for somebody to explain how the chart worked and I immediately saw how valuable it was.

Figure 5 — A bullet chart, explained.

I had a similar reaction when I first heard about jump plots from Tom VanBuskirk and Chris DeMartini. My thoughts at the time were “oooh… curvy lines.  I love curvy lines! But I suspect this is a case where the chart is too much decoration and not enough information. I bet there are better, simpler ways to present the data.”

Figure 6 — Jump plot example. Yes, these are very decorative, but they are also wickedly informative.

Then I spent some time looking into the use cases and came to the conclusion that for those particular situations jump plots and jump lines worked really well.

That said, there are some novel charts that I don’t think I will ever endorse, with the pie gauge being at the top of my list.

Figure 7 — The pie gauge, aka, a donut chart within a donut chart, aka, stacked donut chart. I won’t go into the use case here but a bullet chart is a much better choice.

So, what should we do?

I’ve argued that you should always try to make it as easy as possible for people to understand the data but you should not go crazy trying to make the “perfect dashboard.”

I also argue that while you should understand the skillset and mindset of your audience, you should not be afraid to educate them on new chart types, especially if it’s a “learn once, use over and over” type of situation.

But what about aesthetics, engagement, and interactivity? What roles do these play?  Is there a set of guidelines or framework we should follow in crafting visualizations?

Alberto Cairo, in his book The Truthful Art, suggests such a framework based on five key qualities.

I plan to write about these qualities (and the book) soon.

Apr 11, 2016

Overview

This past week I enjoyed looking at and interacting with Matt Chambers’ car color popularity bump chart.

Figure 1 — Matt Chambers’ car color popularity bump chart.  You can find the original Datagraver visualization upon which this was based here.

The key to this dashboard is interactivity, as it’s hard to parse all the car colors at once. If you hover over one color at a time it’s easy to follow the trends, as shown here.

Figure 2 — Hovering over a color shows you that color’s ranking over time

Showing Rank Only

Over the past few months I’ve seen a lot of people making bump charts (myself included). As much as I like them I fear that people are leaving some critical insights out of the discussion as bump charts only show ordinal information and not cardinal information. That is, they show rank but not magnitude.

Consider the bump chart above.  In 2009 White was the number one color, Black was number two, and Red was a distant sixth.

Figure 3 — Red appears to be a distant sixth

But was Red in fact “distant,” or was its popularity closer than it appeared? When you show only rank there’s no easy way to tell.

Showing Rank and Magnitude

Consider the dashboard below that shows the overall ranking and percentage popularity for car colors over the last ten years.

Figure 4 — Ranked Bar Chart dashboard with no colors selected

Right now we can see that over the last ten years White came in first place with 22% and Red came in fifth place with 11%. Now let’s see what happens if we select Red and White, as shown below.

Figure 5 — Comparing popularity of white and red cars over the last ten years.

Here we can see everything that the bump chart had plus so much more. Specifically, we can see that White was in first place for the past ten years and that Red was as high as fourth place in 2007 and as low as sixth place in 2008 and 2009. But we can also see that in 2009 White was only 50% larger than Red while in 2015 it was almost 150% larger!
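The difference between ordinal and cardinal information can be shown in a few lines. In the hypothetical Python sketch below (the shares are invented and merely chosen so the gaps echo the 50% and 150% figures above), two years have identical rankings, so a bump chart would draw them identically, yet the gap between White and Red is very different:

```python
def ranks(shares):
    """Map each color to its rank (1 = most popular) from a {color: percent} dict."""
    ordered = sorted(shares, key=shares.get, reverse=True)
    return {color: position for position, color in enumerate(ordered, start=1)}


def percent_larger(a, b):
    """How much larger `a` is than `b`, expressed as a percentage of `b`."""
    return (a / b - 1) * 100


# Hypothetical shares for two years; the ordering is the same in both.
year_1 = {"White": 18, "Black": 15, "Red": 12}
year_2 = {"White": 25, "Black": 20, "Red": 10}

print("Same ranks both years:", ranks(year_1) == ranks(year_2))
print(f"Year 1: White is {percent_larger(year_1['White'], year_1['Red']):.0f}% larger than Red")
print(f"Year 2: White is {percent_larger(year_2['White'], year_2['Red']):.0f}% larger than Red")
```

A rank-only view keeps just the output of `ranks`, which is why the bump chart cannot reveal the widening gap that the ranked bars make obvious.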

Try it yourself

Click here to interact with the color popularity ranked bar chart.

Ranked Bars are Versatile

The ranked bar approach works well for showing rank and magnitude over time and across different categories.

Consider the dashboard below that shows the sales for the top 20 products overall and then a ranked breakdown by one of three possible categories (Customer Segment, Region, and Year).

Figure 6 — Overall sales / rank and sales / rank broken down by Customer Segment.

Here we can see not only how the Bosch Full Integrated Dishwasher is ranked overall and within the four Customer Segments, but we can also see how much more and less the other products’ sales were.

Here’s the same dashboard showing a breakdown by Region.

Figure 7 — Overall sales / rank and sales / rank broken down by Region.

The Bosch Dishwasher is fifth overall but it isn’t even in the Top 20 in the South. We can also see that it is second in the East, ever so slightly behind the first-ranked product, the Whirlpool Upright Freezer. (You can see this for yourself when you interact with the dashboard at the end of the post.)

Here’s the same data but presented using a bump chart.

Figure 8 — Overall sales / rank and just rank by Region.

The bump chart looks cool, but we get only part of the story because we can glean only rank.

Conclusion

The bump chart is a great choice if you want to show “soft” rankings, such as what place a team came in over time, but if you want to show rank and magnitude, consider the ranked bar chart instead.

Note: for step-by-step instructions on how to build a dashboard like the one below, see Visual Ranking within a Category.

The Ranked Bar Dashboard — Kick The Tires

Mar 30, 2016

Some thoughts on functionality, beauty, crown molding, and lollipop charts

Overview

I’ve been writing a book about business dashboards with Jeffrey Shaffer and Andy Cotgreave, and we’ve conducted screen-sharing sessions with dozens of people and reviewed scores of dashboards. We had a particularly enjoyable jam session with Tableau Zen Master Mark Jackson last week. When we asked him why he had done something in particular, he replied with a comment that has been haunting me (in a good way) ever since:

“I look at this dashboard first thing every morning. I want to look at something beautiful.”

This really resonated with me. Mark was not tasked with making a public-facing dashboard that had to compete with USA Today infographics. He just wanted to make something that was both functional and beautiful. It made me think of waking up in a lovely room with crown molding. You don’t need crown molding, but as long as it isn’t blocking sunlight or clashing with the decor it’s certainly delightful to have crown molding.

This got me thinking about a topic I come back to often — how to make visualizations that are both functional and beautiful.

Unfortunately, this isn’t so easy and often leads to people sacrificing clarity for the sake of coolitude (see “Balancing Accuracy, Engagement, and Tone” and “It’s Your Data, not the Viz, That’s Boring” for some more thoughts on the matter).  I did, however, want to share a case study that had a delightful outcome and that employed a chart type that combines the accuracy of a bar chart with a bit of the “oooh” from packed bubbles and “ahhh” from donut charts.

Marist Poll and Views of the 2016 Presidential Election

Marist Poll is one of my clients, and they are tasked with providing nationwide survey results to The Wall Street Journal and NBC News. In November 2015 they conducted a poll asking people to describe in one word the tone of the 2016 presidential election. Here are the results.

Figure 1 — Marist Poll results in tabular form

Attempt One — Word Cloud

The results from the poll are very compelling but the results as depicted in the text table don’t exactly pop.

The client tried a word cloud as shown below.

Figure 2 — Marist Poll results using a word cloud

I’ll admit that the graphic “pops,” but it’s hard to make sense of the six terms, let alone discern that the results for “Crazy” were almost three times greater than the next most popular term.

Attempt Two — Packed Bubbles

People love circles and this chart certainly looks “cool,” but what does it tell us other than that the “Crazy” circle is larger than the other circles?

Figure 3 — Marist Poll results using packed bubbles

Why not use a simple bar chart?

Attempt Three — A Simple Bar Chart

Here are the same results rendered using the chart type Tableau’s “Show Me” suggests you use when working with this type of data.

Figure 4 — Marist Poll results using a bar chart

This is a big improvement over the word cloud and packed bubbles with respect to clarity — you can easily sort the responses and see how much larger “Crazy” is than the other responses.

But the chart is a bit sterile. What can we do to make the “Crazy” pop out without distorting the information?

Attempt Four — A Colored Bar Chart

The major takeaway from the poll is that 40% of the respondents characterized the election as “Crazy.” We can make that easier to glean by making that bar a bold color and all the other bars muted, as shown here.

Figure 5 — Marist Poll results using a bar chart with one bar colored differently

I’ll confess that this does the trick for me, but the client wanted to pursue some other options so we looked into a lollipop chart.

Attempt Five — Lollipop Chart

The lollipop chart is not native to Tableau;  it’s simply a dual axis chart that superimposes a circle chart on top of a bar chart that has very thin bars.
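The construction is simple enough to mimic outside Tableau. As a rough, hypothetical sketch (plain Python, text output only, not how Tableau renders it), each row below is a thin stem standing in for the very thin bar, with an "o" standing in for the circle mark superimposed at its end. The terms and percentages are placeholders, not the Marist results.

```python
def lollipop_rows(results, width=40):
    """Render a text-mode lollipop chart, one row per (label, percent) pair.

    The dashes play the role of the very thin bar and the trailing "o" plays
    the role of the circle mark drawn on top of the bar's end.
    """
    longest_label = max(len(label) for label, _ in results)
    top = max(pct for _, pct in results)
    rows = []
    # Sort descending so the largest response sits at the top, as in a sorted bar chart.
    for label, pct in sorted(results, key=lambda item: item[1], reverse=True):
        stem = "-" * round(pct / top * (width - 1))  # stem length scales with the value
        rows.append(f"{label:>{longest_label}} {stem}o {pct}%")
    return rows


# Hypothetical terms and percentages, for illustration only.
poll = [("Term B", 15), ("Term A", 40), ("Term C", 10)]
print("\n".join(lollipop_rows(poll)))
```

Because stem length is proportional to the value, the chart keeps the bar chart's easy length comparison; the circle only adds emphasis at the end of each stem.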

Figure 6 — Marist Poll results using a lollipop chart

This strikes me as an excellent compromise between the analytical integrity of the bar chart and the “ooh… circles” appeal of the packed bubbles.  I have no qualms about using this chart type.

But there’s still something missing if we want the chart to have some impact.

Final Attempt — Adding a Compelling Title

A concise, descriptive title can make a huge difference in garnering attention and making a chart more memorable. In the example below the client added some graphic design artistry to the typography to make the title compelling.

Figure 7 — Marist Poll results using a lollipop chart with compelling headline.  I love this.

Conclusion

My bass-playing friends will probably agree that “groove” is more important than “chops.”  That is, being able to play “in the pocket” with a rock-steady beat is more important than being able to play a great solo with a flurry of notes all over the neck.

But it sure is great to be able to do both.

The same goes for data visualization. Functionality needs to come first, then beauty.

But it sure is great to have both.

And in many cases, with a little extra effort, you can have both.

So go ahead, try putting some “crown molding” into your data visualizations and delight yourself and your stakeholders.

 

Nov 10, 2015

Overview

Several weeks ago the data visualization community broke into justified outrage over an inexcusably misleading dual-axis chart from Americans United for Life.  I plan to write an article about this and other “ethically wrong” visualizations in a few weeks but in the meantime I encourage you to read these excellent posts from Alberto Cairo and Emily Schuch, as well as this discussion from Politifact.

Around the same time these posts appeared I came across a “Viz of the Day” dashboard from Emily Le Coz that accompanied a lengthy article in the Daytona Beach News-Journal. The dashboard contained several visualizations, but the one that caught my eye was this dual axis chart.

Figure 1 — Infographic showing that as the number of firefighters has increased over the past 30 years, the number of fire-related deaths has decreased.

I engaged in an interesting Twitter discussion about this graphic with Alberto Cairo, Jorge Camoes, and Noah Iliinsky. I’ll get into that discussion in a bit (and point out some troubling problems with the visualization), but first I want to discuss the use case for dual axis charts.

Why use dual axis charts

There are several reasons to use a dual axis chart (e.g., a Pareto chart that shows individual values along with the cumulative percent) but the primary use case is when you want to compare two completely different measures and see if there is any noteworthy relationship between the two measures.  Consider the example below that shows cyclical sales data for a retail store (bars) and the number of orders placed each month (line).

Figure 2 — Dual axis chart comparing sales and orders by month.

The surprising result is that while November is historically the strongest month for sales ($5M from 2010 to 2013) the total number of orders placed in November is the lowest of any month. And yes, I checked to make sure that this was true of all years and not one crazy blowout year.

I think this dual axis combination chart (where we show bars and a line) makes it easy to see there is something very interesting about November. The low number of orders combined with the high sales – something that is easy to see – means that we either sold more items per order or more expensive items per order.
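The reasoning in the last sentence is simple division: sales divided by orders gives the average value per order, and that figure must rise when sales go up while orders go down. A hypothetical Python sketch (the monthly totals are invented for illustration, not the figures behind Figure 2):

```python
def average_order_value(sales, orders):
    """Sales per order; high sales with few orders implies a high value here."""
    return sales / orders


# Hypothetical monthly totals, for illustration only.
months = {
    "October": {"sales": 400_000, "orders": 1_000},
    "November": {"sales": 500_000, "orders": 800},
}

for month, totals in months.items():
    aov = average_order_value(totals["sales"], totals["orders"])
    print(f"{month}: ${aov:,.2f} per order")
```

Telling “more items per order” apart from “more expensive items per order” would need a third measure, such as units sold, which the chart does not show.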

So, what’s wrong with the firefighter example?

Given that dual axis charts can be so useful, I wondered why I had problems with the firefighter example. Fortunately, the author made the dashboard downloadable from Tableau Public, so I was able to see how it was put together.

Cutesy icons set the wrong tone for the piece

My first problem was with the firefighter hat and skull-and-crossbones icons.

Figure 3 — Icons representing firefighters and civilian deaths.

In my opinion (and it is just an opinion), this “cartoonified” the visualization. I would much prefer to see either a simple color legend or a label next to both lines.

The author exaggerates the changes over time

A much more troubling issue is that the author uses a fixed Y-axis that exaggerates the changes over time.  The author also fails to show the axis labels so we can’t see that the axis doesn’t start at zero.

Consider the dashboard below that shows the original visualization on the left with an accurate visualization on the right.

Figure 4 — Comparison of fixed axis vs. automatic axis charts.  Note that the axis uses a SUM() function while the label is using AVERAGE(). The data is repeated three times in the data source which is why the author needs to use AVERAGE(). Yes, the axis should use AVERAGE() as well but the relative positioning of the elements is the same with SUM() so this causes no harm.

Because the author fixed the Y-axis rather than starting it from zero, the slope of the lines is exaggerated. While this does not alter what is in fact a noteworthy observation, whenever I see this type of “rigging” it makes me question the validity of any and all parts of the story. That is, even though I don’t think the exaggeration was an intentional attempt to dramatize the difference, seeing this in play will make me question everything that the author and the publication publish from now on.
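Why a truncated axis exaggerates slope can be put in numbers: the drop a reader sees is the change divided by the axis range, so shrinking the range inflates the drop. A minimal Python sketch with hypothetical values (not the firefighter data); the function name is illustrative:

```python
def apparent_change(start, end, axis_min, axis_max):
    """Fraction of the plot height that a line climbs (+) or falls (-)."""
    return (end - start) / (axis_max - axis_min)


# Hypothetical series that drops from 90 to 80.
start, end = 90, 80

zero_based = apparent_change(start, end, axis_min=0, axis_max=100)
truncated = apparent_change(start, end, axis_min=75, axis_max=95)

print(f"Actual change: {(end - start) / start:.1%}")
print(f"Drop as share of plot height, axis 0 to 100: {abs(zero_based):.0%}")
print(f"Drop as share of plot height, axis 75 to 95: {abs(truncated):.0%}")
```

The underlying change is the same in both cases; only the axis range differs, yet the truncated axis makes the line fall across five times as much of the chart.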

Am I being too hard on the author? I don’t think so, as anything that’s published as a “viz of the day” and accompanies a high-profile news article should get a lot more scrutiny than just any old Tableau Public visualization. While I don’t feel misled by the overstated changes, I do wonder at what point a viz crosses the line into TURD territory (Truly Unfortunate Representation of Data). We’ll save that discussion for a later post.

Different approaches

Combination area and line chart

After adjusting the axis I still wondered if having two line charts was causing unnecessary confusion. In my first makeover attempt I tried combining an area graph with a line chart, as shown here.

Figure 5 — First makeover attempt.  A dual axis chart using an area chart for firefighters and a line chart for civilian deaths.

While using two different chart types made it easier to see that I was comparing two different measures, I didn’t love the chart and sought alternatives.

Connected Scatterplots

On Twitter Jorge Camoes offered this connected scatterplot.

Figure 6 — Jorge Camoes’ connected scatterplot.  Notice that the axes do not start at zero but that the axes labels are at least visible.

In a connected scatterplot, each mark is a year and the line connects the marks in chronological order, so the path the line takes traces time rather than either axis. This is why the line folds back on itself from time to time (more on this in a moment). Camoes also “normalized” the data using an index so that both civilian deaths and number of firefighters start at a value of 100.
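Indexing is a simple rescaling: divide every value by the series’ first value and multiply by 100, so both measures start at 100 and later values read as a percentage of the starting point. A minimal Python sketch, assuming the first year is the base; the series below are hypothetical and deliberately on very different scales, not the firefighter data:

```python
def index_to_base(values, base=100):
    """Rescale a series so its first value equals `base` (e.g. first year = 100)."""
    first = values[0]
    return [v * base / first for v in values]


# Hypothetical series on very different scales.
firefighters = [200, 220, 240]
deaths = [50, 45, 40]

print(index_to_base(firefighters))  # values above 100 mean growth since the base year
print(index_to_base(deaths))        # values below 100 mean decline since the base year
```

After indexing, a value of 110 means “10% above the starting year,” which is what lets two measures with very different units share one scale.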

I like this visualization very much but fear that many people won’t understand the index value of 100 so I tried my own connected scatterplot, shown below.

Figure 7 — Connected scatterplot with regular vs. normalized values.  Notice that the X-axis does not start at zero but that the axes labels are visible.

Before anyone cries foul about the X-axis, here’s a version with the axis starting at zero.

Figure 8 — Connected scatterplot with both axes starting at zero.  This may be why Camoes normalized the data although his chart doesn’t start at zero, either.

I think starting the x-axis at zero obscures the relationship but that’s not what makes me question using this approach.  My problem is that many people will have a hard time understanding how the line “works”, as it were.  This is because whenever we see a line chart that involves time we come to expect marks on the left of the chart to show older dates and marks on the right to show newer dates.  In other words, we expect the chart to behave like this.

Figure 9 – Since grade school we’ve been indoctrinated to expect earlier dates to the left and later dates to the right.

With a connected scatterplot the X-axis is “owned” by an independent measure so we have to adjust our perception to see that sometimes a later year will appear to the left of an earlier year, as shown below.

Figure 10 — Connected scatterplot with marks showing all years.

Notice how 1986 appears to the left of 1985 and 1989 appears to the left of 1988.  Unless you are used to this type of approach this can look very strange.
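Spotting these fold-backs is mechanical: walk the marks in year order and flag any year whose x-value is lower than the previous year’s. A minimal Python sketch with hypothetical (year, x, y) points, not the data behind Figure 10; the function name is illustrative:

```python
def fold_backs(points):
    """Return the years at which a connected scatterplot's line moves left.

    `points` is a list of (year, x, y) tuples. The line joins them in year
    order, so at each returned year the path doubles back to the left even
    though time is moving forward.
    """
    ordered = sorted(points)  # connect the marks in chronological order
    return [
        year
        for (_, prev_x, _), (year, x, _) in zip(ordered, ordered[1:])
        if x < prev_x
    ]


# Hypothetical (year, firefighters, deaths) values, for illustration only.
series = [(2001, 100, 60), (2002, 104, 58), (2003, 102, 55), (2004, 108, 52)]
print(fold_backs(series))
```

In an ordinary time-series line chart this list would always be empty, because time owns the X-axis; in a connected scatterplot every entry is a spot where a reader’s left-to-right expectation breaks.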

Keep it simple

After experimenting a bit more I decided to forgo the dual axis and connected scatterplots and fashioned this simpler narrative.

Figure 11 -- Two separate charts yielding a simple and easy-to-follow narrative.

Figure 11 — Two separate charts yielding a simple and easy-to-follow narrative.

If you have what you think is a better approach I would love to see it.  If you’re using Tableau you can download the packaged workbook with the original dashboard and various makeover attempts here.