The catalyst for this post was a recent Tableau user group meeting where the presenter demonstrated a dashboard that featured a packed bubble chart. I spent a lot of time shaking my head, not just because this was a very poor visualization choice, but because the presenter was in a position of authority and there were people in attendance who were new to Tableau and to data visualization. These people would likely come away from the presentation thinking that they should, when presented with similar data, use a packed bubble chart.
I then recalled something that I had written previously:
If I see a visualization that is poorly designed or worse, misleading, I’m going to say something about it. I hope you will do the same.
The culprit visualization
I do not have the data that drove the Tableau user group visualization so I will use Superstore Sales data to illustrate my point.
For whatever reason, the presenter eschewed creating a clear and simple bar chart, like this one…
Figure 1 — A simple but abundantly clear bar chart.
… and instead built a difficult-to-interpret packed bubble chart that looked like this:
Figure 2 – A cool, but analytically-bereft packed bubble chart.
With the packed bubbles I have to work to determine which bubbles belong in which category and I have to work especially hard to determine how much larger a particular bubble is than another bubble. In addition, in some cases the bubble is too small for the supporting sub-category and measure labels.
A good rule of thumb – Ask yourself these three questions
As I considered the flaws in this chart type I began to codify some simple principles that I use when building visualizations. Specifically, before I go live with a visualization, I ask these three questions:
- Do I need different colors?
- Do I need a legend?
- Do I need measure labels?
In the case of the bar chart I don’t need to use color, I don’t need a legend and I don’t need to show the numbers next to the bars. I might want to show the numbers, but I don’t need to show them. With the packed bubble chart I need all three items in order to make sense of the viz.
I’m not saying that you should never use color, legends, labels, or circles. I just suggest that you ask yourself if there’s a way to build a clear visualization that doesn’t need one or more of these elements: the more of them you need, the harder your audience will have to work.
Let’s see how this triumvirate of questions exposes some of the flaws in pie charts, circle charts, 100% stacked bar charts, and “snakey” diagrams.
The problem with pies
Many people have written articles about this, my favorite being Stephen Few’s white paper on the subject. Indeed, if you need ammunition to move your organization away from pie charts I encourage you to download Few’s paper.
I’ll present an abbreviated discussion of the problems with pies to show how it underscores the utility of the three questions.
Consider the chart below, which shows poll results for the question “What is your favorite beverage?”
Figure 3 — Simple pie chart showing poll results.
I can see that Chateau Lafite Rothschild comes in first, but I can’t tell if it’s Coffee or Dogfish IPA that comes in second, and I really can’t tell how much larger one segment is than another.
Here’s an alternative pie chart that adds color, a legend, and measure labels.
Figure 4 — Pie chart with added stuff so you can make sense of the pie chart.
Well, I can now determine the ranking and relative magnitude, but I have to spend a lot of time going back and forth between the legend and the chart. Consider how much simpler it is to understand the poll results using a bar chart:
Figure 5 — Poll results displayed in a bar chart.
So, just why is the pie chart harder to understand? In addition to requiring a legend, it also has to do with people’s inability to compare the area of circles.
The Problem with Circles
As with pies, Stephen Few has written about this subject, as has Alberto Cairo in his book The Functional Art. (Do you own a copy of Cairo’s book? If the answer is “no” you should buy it now. Really. Stop reading this and buy it.)
Now that you’ve bought the book…
Consider the collection of bar charts below. Two of the groups have measure values that are labeled incorrectly while one of the groups is correctly labeled.
Figure 6 — One of the groups is labeled correctly and two are mislabeled. Can you tell which one is correct?
Can you tell which one of the three groups is labeled correctly?
Now have a look at the same data presented with a packed bubble chart where again one group is correctly labeled and the other two are not.
Figure 7 — One of the groups is labeled correctly and two are mislabeled. Can you tell which one is correct?
If you are like most people you’ll solve the bar chart example very quickly and probably won’t have a clue with the packed bubble charts.
Note – The answers may be found at the end of this blog post.
Incidentally, the differences are pretty significant, but I could magnify the errors quite a bit in the circle charts and people still wouldn’t be able to tell which group was correct, as people are just horrible at comparing the areas of differently-sized circles.
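To put a rough number on why circles are so hard to compare, here is a small Python sketch. It assumes the bubbles are sized so that area is proportional to the value, which is an assumption on my part rather than a statement about how any particular tool sizes its marks. The function name and the sample values are made up for illustration.

```python
import math

def diameter_ratio(value_a, value_b):
    """Ratio of diameters for two circles whose AREAS are proportional to the values.

    Assumption: area-proportional sizing. Since area = pi * r^2, the
    diameter grows with the square root of the value, not the value itself.
    """
    if value_a <= 0 or value_b <= 0:
        raise ValueError("values must be positive")
    return math.sqrt(value_a / value_b)

# Hypothetical values, not taken from the Superstore data.
for a, b in [(200, 100), (400, 100), (150, 100)]:
    print(f"value ratio {a / b:.2f} -> bar length ratio {a / b:.2f}, "
          f"circle diameter ratio {diameter_ratio(a, b):.2f}")
```

Under that assumption, a value twice as large gets a bar twice as long but a circle only about 1.41 times as wide, so the visual difference shrinks exactly where the reader needs it most.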
There are certainly places where circles are useful and most welcome, but they don’t work well here and they don’t work well in the example I discuss below.
Substituting a bad chart type with another bad chart type
I’ve recently read a collection of blog posts where the author suggests ways in which people can avoid the stranglehold of pie charts by using other chart types. I liked the promise of this blog series and was pleased to see that the first example presented a bar chart similar to the one shown in Figure 1. I was, however, surprised at some of the other approaches as I did not think they presented data clearly.
One of the questionable alternatives was a panel chart like the one shown below.
Figure 8 — A panel chart comprising circles that makes me have to work harder than I would like.
I have to work very hard to “grok” this viz and that’s because I cannot make sense of the data without reading and interpreting the measure labels. In addition, because the items were not grouped I had to refer to the color legend to see which circles represented Technology products, which were for Office Supplies, etc.
I grant that there are cases where you may want to present the product sub-categories from largest to smallest without grouping them into a hierarchy, but I still maintain that it’s much easier to interpret the data with a bar chart like the one shown below, which does not require measure labels.
Figure 9 — Bar chart with hierarchy removed. We need a legend but don’t need measure labels.
Note: I am not saying that you should not use measure labels; I am saying that if the visualization requires measure labels then there is a good chance you’ll be able to craft a better visualization.
Does this mean you should never use circles?
There are of course myriad instances where circles would be very welcome. Consider the following map that shows the number of orders by location.
Figure 10 — Symbol map
I can see very easily that the number of orders on the West Coast (Washington, Oregon, and California) is considerably larger than the rest of the country. In this case it’s seeing the circles on top of a map that helps me conclude that there’s a lot of activity happening in one area of the country. If I wanted to know just how much activity, and if I wanted to be able to make quantitative comparisons, I would need an accompanying chart that helped me sort and determine the relative magnitude of orders for each state. That is, if I needed to know more than “whoa, look at the number of orders on the West Coast!” then I would probably craft a dashboard that would also contain a bar chart showing orders by state in descending order.
100% Stacked Bar Charts
I try to avoid 100% stacked bar charts as they absolutely require that I use color and they can be somewhat difficult to interpret without measure labels. Consider the visualization below that compares % of total shipping costs by product category, broken down by ship mode.
Figure 11 – A collection of 100% stacked bar charts. I think of this as being a “cubist” pie chart
It’s pretty easy to determine the Regular Air values as the axis starts at zero. It’s a bit harder to glean the Express Air and Delivery Truck values without displaying the mark labels.
Still, it’s an easier read than three pie charts.
Figure 12 — Trying to understand the breakdown of shipping costs by Ship Mode across categories using pie charts (yuck).
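Going back to the 100% stacked bars in Figure 11, here is a minimal Python sketch of why only the first segment is easy to read. The function name and the shipping-cost numbers are hypothetical (they are not the Superstore figures); the point is simply that every segment after the first starts at a different offset, so there is no shared baseline to compare against.

```python
def segment_spans(shares):
    """Return (name, start, end) for each segment of one 100% stacked bar.

    `shares` is an ordered list of (name, value) pairs; values are
    normalised to percentages of the bar's total.
    """
    total = sum(value for _, value in shares)
    if total <= 0:
        raise ValueError("total must be positive")
    spans, start = [], 0.0
    for name, value in shares:
        end = start + 100.0 * value / total
        spans.append((name, start, end))
        start = end
    return spans

# Made-up shipping costs for one product category.
bar = [("Regular Air", 50), ("Express Air", 20), ("Delivery Truck", 30)]
for name, start, end in segment_spans(bar):
    baseline = "starts at zero" if start == 0 else f"starts at {start:.0f}%"
    print(f"{name}: {end - start:.0f}% of total, {baseline}")
```

With these made-up numbers only Regular Air can be read straight off the axis; for the other two segments the reader has to subtract one position from another, which is why the mark labels end up doing the work.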
While I try to avoid 100% stacked bar charts, I am a very big fan of divergent stacked bar charts. Here’s an example from a recent blog post. While I do need a color legend I can get by without measure labels.
Figure 13 — Divergent stacked bar chart.
Stacked bar charts also play a supporting role in Sankey diagrams which we explore below.
Where “snakey” Sankey diagrams work
Consider this snippet from Jeffrey Shaffer’s winning entry in the Tableau Quantified Self visualization competition.
Figure 14 — Jeffrey Shaffer’s Sankey chart maps how one stacked bar chart maps to another stacked bar chart.
At the bell of the trumpet we find a stacked bar chart where we can hover over items to see to what they refer:
Figure 15 — Hovering over a bar shows info about the bar and shows how the item is mapped to a different set of measures.
From this action I can see that Shaffer performed at five weddings where he played music by Bach, Clarke, Mouret, Reiche, Vivaldi, and Purcell.
Within this context, this very creative chart works as it’s not essential that I know the exact details of Shaffer’s performances. Instead, I can explore this and other portions of what is a very fun and playful dashboard and get a sense of who Shaffer is and whether I’d like to hang out with him (and I would as I know for a fact that musician / data visualization consultants are among the most interesting people on the planet. You can look it up.)
Where Sankey diagrams don’t work
I first saw this type of chart in a visualization Shaffer published earlier this year where he took a stab at redesigning his utility company’s fuel usage bill. Here’s what his redesign looks like:
Figure 16 — Shaffer’s energy bill redesign.
The chart is very decorative, but does it help me understand energy expenditures? There’s a really big story sitting in the data but I don’t think this chart helps me see it. Indeed, I would argue that while the chart is pretty it in fact obfuscates what is the big story.
Consider this visualization of the same data.
Figure 17 — A redesign of the redesign.
So what’s the big story? Almost half of the total energy expenditures (44%) go towards heating the home.
My reaction to the Sankey diagram is “cool!” My reaction to the stacked bar chart is “crap!”
In the Quantified Self dashboard “cool” is the desired reaction. With the fuel bill “crap” is better as it may lead to better decisions and behavior changes (e.g., replace the windows, add insulation, or wear a sweater).
The goal of good data visualization is to elucidate, not decorate. If your visualization requires color, legends, and measure labels you should at least consider an approach that does not ask your viewers to work hard to see and understand what is important in the data.
Bar Chart – Group 2 is labeled correctly.
Packed Bubble Chart – Group 1 is labeled correctly.