Several weeks ago the data visualization community broke into justified outrage over an inexcusably misleading dual-axis chart from Americans United for Life. I plan to write an article about this and other “ethically wrong” visualizations in a few weeks, but in the meantime I encourage you to read these excellent posts from Alberto Cairo and Emily Schuch, as well as this discussion from Politifact.
Around the same time these posts appeared I came across a “Viz of the Day” dashboard from Emily Le Coz that accompanied a lengthy article in the Daytona Beach News-Journal. The dashboard contained several visualizations but the one that caught my eye was this dual axis chart.
I engaged in an interesting Twitter discussion about this graphic with Alberto Cairo, Jorge Camoes, and Noah Iliinsky. I’ll get into that discussion in a bit (and point out some troubling problems with the visualization), but first I want to discuss the use case for dual axis charts.
Why use dual axis charts
There are several reasons to use a dual axis chart (e.g., a Pareto chart that shows individual values along with the cumulative percent) but the primary use case is when you want to compare two completely different measures and see if there is any noteworthy relationship between the two measures. Consider the example below that shows cyclical sales data for a retail store (bars) and the number of orders placed each month (line).
The surprising result is that while November is historically the strongest month for sales ($5M from 2010 to 2013) the total number of orders placed in November is the lowest of any month. And yes, I checked to make sure that this was true of all years and not one crazy blowout year.
I think this dual axis combination chart (where we show bars and a line) makes it easy to see there is something very interesting about November. The low number of orders combined with the high sales – something that is easy to see – means that we either sold more items per order or more expensive items per order.
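The reasoning behind that last sentence is simple arithmetic: dividing sales by the number of orders gives the average value of an order. Here is a minimal sketch of that check. Apart from November's $5M sales total, every figure is invented for illustration and the names are hypothetical; none of it comes from the store data behind the chart.

```python
# Hypothetical monthly totals. Only November's $5M sales figure echoes
# the article; every other number is made up for illustration.
monthly = {
    "Sep": {"sales": 3_200_000, "orders": 410},
    "Oct": {"sales": 3_600_000, "orders": 450},
    "Nov": {"sales": 5_000_000, "orders": 320},
    "Dec": {"sales": 4_100_000, "orders": 480},
}


def average_order_value(totals):
    """Return sales divided by order count for each month."""
    return {month: t["sales"] / t["orders"] for month, t in totals.items()}


aov = average_order_value(monthly)

# The month with the highest sales and the month with the fewest orders.
top_sales_month = max(monthly, key=lambda m: monthly[m]["sales"])
fewest_orders_month = min(monthly, key=lambda m: monthly[m]["orders"])

for month, value in aov.items():
    print(f"{month}: ${value:,.0f} per order")
```

When the same month has both the highest sales and the fewest orders, its average order value has to be the highest of the set. The calculation cannot say whether that comes from more items per order or pricier items; item-level data would be needed for that.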
So, what’s wrong with the firefighter example?
Given that dual axis charts can be so useful, I wondered why I had problems with the firefighter example. Fortunately, the author made the dashboard downloadable from Tableau Public so I was able to see how it was put together.
Cutesy icons set the wrong tone for the piece
My first problem was with the firefighter hat and skull-and-crossbones icons.
In my opinion (and it is just an opinion) this “cartoonified” the visualization. I would much prefer to see either a simple color legend or a label next to both lines.
The author exaggerates the changes over time
A much more troubling issue is that the author uses a fixed Y-axis that exaggerates the changes over time. The author also fails to show the axis labels so we can’t see that the axis doesn’t start at zero.
Consider the dashboard below that shows the original visualization on the left with an accurate visualization on the right.
Because the author fixed the Y-axis rather than starting from zero, the slope of the lines is exaggerated. While this does not alter what is in fact a noteworthy observation, whenever I see this type of “rigging” it makes me question the validity of any and all parts of the story. That is, even though I don’t think the exaggeration was an intentional attempt to dramatize the difference, seeing this in play will make me question everything that the author and the publication publish from now on.
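To get a rough feel for how much a truncated axis steepens a line, compare the range the axis covers with the range it would cover if it started at zero. The sketch below assumes both charts are drawn at the same height and share the same upper limit. The limits of 75 and 100 are invented for illustration and are not the values used in the firefighter dashboard.

```python
def slope_magnification(axis_min, axis_max):
    """How many times steeper a line looks when the axis starts at
    axis_min instead of zero, for charts of equal height that share
    the same upper limit axis_max."""
    if not 0 <= axis_min < axis_max:
        raise ValueError("expected 0 <= axis_min < axis_max")
    # The same pixel height now spans a narrower range of values,
    # so every change is stretched by full_range / truncated_range.
    return axis_max / (axis_max - axis_min)


# Hypothetical axis limits, chosen only to show the effect.
zero_based = slope_magnification(0, 100)
truncated = slope_magnification(75, 100)

print(f"Zero-based axis: {zero_based:.1f}x")
print(f"Axis from 75 to 100: {truncated:.1f}x")
```

With these made-up limits, every rise and fall looks four times steeper than it would on a zero-based axis, even though the underlying numbers are unchanged.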
Am I being too hard on the author? I don’t think so, as anything that’s published as a “viz of the day” and accompanies a high-profile news article should get a lot more scrutiny than just any old Tableau Public visualization. While I don’t feel misled by the overstated changes, I do wonder at what point a viz crosses the line into TURD territory (Truly Unfortunate Representation of Data). We’ll save that discussion for a later post.
Combination area and line chart
After adjusting the axis I still wondered if having two line charts was causing unnecessary confusion. In my first makeover attempt I tried combining an area graph with a line chart, as shown here.
While using two different chart types made it easier to see that I was comparing two different measures, I didn’t love the chart and sought alternatives.
On Twitter Jorge Camoes offered this connected scatterplot.
In a connected scatterplot the path the line takes represents the year. This is why the line folds back on itself from time to time (more on this in a moment). Camoes also “normalized” the data using an index so that both civilian deaths and number of firefighters start at a value of 100.
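For readers unfamiliar with indexing, the idea is to divide every value in a series by the first value and multiply by 100, so that each series starts at 100 and later values read as a percentage of the starting point. This is a minimal sketch of that general technique, not necessarily the exact calculation Camoes used. The two short series are invented for illustration.

```python
def index_to_base(values, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot index a series whose first value is zero")
    return [v * base / first for v in values]


# Hypothetical series, not the real firefighter or fatality counts.
firefighters = [200, 220, 250]
civilian_deaths = [50, 45, 30]

firefighters_index = index_to_base(firefighters)
deaths_index = index_to_base(civilian_deaths)

print(firefighters_index)
print(deaths_index)
```

An index of 125 means the measure is 25 percent above its starting level, and an index of 60 means it is 40 percent below. That puts two measures with very different magnitudes on one comparable scale.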
I like this visualization very much but fear that many people won’t understand the index value of 100 so I tried my own connected scatterplot, shown below.
Before anyone cries foul about the X-axis, here’s a version with the axis starting at zero.
I think starting the X-axis at zero obscures the relationship but that’s not what makes me question using this approach. My problem is that many people will have a hard time understanding how the line “works”, as it were. This is because whenever we see a line chart that involves time we have come to expect marks on the left of the chart to show older dates and marks on the right to show newer dates. In other words, we expect the chart to behave like this.
With a connected scatterplot the X-axis is “owned” by an independent measure so we have to adjust our perception to see that sometimes a later year will appear to the left of an earlier year, as shown below.
Notice how 1986 appears to the left of 1985 and 1989 appears to the left of 1988. Unless you are used to this type of approach this can look very strange.
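The fold-backs are easy to find programmatically: walk through the years in order and flag any year whose X value is smaller than the previous year's. The sketch below does that. The X values are invented so that the result mirrors the two years mentioned above; they are not the actual figures from the chart.

```python
def fold_backs(points):
    """Given (year, x_value) pairs, return the years that plot to the
    left of the year immediately before them."""
    ordered = sorted(points)  # tuples sort by year first
    return [
        year
        for (_, prev_x), (year, x) in zip(ordered, ordered[1:])
        if x < prev_x
    ]


# Hypothetical X-axis values for six consecutive years.
points = [
    (1984, 100),
    (1985, 110),
    (1986, 105),
    (1987, 115),
    (1988, 120),
    (1989, 112),
]

backward_years = fold_backs(points)
print(backward_years)
```

Every year in that list is a place where the path doubles back to the left, which is exactly where a reader used to left-to-right time series is likely to stumble.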
Keep it simple
After experimenting a bit more I decided to forgo the dual axis and connected scatterplots and fashioned this simpler narrative.
If you have what you think is a better approach I would love to see it. If you’re using Tableau you can download the packaged workbook with the original dashboard and various makeover attempts here.