July 22, 2018

Overview

After writing my blog post about why I still love the Jitterplot, I got some thoughtful feedback from Adam McCann, Jeffrey Shaffer, and the always provocative Daniel Zvinca.

Dan had written a wonderful article suggesting a Stingray plot as an alternative, and Jeff and Adam’s comments encouraged me to revisit the unit histogram featured in Chapter 3 of the Big Book of Dashboards.

A makeover using a Unit Histogram chart

As you may recall, the goal of the Jitterplot is to allow an individual to see his / her place in the salary “universe” as well as get a good sense of how many people fall into different categories and how the different salaries cluster. With these things in mind, here’s the makeover using a Unit Histogram (also known as a statistical dot plot).  Note that you can find an interactive version at the end of this post.

Figure 1 — Unit Histogram showing individual salary, quartiles, and distribution.

For comparison, here’s the Jitterplot version.

Figure 2 — Jitterplot showing individual salary and quartiles.

What’s the downside of the Unit Histogram?

To create the equivalent of histogram bars, each dot must fall into one of several equally spaced bins. The example in Figure 1 above has bins spaced every $2,500. This means a dot representing a value of $60,375 will have the same vertical location as a dot representing $62,290; both get placed in the $60,000 bin. In the Jitterplot, none of the dots have to be displaced vertically.

How to get the bins and the reference bands

If you use Tableau’s built-in binning feature you won’t be able to add reference lines or reference bands, even if you convert the Tableau-created bins from discrete to continuous.

You can, however, roll your own continuous bin using a technique Joe Mako suggested in this forum post from 2013. Here’s a look under the hood.

Figure 3 — What drives the unit histogram.

How it works

Continuous Bin (1) is on the Rows shelf and creates the axis along the left side. It’s defined as

INT([Value] / [Bin Size]) * [Bin Size]

Where [Value] is the salary of a respondent and [Bin Size] is a parameter controlling the size of the bin.
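To make the arithmetic concrete, here is a minimal Python sketch of the same calculation (this is not Tableau code, and the function name continuous_bin is my own invention for illustration). It assumes that Tableau's INT truncates toward zero, which is what math.trunc does.

```python
import math


def continuous_bin(value, bin_size):
    """Mirror INT([Value] / [Bin Size]) * [Bin Size] for one salary.

    Assumption: INT truncates toward zero, so math.trunc stands in for it.
    """
    return math.trunc(value / bin_size) * bin_size


# The two salaries from the example above share a bin when bins are $2,500 wide.
print(continuous_bin(60375, 2500))  # 60000
print(continuous_bin(62290, 2500))  # 60000
```

Changing the bin size parameter changes how many salaries share a bin, and therefore how tall each pile of dots gets.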

Notice that there’s also a discrete version of this on the Level of Detail (2). This will allow Tableau to divide the visualization into discrete chunks. We’ll need this so we can pile the dots on top of each other. I acknowledge that having a field called Continuous Bin (discrete) is an oxymoron.

Resp ID (3) is also on Level of Detail. This will force Tableau to draw a separate dot for each Resp ID.

INDEX() (4) is Tableau’s built-in Index function. It essentially answers the question “in what row within the partition can I find this dot?”

The key thing is how we set up the addressing and partitioning so that Tableau will pile the dots within each bin. Here’s how Compute Using needs to be set up in the Table Calculation dialog box.

Figure 4 — Defining Compute Using.

The critical elements are highlighted in orange. By selecting Resp ID we’re telling Tableau “within the current partition (which is the discrete salary bin) determine what row a dot is in, then place the next dot one higher, then the next, until you are done with all the dots that are within this partition / bin.  When you get to a new partition / bin, start the process over again.”
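If it helps to see the piling logic outside Tableau, here is a rough Python sketch of what the table calculation is doing. It is only an imitation: the function name pile_dots and the sample respondents are hypothetical, and I am assuming the dots are visited in ascending Resp ID order, which may not match the sort order in your workbook.

```python
import math
from collections import defaultdict


def pile_dots(salaries, bin_size):
    """Map each respondent ID to (bin, position within that bin).

    salaries: dict of resp_id -> salary (hypothetical sample data).
    The bin plays the role of the partition; walking along the
    respondent IDs plays the role of the addressing.
    """
    counters = defaultdict(int)  # one running count per bin (partition)
    positions = {}
    for resp_id in sorted(salaries):  # assumed order: ascending Resp ID
        bin_value = math.trunc(salaries[resp_id] / bin_size) * bin_size
        counters[bin_value] += 1  # like INDEX(): starts at 1, restarts per bin
        positions[resp_id] = (bin_value, counters[bin_value])
    return positions


sample = {101: 60375, 102: 62290, 103: 64100, 104: 61000}  # made-up values
for resp_id, (bin_value, index) in pile_dots(sample, 2500).items():
    print(resp_id, bin_value, index)
```

Each respondent ends up with a bin (the position along the salary axis) and an index (how far along the pile the dot sits), which are the two coordinates the unit histogram needs.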

Other approaches

Move the box plot / distribution bands out of the way

One of Jeff Shaffer’s biggest gripes about box plots, and quartile bands in general, is that they obscure the marks underneath. Here are two alternatives Jeff made that address this.

Figure 5 — Jeff Shaffer’s makeover where he places a box plot to the left of each Unit Histogram.

Figure 6 — Jeff Shaffer’s second makeover where the box plot covers half of the Unit Histogram.

I admit that I find it easier to explore the dots and see the distribution when the box plot does not occlude the unit histogram. My two problems with Jeff’s alternatives are that the charts take up more screen real estate and there’s a fair amount of extra work to build the visualizations (you can read about how to do that here.)

My compromise was to add a toggle that allows people to turn the distribution bands on and off at will. (See the dashboard at the end of this post.)

Why not a regular histogram with a dot?

Adam McCann sent me some alternative views, one of which was like the chart shown below.

Figure 7 — Dot with simple histogram

Indeed, seeing this makes me wonder: why bother with all the little dots? Why not just show the “my salary” dot against a simple histogram?

There are a few reasons why I prefer the Unit Histogram (and the Jitterplot):

  • The standard histogram above lacks the visceral impact of the Unit Histogram and Jitterplot. I admit that this is a completely subjective stance.
  • The unit histogram allows me to inspect individual dots, as in “what’s associated with that dot way down there? I’d like to know more about that dot!”
  • We can resize the unit histogram dots based on some other measure (e.g., years with company.)

Note that none of this works if you have tens of thousands of dots. Indeed, at that point I would just use a histogram and show my audience which histogram pertains to them (see Are you over the hill in the USA for an example.)

Dan Zvinca’s KDE Piled Dot Plot and Stingray Plot

As I mentioned at the beginning of this article, one of the catalysts for my reconsidering the Jitterplot was Dan’s article about the KDE Piled Dot Plots / Stingray Plots.

Figure 8 — KDE Piled Dot Plot encoding 2400 elements

Figure 9 — Vertical Stingray Plot with colors given by the categories and overall quartiles.

I like Dan’s approach a great deal in that it takes the smooth-curve distribution of a violin plot and fills it with granular details that can be highlighted and inspected. Dan is also applying strong statistical algorithms to determine the best curve versus the “pick a bin size and see if it works” approach in my dashboard.

One downside of Dan’s approach is that, like the Unit Histogram, the values must be altered so they fit within the curve (this is not the case with the Jitterplot.) The other downside is that I’ve not yet figured out how to build this in Tableau, and I don’t think it will be simple, at least for me. The Jitterplot and Unit Histograms are relatively easy to render in Tableau and do not require any special data preparation.

Conclusion

I’ve had success getting people to use dashboards by doing whatever I can to “insert” the audience into the dashboard itself. Until recently my favored approach was the Jitterplot, but I think I will at least explore using the Unit Histogram and, if it proves relatively easy to render in Tableau, the KDE Piled Dot Plot.

Note: when you hover over a dot you can see both its binned value and the actual value. The quartiles are derived based on binned values but one can modify this so that they are based on actual values. The difference is minor.
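If you want to sanity-check the binned-versus-actual difference outside Tableau, here is a small Python sketch. The salaries are made up, the function names are hypothetical, and the "inclusive" quartile method is my assumption; Tableau may compute quartiles differently, so treat the numbers as illustrative only.

```python
import math
import statistics


def quartiles(values):
    """Return [Q1, median, Q3]; the 'inclusive' method is an assumption."""
    return statistics.quantiles(values, n=4, method="inclusive")


def compare_quartiles(salaries, bin_size):
    """Quartiles of the actual salaries and of their binned versions."""
    binned = [math.trunc(s / bin_size) * bin_size for s in salaries]
    return quartiles(salaries), quartiles(binned)


sample = [41200, 48750, 52300, 60375, 62290, 71000, 88400]  # made-up salaries
actual_q, binned_q = compare_quartiles(sample, 2500)
for name, actual, binned in zip(("Q1", "Median", "Q3"), actual_q, binned_q):
    print(f"{name}: actual={actual:,.0f} binned={binned:,.0f} gap={actual - binned:,.0f}")
```

Because binning only ever moves a salary down by less than one bin width, each binned quartile in this sketch lands less than one bin width below its actual counterpart.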