Apr 30, 2013
 

Overview

I was stuck earlier this month trying to cajole Tableau into doing something I needed, so I contacted my friend Joe Mako.  When it comes to Tableau, Joe is the “guru’s guru.” (Joe was the person who showed me how to create filled maps in Tableau before Tableau had native support for them. See http://www.datarevelations.com/tracking-stds-hiv-and-aids-in-texas.)

Joe did in fact have a very slick solution to my problem, and I will probably write about it in a future post, but I would rather focus on a broader issue that came up when Joe commented on a visualization I had on my screen.

Vertical Scatterplot

Consider the image below, which readers may recall from a post I did a while back on getting people to care about your viz.

The size of the circle corresponds to number of respondents reporting a salary close to the amount shown.

The red circle shows “your” salary and the other circles show the salaries of everyone else that responded to the survey. The size of the circle indicates the number of respondents that reported earning a particular salary.

I had a number of problems with this approach, the biggest being that I had to group / bin salary amounts so that similar amounts would yield bigger circles.  That is, I would run into trouble with salary amounts like these…

$50,150

$50,200

$49,750

… as they would yield three separate small circles instead of one larger circle.
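To make the binning idea concrete, here is a minimal sketch in Python (outside Tableau). The $1,000 bin width and the `bin_salary` helper are assumptions for illustration only; the bins actually used in the original viz are not stated.

```python
from collections import Counter

def bin_salary(salary: int, width: int = 1000) -> int:
    """Round a salary to the nearest multiple of `width` (halves round up).
    The default width of 1000 is an assumed value, not the one used in the viz."""
    return (salary + width // 2) // width * width

# The three example salaries from the text.
salaries = [50150, 50200, 49750]

# Count how many responses land in each bin; the count would drive circle size.
circle_sizes = Counter(bin_salary(s) for s in salaries)
print(circle_sizes)
```

With this bin width all three amounts fall into the $50,000 bin, so they would be drawn as one circle of size three rather than three small circles.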

Here’s what the visualization looks like if you plot a circle for each response ID and don’t size the circles based on the number of occurrences of a particular salary value.

A vertical scatterplot with too many dots.

The problem is that we cannot see the clustering because so many marks are stacked in a single column.

Cue the Violins

Joe, who in addition to Tableau expertise is a font of generalized visualization knowledge, asked if I had ever heard of a violin plot (I had not). He then pointed me to this blog post.

In addition to the violin plot, the post discussed “jittering” marks so that you spread dots both horizontally and vertically, like this:

“Jittering” the scatterplot

Joe pointed out that producing this jitter effect is very simple in Tableau.  You just need to create an x-y chart where the y-axis contains the salary for each respondent and the x-axis displays the index value (the row number) for the particular response.  Interestingly, this works because there is no relationship between the response ID and the salary value: INDEX() simply numbers the rows in ID order, and since that order is unrelated to salary, the marks end up scattered across the x-axis as if at random.  If you were to sort the IDs by salary you would get an interesting chart, but one that makes the clustering harder to see.

You say potato and I say “Pareto”
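The row-number idea can be sketched outside Tableau as well. The snippet below is only an illustration in Python: the five survey rows and the `index_positions` helper are made up, and the 1-based numbering is meant to mimic what INDEX() returns along a given ordering.

```python
# Hypothetical survey rows: (response_id, salary). Not real survey data.
responses = [(101, 62000), (102, 48000), (103, 75000), (104, 51000), (105, 69000)]

def index_positions(rows, key):
    """Return (x, y) pairs where x is the 1-based row number after sorting
    by `key` and y is the salary, mimicking INDEX() along that ordering."""
    ordered = sorted(rows, key=key)
    return [(i, salary) for i, (_id, salary) in enumerate(ordered, start=1)]

# Ordered by ID: x has no relationship to salary, so the marks look scattered.
by_id = index_positions(responses, key=lambda r: r[0])

# Ordered by salary: y rises steadily with x, giving the sorted chart instead.
by_salary = index_positions(responses, key=lambda r: r[1])

print(by_id)
print(by_salary)
```

The jitter comes entirely from the ordering being unrelated to the value plotted on the y-axis; nothing in the row numbering itself is random.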

Creating the Visualization

The screen shot below shows the main components that go into the visualization.  We have placed the INDEX() function on the Columns shelf and AVG(Salary) on the Rows shelf (note that it will work fine with SUM or even without an aggregation).

Notice that we are coloring by Gender and that ID is on the Level of Detail

Note that ID is on the level of detail.  This is what produces a separate circle for each salary respondent.  We also compute using ID in the INDEX() table calculation.

Compute using ID

The only thing left to do is resize the visualization so that it is very narrow.

Here’s the Tableau dashboard showing both the “jittered” scatterplot and your salary as a separate dot.

(I will leave it as an exercise for the reader to download and see how to display the dot.)


  17 Responses to “I’ve Got the Jitters (and I Like it!)”

Comments (12) Pingbacks (5)
  1. Good stuff! Here’s another good one that allows for user-configurable spacing: http://community.tableausoftware.com/message/135321#135321 

  2. Nice work Steve.

  3. Very cool, thanks for the quick walk through. I think it should be noted, however, that Index() will not randomize your rows, it just returns the row number of the data set however that’s ordered. If your data set is pre-sorted by a field that’s either in or dependent on a field in your viz, you will need to create a random variable in the data set to achieve this view. Tableau doesn’t support random number generation, so you can’t do it with a calculated field. You’ll need to go into the data source and create a column with a random number for each row and then refresh the connection.

    • Cameron,
      You are 100% correct that we are counting on the ResponseID being random as INDEX() itself doesn’t scatter the marks randomly. That said, I have yet to see a case where ResponseID didn’t work for this type of thing.
      Steve

      • I think it’s mostly (but infrequently, like you said) cases where the data source has been preprocessed for some reason. If I am connecting to an Excel workbook, for example, and the client that sent it to me has presorted the worksheet I’m looking at by salary, then it may cause a problem. I just wanted to point out to other readers that they would need to go into the data source itself in order to make the correction.

        Similarly, for data generated and therefore pre-sorted by time (e.g., log files), any plot with a time-related field on the y-axis will have a dependency with the index() and won’t give you a random distribution.

      • Here’s a good one – plotting total hours worked by index based on employeeID. Should work fine, no? No, a tail is created because (of course) employeeIDs appear later in the data for new employees, so those hired halfway through the data period will have a lower total hours and appear to the right side… 

        One way I’ve dealt with this is to use a mod function on the index that provides a suitable spread for the width of the data: index()%100 for example

        While nowhere near as comprehensive or configurable as the example on Russell’s link, it’s a little simpler.

    • What about using R integration to get some random numbers? 

      • Damien, I don’t see why in this case (and most cases) we would need to introduce a randomization element. I can see problems if for some reason the data has an inherent sort to it, but I don’t think I’ve ever run into that.

        (Famous last words… I’ll probably run into it next week).

        • I agree in this case it’s useless.

          Actually, I slept on it and I couldn’t see either why we would need random numbers directly in Tableau.
          Randomization might mostly be used in some kind of simulations, but it’s likely it would be integrated in some R functions giving only the result we want to visualize.

          I’ve also been looking in Tableau Ideas section and found that some people would like a random function for two purposes (I cite):
          – Filter random sample data when working on a Viz
          – Generate jitter for placing points on maps in a visible way when multiple data points are geocoded to the same location

          I’m not sure why you would want the second one on geocoded data, but I imagine that we might need to jitter data on a 2-axis scatterplot. In that case, I don’t see how we could directly apply the index() technique. 

  4. Alex, using MOD is very clever.  The bottom line is that my examples work because the IDs happen to be reasonably random.  I like using MOD as a way to address this if the IDs aren’t random and if you can’t go back and add a new column to the data.
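
For readers who want to see why the MOD trick helps, here is a rough sketch in Python. The data is invented (300 employees in ID order, with later hires logging fewer total hours), and `wrapped_x` is a hypothetical helper that mirrors the index()%100 expression from the comment above.

```python
def wrapped_x(index: int, width: int = 100) -> int:
    """Wrap a 1-based row number into the range 0..width-1,
    mirroring the index()%100 idea. The width of 100 is just an example."""
    return index % width

# Invented data: 300 employees in ID order; later hires have fewer total hours,
# so plotting hours against the plain row number would produce a tail.
hours = [2000 - 5 * i for i in range(1, 301)]

# Wrapped x position paired with each employee's hours.
points = [(wrapped_x(i), h) for i, h in enumerate(hours, start=1)]

# Every x column now mixes early, middle, and late hires.
column_7 = [h for x, h in points if x == 7]
print(column_7)
```

Each x position ends up holding employees from the start, middle, and end of the ID range, so the dependency between row order and hours no longer shows up as a trend across the chart. A larger or smaller width trades horizontal spread against how many marks share a column.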
