Sep 16, 2014
 

Finally, a good use for packed bubbles!

The Problem

I recently received a query from a client on how to compare responses to one question with responses to another question when both questions have possible Likert values of 1, 2, 3, 4, and 5. That is, if you have a collection of questions like this:

01_LIkert

How would you show response clusters when you compare “Good Job Skills” against “Likes the Beatles”?

This question is particularly applicable if you are a provider of goods and services and you want to see if there is alignment or misalignment between “how important is this feature” and “how satisfied are you with this feature”.

Note: There’s a Tableau forum thread that has been looking into this issue as well.  Please see http://community.tableausoftware.com/thread/137719.

So, how can we fashion something that helps us understand the data?

Before we get into the nitty-gritty, here’s a screenshot of one of the approaches I favor. Have a look to determine if reading the rest of the blog post is worth the effort.

02_PreviewResults

Still reading?  Well, I guess it’s worth the effort.

The Traditional Scatterplot Approach

Consider the set up below where we see how Tableau would present the Likert vs. Likert results in a standard scatterplot.

03_TradScatterPlot

So, what is going on here?

There are a total of nine Likert questions available from the X-Question and Y-Question parameter drop down list boxes.  Our desire here is to allow us to compare any two of the nine at any time.

The “meat” of the visualization comes from SUM(X-Value) on the columns shelf and SUM(Y-Value) on the rows shelf, where X-Value is defined as follows (Y-Value is the same calculation with [Y-Question] in place of [X-Question]):

IF [Wording]=[X-Question] THEN [Value]+1 END

This translates into “if this row’s question is the one selected from the list, use the [Value] for that question”. Note that [Wording] is the same as [QuestionID] but with human-readable values (e.g., “Likes the Beatles” instead of “Q52”).

We use [Value]+1 because the Likert values are set to go from 0 to 4 instead of 1 to 5, and most people expect 1 to 5.

We can use SUM(X-Value) and SUM(Y-Value) because we have Resp ID on the Detail shelf. This forces Tableau to draw a circle for every respondent. The problem is that the circles overlap, and even with transparency you don’t get a sense of where responses cluster. Yes, it is possible with a table calculation to change the size of the circle based on count, but I’ll provide what I think is a better approach below.

A note about the filters: The Question filter is there to constrain our view so that we only concern ourselves with Likert Scale questions.  It isn’t necessary but is useful should we be experimenting with different approaches.  The SUM(X-Value) and SUM(Y-Value) filters remove nulls from the view.
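For readers who want to see the logic outside Tableau, here is a minimal Python sketch of the same idea: pick up a shifted value only when the row's question matches the selected one, sum per respondent, and drop respondents with a null on either axis. The data rows are made up, and the names `rows`, `conditional_value`, and `xy_per_respondent` are hypothetical; none of this comes from the workbook.

```python
from collections import defaultdict

# Hypothetical long-format data: (resp_id, wording, value), values coded 0-4.
rows = [
    (1, "Good Job Skills", 4), (1, "Likes the Beatles", 2),
    (2, "Good Job Skills", 3), (2, "Likes the Beatles", 3),
    (3, "Good Job Skills", 0),  # respondent 3 skipped the second question
]

def conditional_value(wording, value, selected):
    """Mirror of IF [Wording]=[selected] THEN [Value]+1 END (None stands in for null)."""
    return value + 1 if wording == selected else None

def xy_per_respondent(rows, x_question, y_question):
    """Sum the conditional values per respondent, then drop anyone with a null x or y."""
    sums = defaultdict(lambda: {"x": None, "y": None})
    for resp_id, wording, value in rows:
        for axis, question in (("x", x_question), ("y", y_question)):
            v = conditional_value(wording, value, question)
            if v is not None:
                current = sums[resp_id][axis]
                sums[resp_id][axis] = v if current is None else current + v
    return {r: (p["x"], p["y"]) for r, p in sums.items()
            if p["x"] is not None and p["y"] is not None}

points = xy_per_respondent(rows, "Good Job Skills", "Likes the Beatles")
print(points)  # {1: (5, 3), 2: (4, 4)}
```

Respondent 3 disappears from the result for the same reason the null filters are on the view: without a value on both axes there is nothing to plot.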

Packed Bubbles to the Rescue

I’m not a big fan of packed bubbles (see this post) but for this situation we can use them and get some great results, as shown below.

04_BubbleScatterPlot

I’ve made a couple of changes to the traditional scatterplot visualization, the most important being that SUM(X-Value) and SUM(Y-Value) are now discrete, so we get a trellised visualization instead of a continuous axis. Note that I had to change the sort order of the Y-axis elements so that they appear in reverse order (5 down to 1).

I got the packed bubbles by placing CNTD(Resp ID) on the Size button. Each mark represents a single respondent, so the count is 1 for every mark; this ensures that each bubble is the same size and triggers Tableau’s packing algorithm.
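Put another way, the trellis sorts respondents into the 25 possible (x, y) cells, and the number of bubbles packed into a cell is the number of respondents who gave that pair of answers. The Python sketch below shows only that counting step, not Tableau's packing algorithm. The `pairs` data and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (x, y) Likert pairs, one per respondent, already shifted to 1-5.
pairs = [(5, 3), (4, 4), (4, 4), (1, 2), (5, 5), (4, 4)]

def cell_counts(pairs):
    """Count respondents in each (x, y) cell of the 5x5 grid."""
    return Counter(pairs)

def grid_rows(counts, levels=(1, 2, 3, 4, 5)):
    """Return the grid as rows of counts, with y running 5 down to 1 (the reversed sort)."""
    return [[counts.get((x, y), 0) for x in levels] for y in reversed(levels)]

counts = cell_counts(pairs)
for y, row in zip((5, 4, 3, 2, 1), grid_rows(counts)):
    print(y, row)
```

The cell at x=4, y=4 holds three respondents here, so in the trellis it would show a cluster of three equal-sized bubbles.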

Note that I also added an on-demand “Drill down” so that you can color the circles by different demographic dimensions.

I’ve experimented with this with some large data sets and Tableau does a great job with packing the bubbles intelligently.

What About Trend Lines?

Since we are using discrete measures on the rows and columns shelves, we cannot produce trend lines. When I first started this project I experimented with more traditional jittering and was able, with a fair amount of fuss and bother, to produce this.

05_TrendLineExample

A special thanks to Jeffrey Shaffer who provided a link on how to create pseudo-random numbers in Tableau (thank you, Josh Milligan).

I prefer the example that doesn’t require the jittering, but if you need trend lines or if you prefer the jittered look, I’ve included the example in the downloadable packaged workbook (see below).

It also occurred to me that the trend line would be based on the jittered values and not the actual values.  The same workbook contains a “home grown” trend line based on the actual values (courtesy of Joe Mako). It turns out the jittered trend line is almost identical to the non-jittered trend line so I suspect you won’t need to take the “home grown” approach.
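If you want to check the jittered-versus-actual comparison on your own data outside Tableau, a rough Python sketch follows. It fits an ordinary least-squares line to the actual values and to jittered copies of them. Everything here is an assumption for illustration: the responses are randomly generated, the jitter is uniform with a width of 0.6, and the function names are hypothetical. It is not the calculation used in the workbook.

```python
import random

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def jitter(values, width, rng):
    """Add uniform noise in [-width/2, +width/2] to each value."""
    return [v + rng.uniform(-width / 2, width / 2) for v in values]

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical responses: 500 respondents, y loosely tracks x, both kept within 1-5.
xs = [rng.randint(1, 5) for _ in range(500)]
ys = [min(5, max(1, x + rng.choice((-1, 0, 0, 1)))) for x in xs]

actual = least_squares(xs, ys)
jittered = least_squares(jitter(xs, 0.6, rng), jitter(ys, 0.6, rng))
print("actual   slope=%.3f intercept=%.3f" % actual)
print("jittered slope=%.3f intercept=%.3f" % jittered)
```

Noise added to the x values tends to flatten a fitted slope slightly, but when the jitter is small relative to the 1-to-5 spread of the answers the two lines should stay close, which is consistent with what I saw in the workbook.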


Update

I received a number of comments here and on LinkedIn about the “drill down / break down” capability and that it is hard to see the percentage of dots by category.  For example, if you break down by generation do the dots for one generation cluster more in one part of the trellis than in others?

I thought that in this case having a different-colored bubble per category where the size of the bubble was proportionate to the percentage of responses within that category made sense.
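The arithmetic behind that sizing can be sketched in a few lines of Python: for each category, divide the number of respondents in a cell by the total number of respondents in that category. This shows only the percentage calculation, under assumed data; it is not the Tableau solution in the workbook, and the records, category labels, and function name are hypothetical.

```python
from collections import Counter

# Hypothetical records: (category, x, y), one per respondent.
records = [
    ("Boomer", 4, 4), ("Boomer", 4, 4), ("Boomer", 2, 5), ("Boomer", 4, 4),
    ("Millennial", 4, 4), ("Millennial", 1, 2),
]

def share_within_category(records):
    """For each (category, x, y), return that cell's share of the category's respondents."""
    cell_counts = Counter(records)
    category_totals = Counter(category for category, _, _ in records)
    return {
        (category, x, y): count / category_totals[category]
        for (category, x, y), count in cell_counts.items()
    }

shares = share_within_category(records)
print(shares[("Boomer", 4, 4)])      # 0.75
print(shares[("Millennial", 4, 4)])  # 0.5
```

Sizing by share rather than by raw count keeps a large category from visually swamping a small one: three of four respondents in one group and one of two in the other are compared as 75% versus 50%.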

Size by Category

I thought building this would be easy, but I needed to call in the heavy artillery (Joe Mako).

I’ll blog about the solution later. In the meantime the packaged workbook below contains this additional approach.


  12 Responses to “Likert vs. Likert on a Scatterplot”

  1. Great post. You also inadvertently inspired one of my next Brinton posts. He did a scatterplot just like these in his 1914 book. I’d never seen one implemented since. But now I’ve got one!

    • Andy, I caught a break in that the packed bubbles worked well and are much easier to explain. The jitter technique is more complicated and I just know someone would challenge me to arrange them in a circle rather than a square. Looking forward to your post.

  2. Steve, great job as ever. On the embedded Tableau, it looks like your partitioning for the jitter may be off slightly – they are all in a straight line.

  3. Alex, thanks. I’m getting the results I expect and am wondering what you are seeing. Can you send me a screen shot at swexler@datarevelations.com?

    BTW, your mod suggestion for the INDEX() function has had many great side benefits besides its original intent, the best being that I can make the strip narrower than the axis.

    Steve

  4. I find the various breakdowns (e.g., Gender, your figure 2 here) hard to read. My brain is not very good at comparing “area of disk edge” to “area of disk interior”. Maybe you could pack the bubbles horizontally instead of radially?

    Oh goodness, maybe the limit of that idea is a pie chart. Hmmmmmmmmmm…..

    In any case, I find it really hard to tell whether male or female has larger area / more bubbles.

    • Daniel,

      The real question is the percentage of men / women that have chosen each combination, not the number. You can use the legend to highlight, but I’m working on a better way to show this. My fault for providing the drill-down option.

      Steve

  5. Hi Steve:

    Good to meet you at TCC14 and I am winding off from that :)

    Great post on the Likert-vs.-Likert relationship. I came across a situation (kind of similar to this one) at work with my survey analytics project and wanted to run it by you.

    Of the many questions; we want to compare two questions;
    Q#1 – First Call Resolution (FCR) – Allowed answers True or False
    — This question basically allows us to see if we were able to solve the customer’s question with one call
    — True means YES, we solved it, and False means NO, we did not solve it and the customer called multiple times

    Q#2- Overall Quality (OQ) – Allowed answers 1 through 5 (1 low and 5 high)
    — Basically tells us how customer rated the overall quality of the call
    — we treat 4 and 5 as Top buckets and 1 to 3 as bottom buckets

    Data is one record per customer call per question with an answer. Now, what we have to show is;
    – what % of customers who answered True to FCR ended up in each OQ bucket (top or bottom)
    – what % of customers who answered False to FCR ended up in each OQ bucket (top or bottom)

    So, there are 4 combinations and we want to show this month over month using line graph.

    What I did:
    ————–
    As we have one record per customer per question with an answer, I self-joined the data set to get these two responses on a single line, as that made the comparisons easy. I tried with table calcs, but couldn’t figure it out.

    Is there a better way to do this than self-joining the tables, as this will hurt performance? If you send in your email,

    Thanks a bunch in advance.

    ..kk


  6. Steve: Did you also notice that if you change the mark type from circle to square, the viz goes from a packed bubble to a tree map? It alters the viz pretty dramatically and I’m not sure for the better. But it may get at the issue Dan Halperin is trying to solve.

    Bruce
