swexler

Jan 172017
 

By Steve Wexler

January 17, 2017

This is a follow-up to the post Jeff Shaffer and I wrote about what can happen when people fail to question sources and inadvertently amplify baseless findings.

Overview

There’s been great feedback on things the community can do to maintain all the good things about Makeover Monday (MM) and at the same time reduce the occurrence of bad things.  Before I go any further I want to reiterate that I both like and value MM.  I’ve seen some incredibly good dashboards that inform my consulting practice and I plan to publish a blog on some of the design and analytic masterpieces I’ve seen.

But first…

Guilty as charged

Before presenting some recommendations, I want to cite an e-mail I received from Ben Jones pointing out some hypocrisy on my part. Ben writes:

“…did you know that the Axalta coating systems Global Automotive color popularity report that you used to make this viz doesn’t even mention the sample sizes, sampling plan, or methods of their surveys anywhere in the report?”

Ben is completely correct, and I immediately followed my own recommendations for the 100+ errant visualizations from week 1 of Makeover Monday 2017 and added this disclaimer to my dashboard:

Figure 1 -- The disclaimer / warning that now appears on my dashboard.

Figure 1 — The disclaimer / warning that now appears on my dashboard.

So, why did I get in such a huff about folks not vetting their sources for the Australian pay gap but didn’t bother to vet the data for my makeover example?

It comes down to three things:

  1. The number of people participating
  2. The magnitude of the mistake
  3. The importance of the data

For the Australian pay gap makeover there were

ONE — Over 100 people making

TWO — Big mistakes with

Three — “High-stakes” data

Recommendations

Avoid “high-stakes” data sets. Some folks will make the argument that iPhone sales could be considered “high stakes” as someone might make a stock buying or selling decision based on this. Fair enough, but for MM I suggest Andy and Eva avoid data that deals with gender, race, guns, shootings, and anything else that is politically charged.

Get ideas from reputable data sources. Realize that source data from the Australian government is, in fact, good data. The problem is that it isn’t good for making meaningful gender comparisons. So, where did the assertion of giant wage gaps come from? The MM folks cited an article from Women’s Agenda.  I spoke with data journalist Chad Skelton and he suggested that for high-stakes subjects MM should stick with ideas from the likes of The Guardian, The Wall Street Journal, and other well-known entities where reporters confer with statisticians and economists before publishing conclusions.

Note: Jeff Shaffer did some more digging and it appears that Women’s Agenda had in fact republished findings from Business Insider Australia. I have sent e-mails to editors at both publications.

Place a prominent Makeover Monday logo and disclaimer on every dashboard. Since Makeover Monday is an exercise in data visualization redesign, why not have participants trumpet MM’s purpose on every dashboard?  By having a small but prominent logo along with a disclaimer we can both raise awareness of the project and caution people that the purpose of the dashboard is to practice data visualization techniques. My only concern is that people will think this gives them a free pass to present a bogus headline and/or specious findings.

While I think the logo below is spoken for, maybe something like it would help viewers know that for the dashboard in question, it’s not about the data, it’s about the design.

Figure 2 -- Possible logo for Makeover Monday?

Figure 2 — Possible logo for Makeover Monday?

Vetting the data — the birth of “Find the Flaw Friday”?

In the previous post, I indicated that the one hour MM recommends people spend on a makeover is rarely enough time to properly vet the data, let along craft a visualization.  And asking Eva and Andy to vet all the data is completely impractical as I’m sure it’s hard enough to curate the project as it stands now.

Many of the people I corresponded with have said that exploring and really understanding the data is as important, if not more important, than designing a cool dashboard (this gets us back to the substance over style discussion.) A lot of people indicated they find exploring and vetting data more interesting than fashioning visualizations.

So, how can we help people practice analysis and not just design and presentation?

At least five people indicated they would be up for what I will call “Find the Flaw Friday” (FFF) where people are tasked with exploring a data set and determining what analysis would be sound and what would likely to be flawed. I’m not sure how easy this will be to manage and how much time people will need to spend on each project, but I will put the folks that expressed an interest in touch with each other and we’ll see what materializes.

If this works perhaps the data sets from FFF could feed MM?

Final thought — Please fix (or take down) your erroneous dashboards

I consider wage inequality a big deal, and if you are going publish anything that’s about a big deal, you need to get it right.

So, now that people know that their week 1 MM dashboards are wrong, what should they do?

FIX THEM!

In conferring with Chad, he wonders how people would deal with something like this in their jobs.  For example, suppose you create a beautiful visualization for the CEO and realize later that the central point of the viz is deeply flawed?  Even if the mistake weren’t your fault would you rush to correct the record to keep your reputation intact?

Everyone — and I would argue that this should come from the Andy and Eva as well as the folks that manage Tableau Public — should tweet about the errors to make sure people don’t continue to perpetuate the misinformation.

It’s one thing to get something wrong. It’s another thing to know something is wrong and not fix it.

So please, fix it.

Postscript

Tableau Zen Master Brit Cava recommends a fascinating Freakonomics podcast with economist and gender gap expert Claudia Goldin.  See http://freakonomics.com/podcast/the-true-story-of-the-gender-pay-gap-a-new-freakonomics-radio-podcast/.  VERY worthwhile.

 

 Posted by on January 17, 2017 1) General Discussions, Blog Tagged with: ,  No Responses »
Jan 152017
 

By Steve Wexler, January 15, 2017

Overview

Before going any further this post assumes you’ve gotten your data “just so”; that is, you’ve reshaped your data and have the responses in both text and numeric form.

If you’re not sure what this means, please review this post.

Taking inventory by finding the universe of all questions and all responses

Anyone who has attended one of my survey data classes or has watched my 2014 Tableau Conference presentation knows the first thing I like to do is assemble a demographics dashboard so I know who took the survey.

The second thing I do arose a few months ago when I had a “why didn’t I do this before” moment with respect to getting a good handle on questions, responses, and seeing if there was anything that was poorly coded.

Here’s how it works.

Note that I’m using the same sample data set that I use for my classes.

Figure 1 -- Screen shot after just having connected to the data.

Figure 1 — Screen shot after just having connected to the data.

  1. Drag Question Grouping onto rows, followed by Wording, and then Question ID, as shown below.

    02_PartialInventory

    Figure 2 — About half-way through taking inventory.

  2. Right-click the measure called Value and select Duplicate.
  3. Rename the newly-created field Value (discrete).
  4. Drag the measure into the Dimensions area. This will make Tableau treat the field as something that is by default discrete.

    Figure 3 -- Turning the measure into a discrete dimension

    Figure 3 — Turning the measure into a discrete dimension

  5. Drag new newly-created dimension to Rows.
  6. Drag Labels to Rows. Your screen should look like the one shown below.
Figure 4 -- All the questions and all the responses, all on one sheet.

Figure 4 — All the questions and all the responses, all on one sheet.

So, just what do we have here?

You can see from the portion of the screen that you have a bunch of questions about “Importance” and can also see that the possible values go from 0 to 4 where 0 maps to “Not at all important”, 1 maps to “Of Little Importance”, etc.

At this point you should be looking for any stray values, say a value of 5.

If you scroll down a little bit (Figure 5) you’ll see a question grouping called “Indicate the degree to which you agree” where you again have values of 0 through 4 but this time 0 maps to “Not at all”, 1 maps to “Small degree”, etc.

We should be pleased as it appears that our Likert questions consistently go from 0 through 4. This means we won’t have to craft multiple sets of calculated fields to deal with different numeric scales (not that having to do that would be a big deal).

05_Further Inventory

Figure 5 — Next set of questions.

At this point it might be useful to add a filter so you can focus on only certain question groups. You can do this by filtering by Question Grouping as shown below.

Figure 6 -- Adding a filter makes it easier to focus on specific question groupings

Figure 6 — Adding a filter makes it easier to focus on specific question groupings

Spotting questions that have coding errors

In case you’re wondering what a coding error looks like, see what happens if we just focus on the “What do you measure” questions, as shown below.

Figure 7 -- An example of a mis-coded check-all-that-apply question

Figure 7 — An example of a mis-coded check-all-that-apply question

So, for all of the check-all-that-apply questions the universe of possible values is 0 and 1. And with the exception of Question Q6 (Breathing), 0 maps to “No” and 1 maps to “Yes.”

The mis-coding of “Ni” instead of “No” will only present a problem if our calculated field for determining the percentage of people that checked an item were to use Labels instead of Values. My preferred formula for this type of calculation is this:

SUM([Value]) / SUM([Number of Records)])

Because we’re using [Value] instead of [Label], the miscoding for this example won’t cause a problem.

Conclusion

Creating a giant text table that maps all Question Groupings, Question IDs, Labels, and Values on a single sheet allows us to quickly take an inventory of all questions and possible responses.  This in turn allows us to see if questions are coded consistently (e.g., do all the Likert Scale questions use the same scale) and to see if there are any coding errors.

I just wish I had started doing this years ago.

 

 Posted by on January 15, 2017 2) Visualizing Survey Data, Blog Tagged with: ,  No Responses »
Jan 092017
 

By Steve Wexler and Jeffrey Shaffer

January 9, 2017

Please also see follow-up post.

Overview

Makeover Monday, started by Andy Kriebel in 2009 and turned into a weekly social data project by Kriebel and Andy Cotgreave in 2016, is now one of the biggest community endeavors in data visualization. By the end of 2016 there were over 3,000 submissions and 2017 began with record-breaking numbers, with over 100 makeovers in the first week. We are big fans of this project and it’s because of the project’s tremendous success and our love and respect for the two Andys (and now Eva Murray) that we feel compelled to write this post.

Unfortunately, 2017 started off with a truly grand fiasco as over 100 people published findings that cannot be substantiated. In just a few days the MM community has done a lot damage (and if it doesn’t act quickly it will do even more damage.)

What happened

Woah!  That’s quite an indictment. What happened, exactly?

Here’s the article that inspired the Makeover Monday assignment.

So, what’s the problem?

The claims in the article are wrong.  Really, really wrong.

And now, thanks to over 100 well-meaning people, instead of one website that got it really, really wrong there are over 100 tweets, blog posts, and web pages that got it really, really wrong.

It appears that Makeover Monday participants assumed the following about the data and the headline:

  • The data is cited by Makeover Monday so it must be good data.
  • The data comes from the Australian Government so it must be good data that is appropriate for the analysis in question.
  • The headline comes from what appears to be a reputable source, so it must be true.

Some Caveats

Before continuing we want to acknowledge that there is a wage gap in Australia; it just isn’t nearly as pronounced as this article and the makeovers suggest.

The data also looks highly reputable; it’s just not appropriate data for making a useful comparison on wages.

Also, we did not look at all 100+ makeovers. But of the 40 that we did review all of them parroted the findings of the source article.

Some makeover examples

Here are some examples from the 100+ people that created dashboards.

Figure 2 -- A beautiful viz that almost certainly makes bogus claims. Source: https://public.tableau.com/profile/publish/Australias50highestpayingjobsarepayingmensignificantlymore

Figure 2 — A beautiful viz that almost certainly makes bogus claims. Source: https://public.tableau.com/profile/publish/Australias50highestpayingjobsarepayingmensignificantlymore

example2

Figure 3– Another beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/profile/publish/MM12017/Dashboard1#!/publish-confirm

example3

Figure 4 — A third beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/profile/publish/AustraliaPayGap_0/Dashboard1#!/publish-confirm

example4

Figure 5 — Yet another beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/views/GenderDisparityinAustralia/GenderInequality?:embed=y&:display_count=yes&:showVizHome=no#1

Goodness! These dashboards (and the dozens of others that we’ve reviewed) are highlighting a horrible injustice!

[we’re being sarcastic]

Let’s hold off before joining a protest march.

Why these makeovers are wrong

Step back and think for a minute. Over 100 people created a visualization on the gender wage gap and of the dashboards we reviewed, they all visualized, in some form, the difference between male Ophthalmologists earning $552,947 and females that only earned $217,242 (this is the largest gap in the data set.)

Did any of these people ask “Can this be right?”

This should be setting off alarm bells!

There are two BIG factors that make the data we have unusable.

One — The data is based on averages, and without knowing the distributions there’s no way to determine if the data provides an accurate representation.

Here’s a tongue-in-cheek graphic that underscores why averages may not be suited for our comparison.

problems-with-averages

Figure 6 — The danger of using averages.  From Why Not to Trust Statistics.

Here’s another real-world graphic from Ben Jones that compares the salaries of Seattle Seahawks football players.

benjones_salaries

Figure 7 — Seattle Seahawks salary distributions. Source: Ben Jones.

Ben points out

The “average” Seahawks salary this year is $2.8M. If you asked the players on the team whether it’s typical for one of them to make around $3M, they’d say “Hell No!”

Two — The data doesn’t consider part time vs. full time work. The data is from tax returns and doesn’t take into account the number of hours worked.

Let’s see how these two factors work with a “for instance” from the source data.

Figure 8 -- A snippet of the source data in question.

Figure 8 — A snippet of the source data in question.

So, there are 143 women Ophthalmologists making an average of $217K and 423 males making an average of $552K.

Are the women in fact being paid way less?  On average, yes, but suppose the following were the case:

Of the 143 women, 51 work only 25 hours per week

And of those 423 men, 14 of them are making crazy high wages (e.g., one of them is on retainer with the Sultan of Brunei).

Could the 51 part-time workers and the 14 insanely-paid workers exaggerate the gap?

Absolutely.

Is this scenario likely?

About the Sultan of Brunei?  Who knows, but about hours worked?

Very likely.

We did some digging and discovered that as of 2010, 17% of the male workforce in Australia was working part time while 46% of the female workforce was working part time.

This single factor could explain the gap in its entirety.

Note: Not knowing the number of hours worked is only one problem. The data also doesn’t address years of experience, tenure, location, or education, all of which may contribute to the gap.

Findings from other surveys

We did some more digging…

Data from the Workplace Gender Equality Agency (an Australian Government statutory agency) shows that in the Health Care field, 85% of the part-time workers in 2016 were female. This same report shows a 15% pay gap for full-time Health Care employees and only a 1% gap for part-time employees.

Finally, a comprehensive study titled Differences in practice and personal profiles between male and female ophthalmologists, was published in 2005. Key findings from this survey of 254 respondents show:

  • 41% of females worked 40 hours per week compared with 70% for males.
  • 57.5% of females worked part-time compared with 13.6% for males.
  • The average income for females was AUS$ 38,000 less than males, not $335,000 less.
    (Yes, that’s still a big gap, but it’s almost 10 times less than what the article claims).

Why this causes so much damage

It would keep me up at night to think that something I did would lead to somebody saying this:

“Wait!  You think the wage gap here is bad; you should see what it’s like in Australia.  Just the other day I was looking at this really cool infographic…”

So, here we are spreading misinformation. And it appears we did it over 100 times! The visualizations have now been favorited over 500 times, retweeted, and one was featured as the first Tableau Viz of the Day for 2017.

We’re supposed to be the good guys, people that cry foul when we see things like this:

Figure 9 -- Notorious Fox News Misleading Graphic.

Figure 9 — Notorious Fox News Misleading Graphic.

Publishing bogus findings undermines our credibility. It suggests we value style over substance, that we don’t know enough to relentlessly question our data sources, and that we don’t understand when averages work and when they don’t.

It may also make people question everything we publish from now on.

And it desensitizes us to the actual numbers.

Let us explain. There is clearly a gender wage gap in Australia. The Australian government reports the gender wage gap based on total compensation to be around 26% for all industries, 23% for full-time and 15% for full-time health care (base pay is a smaller gap). While we can’t calculate the exact difference for full-time or part-time ophthalmologists (because we only have survey data from 2005), it appears to be less than 15%.

Whatever the number is, it’s far less than the 150% wage gap shown on all the makeovers we reviewed.

And because we’ve reported crazy large amounts, when we see the actual amount — say 15% — instead of protesting a legitimate injustice, people will just shrug because 15% now seems so small.

How to fix this

This is not the first time in MM’s history that questionable data and the lack of proper interrogation has produced erroneous results (see here and here.) The difference is that this time we have more than 100 people publishing what is in fact really, really wrong.

So, how do we, the community, fix this?

  • If you published a dashboard, you should seriously consider publishing a retraction. Many of you have lots of followers, and that’s great. Now tell these followers about this so they don’t spread the misinformation. We suggest adding a prominent disclaimer on your visualization.
  • The good folks at MM recommend that participants should spend no more than one hour working on makeovers. While this is a practical recommendation, you must realize that good work, accurate work, work you can trust, can take much more than one hour. One hour is rarely enough time to vet the data, let alone craft an accurate analysis.
  • Don’t assume that just because Andy and Eva published the data (and shared a headline that too many people mimicked without thinking) that everything about the data and headline is fine and dandy. Specifically:
  • Never trust the data! You should question is ruthlessly:
    • What is the source?
    • Do you trust the source? The source probably isn’t trying to deceive you, but the data presented may not be right for the analysis you wish to conduct.
    • What does the data look like? Is it raw data or aggregations? Is it normalized?
    • If it’s survey data, or a data sample, is it representative of the population? Is the sample size large enough?
    • Does the data pass a reasonableness test?
    • Do not trust somebody else’s conclusions without analyzing their argument.

Remember, the responsibility of the data integrity does not rest solely with the creator or provider of the data. The person performing the analysis needs to take great care in whatever he / she presents.

Alberto Cairo may have expressed it best:

Unfortunately, it is very easy just to get the data and visualize it. I have fallen victim of that drive myself, many times. What is the solution? Avoid designing the graphic. Think about the data first. That’s it.

We realize that the primary purpose of the Makeover Monday project is for the community to learn and we acknowledge that this can be done without verified data. As an example, people are learning Tableau everyday using the Superstore data, data that serves no real-world purpose. However, the community must realize that the MM data sets are real-world data sets, not fake data. If you build stories using incorrect data and faulty assumptions then you contribute to the spread of misinformation

Don’t spread misinformation.

Jeffrey A. Shaffer
Follow on Twitter @HighVizAbility

Steve Wexler
Follow on Twitter @VizBizWiz

Additional reading

Why not trust statistics. Read this to see why the wrong statistic applied the wrong way makes you just plain wrong (thank you, Troy Magennis).

Simpson’s Paradox and UC Berkeley Gender Bias

The Truthful Art by Alberto Cairo.  If everyone would just read this we wouldn’t have to issue mass retractions (you are going to publish a retraction, aren’t you?)

Avoiding Data Pitfalls by Ben Jones. Not yet available, but this looks like a “must read” when it comes out.

Sources:

1. Trend in Hours worked from Australian Labour Market Statistics, Oct 2010.

http://www.abs.gov.au/ausstats/abs@.nsf/featurearticlesbytitle/67AB5016DD143FA6CA2578680014A9D9?OpenDocument

2. Workplace Gender Equality Agency Data Explorer

http://data.wgea.gov.au/industries/1

3. Differences in practice and personal profiles between male and female ophthalmologists, Danesh-Meyer HV1, Deva NC, Ku JY, Carroll SC, Tan YW, Gamble G, 2007.

https://www.ncbi.nlm.nih.gov/pubmed/17539782?dopt=Citation

4. Gender Equity Insights 2016: Inside Australia’s Gender Pay Gap, WGEA Gender Equity Series, 2016.

http://business.curtin.edu.au/wp-content/uploads/sites/5/2016/03/bcec-wgea-gender-pay-equity-insights-report.pdf

5. Will the real gender pay gap please stand up, Rebecca Cassells, 2016.

http://theconversation.com/will-the-real-gender-pay-gap-please-stand-up-64588

Nov 282016
 

Note: Major thanks to Nazirah Garrison and Christie Clark at Tableau for suggesting this approach.

Overview

With Tableau 10.x it is in fact possible to get your survey data, “just so” without having to invest in new tools and / or a engage in a time-consuming, error-prone procedure every time you need receive updated survey data.

There’s a lot of upside to this approach — everything is built into Tableau and you’ll just need to refresh the extracts (there will be several of them) as you get new data.  The downside is the setup is a little bit cumbersome and Tableau manifests some “hmm, that doesn’t seem right” behavior along the way.

Before embarking on this I encourage you read this post so you understand what I mean when I refer to the data being “just so.”

Also, if you would like to follow along you can download the source data here.

Three files and three extracts

Anyone that has read my posts or attended my classes know that I want survey responses in both text and numeric formats. Sure, you can in fact manage with one format or the other, but you’re just creating a LOT more work for yourself if you don’t have the data in both formats.

You will also need what I call a “Helper” file — this is just a separate file that maps each question ID into human readable form and groups related questions together. Again, you can certainly get by without it but you’ll be working much harder than you need to, especially if you ever compare “Importance” with “Satisfaction” Likert-scale questions.

For this example we will create a separate extract for each of these three files, then use Tableau 10.x’s ability to join three different data sources.

Let’s start by creating the extracts.

To pivot the data labels and create an extract

  1. Start Tableau 10.x and indicate you want to connect to Excel.
  2. Connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx and drag the sheet Data Labels into the Drag sheets here area as shown below.
    01_draglabels
  3. Leave the first five columns intact (the Resp ID, demographic stuff, and Weight) and select all the other columns.
  4. Click any of the selected columns and select Pivot from the context menu.
    02_firstpivot
  5. Rename the first pivoted column Question ID and the second column Label as shown below.
    03_rename
  6. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  7. When asked to save, name the file DR_JustSo_Labels.tde (make sure to note where you are saving the file).

One down, two to go.

To pivot the data numbers and create an extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Data Numbers to the Drag sheet here area.
  3. Hide the columns labeled Gender, Location, Generation, and Weight — we already have them in the other data source and don’t need them twice.
  4. Leaving Resp ID in place, select the second through the last columns.
  5. Click the down arrow on any of the selected columns and select Pivot from the context menu.
  6. Rename the first pivoted column Question ID and the second column Value, as shown below.
    04_secondpivot
  7. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  8. When asked to save, name the file DR_JustSo_Numbers.tde.

Two down, one to go.

To create the question helper extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Question Helper to the Drag sheet here area.
  3. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  4. When asked to save, name the file DR_QuestionHelper.tde.

All three data sources are now ready.

Joining the three data sources together

We now have our three data sources as separate Tableau extract files. We’ll combine these three files (and create an extract from the joined files) using Tableau 10.x’s ability to join files from different data sources.

Note: This is where we’ll encounter Tableau’s “head-scratching” behavior.

To join the three data sources

Note: In first trying this Tableau presented a lot of warning messages about not being able to materialize a temporary table. While I could ignore these warnings and muddle through, you may not be so lucky. It turns out the culprit was my anti-virus software. I temporarily disabled it and everything worked without a hitch.  See http://kb.tableau.com/articles/issue/error-unable-to-materialize-temporary-table-joining-data-sources.

  1. Click the New Data Source icon and indicate you want to connect to More, as shown below.
    05_more
  2. Select DR_JustSo_Labels.tde and click Open.
  3. Click Add, as shown below.
    06_add
  4. Click More and then select DR_JustSo_Numbers.tde. Do not be fooled, the correct fields have NOT yet been joined.
  5. Click the overlapping Venn diagram to display the Join dialog box, as shown below.
    07_join1
  6. Click Number of Records and then click the X that appears in the row to indicate you do NOT want to join these two data sources using this field.
  7. From the left data source indicate you want to join using Resp ID, as shown below.
    08_join2
  8. From the right data source indicate that you want to join using RespID (Extract1). I have no idea why the field is named this way. More on this in a moment.
  9. From the left data source indicate that you want to join using Question ID. Yes, you need to join on more than one field.
  10. From the right data source indicate you want to join using Pivot Field Names (Extract1). Okay… THIS is the thing that has me scratching my head and as of this writing (November 27, 2016) I have no idea why the field isn’t also called Question ID.

    January 4, 2017 — Now I know why this is happening. When using data extracts (.TDE files) Tableau is only able to keep track of the alias names for fields in the first .TDE file.  All of that info gets stripped out from the second .TDE file and Tableau just sees the original field name (Pivot Field Names). A little off-putting, yes, easy to address and it won’t matter a lick once we’re building our visualizations.

  11. Click Add again, select More, and select DR_QuestionHelper.tde.
  12. Click the second Venn overlapping circle and remove any of the joins that may be in there (most likely again using Number of Records).
  13. From the left data source select Question ID and from the right data source select QuestionID as shown below.
    09_join3
  14. Hide the fields RespID (Extract1), both Number of Records fields, the second Question ID, and Pivot Field Names, and rename Pivot Field Values to Value. Your screen should look like this.
    10_joinall
  15. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  16. When asked to save, name the file DR_JustSoAll.tde. After creating the extract Tableau will show all the field names grouped by data source, as shown below.
    11_datasource
  17. Click the down arrow and select Group by Folder.
    Okay, you don’t have to do this but I see no reason to group the fields by data source.

Congratulations, you now have your data “just so” and you did it all in Tableau.

So, how do you get the extracts to refresh?

Good question.  If you are using either Tableau Server or Online you can create shared data sources and program the extract to refresh on a regular or as-needed basis.

If you are using desktop — and you have all four data sources in in one workbook — you can just click the  Data menu and select Refresh All Extracts.

Conclusion

While I find the process outlined here both cumbersome and confusing (what is up with those renamed fields NOT staying renamed?) this approach does appear to work and you only need to set it up once. The same cannot be said of The Tableau add-in for Excel which requires a lot of manual intervention every time you want to update the data.

Will this replace Alteryx as my tool of choice? No, but it does work and you can’t beat the price.

 Posted by on November 28, 2016 2) Visualizing Survey Data, Blog Tagged with: , , , ,  45 Responses »
Oct 172016
 

Overview

So, you’ve created a wonderful collection of survey data dashboards that have innumerable demographic filters so that users can, for example, just see responses from left-handed Los Angeles Lakers fans between the ages of 34 and 39.

So, what’s the problem?

Actually, there are two problems. The first occurs when extreme filtering reduces the number of responses so much that the results are statistically meaningless. The second is that you may inadvertently allow people to “glean” who has answered a survey.  For example, if you conduct a salary survey you want participants to be assured that nobody will be able to see individual responses. But if you have too many filters it may be possible to winnow down the results so you can guess who provided the answer.

Fortunately, it’s easy to set up a graceful way to suppress a chart and display an error message in its place when the “n” count gets too low.

How it works

Consider the dashboard shown in Figure 1.  Notice that the upper right corner shows that with nothing filtered there are a total of 350 responses.

01_checkall

Figure 1 — A simple dashboard showing results for a check-all-that-apply question.

Also notice there’s a parameter control that allows you to specify the cut-off point for displaying the visualization.

Now let’s see what happens if we use the filters to winnow down the number of responses to the point that there are fewer than 20 (Figure 2.)

Figure 2 -- With too few responses the bar chart is suppressed and the warning message is displayed.

Figure 2 — With too few responses the bar chart is suppressed and the warning message is displayed.

What’s happening here is that there are two floating charts both with similar filters that looks at how many survey responses there are. The bar chart in Figure 1 is set to appear if the number of responses is greater than or equal to 20. The warning message (it’s just a Tableau worksheet) appears if the number of responses is fewer than 20.

How the filters work

Let’s look first at what drives the bar chart (Figure 3.)

Figure 3 -- Pill settings and filters for the bar chart.

Figure 3 — Pill settings and filters for the bar chart.

Notice in particular there is a field called [Minimum Count] that is on Filters card and that it is set to True. The field [Minimum Count] is defined as follows:

Figure 4 -- How [Minimum Count] is defined.

Figure 4 — How [Minimum Count] is defined.

Here [Count Threshold] is the fill-in-the-blank parameter (currently set to 20).

So, the visualization will only appear if there are at least 20 responses; otherwise the filter “kills” the viz and the only thing we see is the title.

Now, how does the secondary visualization work?  Let’s have a look (Figure 5.)

Figure 5 -- Pill and filter settings for the "warning" visualization.

Figure 5 — Pill and filter settings for the “warning” visualization.

Notice that [Minimum Count] is also on the Filter card but is set to False. We’re seeing the viz (the red message) because the filters in place result in fewer than 20 responses.

And just what is producing the message? It’s the field [Too Few] that’s been placed on the Text button on the Marks card. The field is defined as follows.

Figure 6 -- Definition of the field [Too Few].

Figure 6 — Definition of the field [Too Few].

Why use floating elements?

We certainly could cram the two visualizations into a container and make sure that when one is displayed the other only takes up a few pixels.  I elected to go with the floating approach but made sure that the secondary viz was set to be in back of the primary viz.

Is that all there is to it?

For a check-all-that-apply question, that’s all you need to know, but some question and visualization types may need different approaches.

Consider Figure 7 where we see a jitterplot comparing salary data for men and women where each dot represents a response from an individual survey participant.

Figure 7 -- Pill and filter configuration for a jitterplot visualizing salary data.

Figure 7 — Pill and filter configuration for a jitterplot visualizing salary data.

Notice that we have a different field (one that uses a Table calculation) on the Filters card.

There’s also a very different setup to display the warning message for this visualization as we cannot simply base this on SUM([Number of Records]).

I will leave it to the reader to explore how this these are set up.  Just let me know when you come up with a better approach.

Conclusion

If you’re visualizing survey data and giving users filters you should come up with a game plan for what to do if there are too few responses. In this blog post (and the embedded, downloadable workbook) I present two approaches for two types of survey questions. Other questions types (for example, Likert-scale questions) will need some modifications to what I’ve presented here.

Sep 072016
 

Overview

TruthfulArtImagine a terrific introductory college course presented by a terrific professor.

That’s the feeling I had in reading The Truthful Art, Alberto Cairo’s follow up to his first book The Functional Art.

Whereas his first book took a “look at what you can and should do” approach to help people see and understand data, The Truthful Art is more of a “here’s what you need to know” if you want to be a data journalist — and there’s a lot of things you need to know if you are going to do a proper job.

I’m reluctant to use the term “data journalism” as Cairo’s book is for anyone that that is tasked with helping people make sense of data. The difference is that the data journalist’s work is likely to be public and yours may only be seen by people working in your organization. But while you may not have to make a dashboard that is as polished as an infographic from the New York Times, both you and the data journalist need to adhere to a particular doctrine and have sufficient skills across a wide range of topics if you are going to build functional, truthful, and meaningful visualizations.

First, Be Truthful

If the credo for doctors is to “first, do no harm” Cairo might argue that the credo for data journalists is to “first, be truthful.” Cairo makes the case that a good visualization must be

  • Truthful
  • Functional
  • Beautiful
  • Insightful
  • Enlightening

And it must be these things in this order of priority. That is, the visualization must first be “relevant, factual, and accurate” and only then should it be “accessible and engaging.” Cairo further states that “honesty, clarity, and depth come first.” Indeed, this is why he bristles with outrage over deceitful graphics like this one.

So, how, exactly, does one create something that is truthful, functional, beautiful, insightful, and enlightening?

By achieving a sufficient level of competence in a LOT of different areas.

And just what are those areas?

The Data Journalism Landscape

In reading The Truthful Art you may feel like you are in a helicopter several thousand feet above the data visualization landscape. In each section Cairo, as expert guide, will gently descend to several hundred feet above a particular area and allow you to examine varied topics including design, statistics, color, storytelling, psychology, and ethics. While the book never gets deep into any of these subjects Cairo does provide excellent resources for anyone interested in exploring a particular topic in depth as every chapter of the book ends with a section titled “To Learn More.”

While Cairo’s writing is disarmingly warm and engaging he takes the responsibility of data storytelling very seriously. By the end of the book you will have an excellent understanding of the investment needed to make a worthwhile contribution to your company, society, or both.

Conclusion

Whether you are new to the field or have been practicing for years, I’m confident you’ll find The Truthful Art, like its predecessor, to be fun, elucidating, and inspiring.

The Truthful Art

Paperback: 400 pages

Publisher: New Riders; 1 edition (February 28, 2016)

Aug 112016
 

Overview

As readers of this blog know, I have my problems with donut charts.

That said, I acknowledge that they can be cool and, under certain circumstances, enormously useful.

On a recent flight I was struck by how much I liked the animated “estimated time to arrival” donut chart that appeared on my personal TV screen. An example of such a chart is shown in Figure 1.

Figure 1 -- Donut chart showing progress towards completion of a flight.

Figure 1 — Donut chart showing progress towards completion of a flight.

I find this image very attractive and very easy to understand — I can see that I’m almost three-quarters of the way to my destination and that there are only 49 minutes left to the flight.

So, given how clear and cool this is, why not use them on a dashboard?  And if one is good, why not use lots of them?

It’s the “more than one” situation that may lead to problems.

Trying to make comparisons with donut charts

The flight status chart works because it shows only one thing only: a single item’s progress towards a goal.

Let’s see what happens when we want to compare more than one item.

Consider the chart in Figure 2 that shows the placement rates for Fremontia Academy.

Figure 2 -- Donut chart showing placement percentage.

Figure 2 — Donut chart showing placement percentage.

A 95% placement percentage is really impressive.  Is that better than other institutions?  If so, how much better is it?

Figure 3 shows a comparison among three different institutions using three different donut charts.

03_3Donuts

Figure 3  — Three donut charts displaying placement percentages for three different institutions.

Before digging deeper let’s replace the three separate donuts with a donut-within-a-donut-within-a donut chart (Figure 4.)

Figure 4  -- A concentric donut chart (also called a “radial bar chart” or a “pie gauge.”)

Figure 4  — A concentric donut chart (also called a “radial bar chart” or a “pie gauge.”)

“What’s the problem?” you may ask, “these comparisons are easy.”

While you may be able to make the comparisons you are in fact working consierably harder than you need to be.

Really.  Let me prove it to you.

Let’s suppose you wanted to compare the heights of three famous buildings: One World Trade Center, The Empire State Building, and The Chrysler Building (Figure 5).

Figure 5  -- Comparing the size (in feet) of three large buildings.

Figure 5  — Comparing the size (in feet) of three large buildings.

Now that’s an easy comparison. With virtually no effort we can see that One World Trade Center (blue) is almost twice as tall as The Chrysler Building (red).

Now let’s see how easy the comparison is with donuts (Figure 6.)

Figure 6  -- Three large buildings twisted into semi-circles.

Figure 6  — Three large buildings twisted into semi-circles.

Here are the same buildings rendered using a concentric donut chart (Figure 7).

Figure 7  -- Three skyscrapers spooning.

Figure 7  — Three skyscrapers spooning.

Yikes.

So, with this somewhat contrived but hopefully memorable  example we took something that was simple to compare (the silhouettes of buildings) and contorted them into difficult-to-compare semi-circles.

With this in mind, let’s revisit the Placement example we saw in Figure 3.

Here is the same data rendered using a bar chart.

Figure 8 -- Placement percentage comparison using a bar chart.

Figure 8 — Placement percentage comparison using a bar chart.

The comparison is much easier with the bars than with the donuts / semi-circles. You can tell with practically no effort that the blue bar is approximately twice as long as the red bar, even without looking at the numbers.

Indeed, that’s a really good test of how clear your visualization is: can you compare magnitude if the numbers are hidden?

Pop quiz — how much larger is the orange segment compared to the red segment?

Figure 9 -- Trying to compare the length of donut segments is difficult.

Figure 9 — Trying to compare the length of donut segments is difficult.

Now try to answer the same question with a “boring” bar chart.

Figure 10 -- Comparing the length of bars is easy.

Figure 10 — Comparing the length of bars is easy.

With the circle segments you are squinting and guessing while with the bars you know immediately: the orange bar is twice as large as the red bar.

More downsides for donuts

In addition to comparisons being difficult, how would you handle a situation where you exceeded a goal?  For example, how do you show a salesperson beating his / her quota?  With a bar chart you can show the bar going beyond the goal line (Figure 11).

Figure 11 -- With a bar chart it's easy to show more than 100% of goal.

Figure 11 — With a bar chart it’s easy to show more than 100% of goal.

How do you show this with a donut chart?

Rhetorical question.  You can’t.

Conclusion

If you only have to show progress towards a single goal and don’t need to make a comparison then it’s fine to use a donut chart. If you need anything more complex you should use a bar chart as it will be much easier for you and your users to understand the data.

Special thanks to Eric Kim for creating the building images.

 

Jun 222016
 

Overview

My obsession with finding the best way to visualize data will often infiltrate my dreams. In my slumbers I find myself dragging Tableau pills in an ongoing pursuit to come up with the ideal dashboard that shines light on whatever data set has invaded my psyche.

But is the pursuit of the perfect dashboard folly?

Probably, as I’ll explain in a minute, but I don’t want to suggest anyone not at least try for the clearest, most insightful and most enlightening way to display information.

Is this way is the best way?

This pursuit of the ideal chart preoccupies a lot of people in the data visualization community. Consider this open discussion between Stephen Few and Cole Nussbaumer Knafflic that transpired earlier this year.

As you will read, Few weighs in on Knaflic’s book Storytelling with Data and her use of 100% stacked bar charts.  He cited this particular example.

Figure 1 -- Knafflic's 100% stacked bar

Figure 1 — Knafflic’s 100% stacked bar

Few argued that there was a better approach and that would be to have a line chart with a separate line for each goal state.

Figure 2 -- Few's line chart

Figure 2 — Few’s line chart

Having written about visualizing sentiment and proclivities, I chimed in suggesting that a divergent stacked bar chart would be better (see Figure 3.) I think this presents a clearer and more flexible approach, especially if you have more than three categories to compare as the 100% stacked bar chart and line chart can become difficult to read.

Figure 3 -- My divergent stacked bar chart

Figure 3 — My divergent stacked bar chart

The ongoing public discussion was engaging and congenial but I’ve seen similar cases where one or more of the parties advocating a solution become so certain that his / her approach is without a shadow of a doubt the only right way to present the data that tempers flare high. Indeed, I’ve seen instances where some well-respected authors have declared a type of “Sharia Law” of data visualization and have banned so-called heretics and dilettantes from leaving comments on blogs and even following on Twitter!

My take? While I prefer the divergent stacked bar, the real question is whether the intended audience can see and understand the data. In this case, if management cannot tell from any of the three charts that there was a problem that started in Q3 2014 and continued for each quarter, then that company has some serious issues.

In other words, if the people that need to “get” it can in fact make comparisons, see what is important, and make good decisions on their new-found understanding of the data  — all without having to work unnecessarily hard to decode the chart — then you have succeeded.

I’m not saying don’t strive to be as efficient , clear, and engaging as possible, it’s just that the goal shouldn’t be to make the perfect chart; it should be to inform and enlighten.

And in this case I think all three approaches will more than suffice.  So stop arguing.

Understanding and educating your audience

Earlier this year I got a big kick out of something that Alberto Cairo retweeted:

Figure 4 -- Avoid Xenographphobia: The fear of unusual graphics / foreign chart types.

Figure 4 — Avoid Xenographphobia: The fear of unusual graphics / foreign chart types.

Xenographphobia! What a wonderful neologism meaning “fear of unusual graphics.”

So, why do I bring this up? While it’s critical to know your audience and not overwhelm them with unnecessary complexity, you should not be afraid to educate them as well. I’ve heard far too often people proclaim “oh, our executive team will never understand that chart.”

Really? Is the chart so complex or the executive so close-minded that they won’t invest a little bit of time getting up to speed with an approach that may be new, but very worthwhile?

I remember the first time I saw a bullet chart (a Stephen Few creation) and thought “what is this nonsense?”  It turns out it wasn’t, and isn’t, nonsense.  It took all of 60 seconds for somebody to explain how the chart worked and I immediately saw how valuable it was.

Figure 5 -- A bullet chart, explained.

Figure 5 — A bullet chart, explained.

I had a similar reaction when I first heard about jump plots from Tom VanBuskirk and Chris DeMartini. My thoughts at the time were “oooh… curvy lines.  I love curvy lines! But I suspect this is a case where the chart is too much decoration and not enough information. I bet there are better, simpler ways to present the data.”

Figure 6 -- Jump plot example. Yes, these are very decorative, but they are also wickedly informative.

Figure 6 — Jump plot example. Yes, these are very decorative, but they are also wickedly informative.

Then I spent some time looking into the use cases and came to the conclusion that for those particular situations jump plots and jump lines worked really well.

That said, there are some novel charts that I don’t think I will ever endorse, with the pie gauge being at the top of my list.

Figure 7 -- The pie gauge, aka, a donut chart within a donut chart, aka, stacked donut chart. I won't go into the use case here but a bullet chart is a much better choice.

Figure 7 — The pie gauge, aka, a donut chart within a donut chart, aka, stacked donut chart. I won’t go into the use case here but a bullet chart is a much better choice.

So, what should we do?

I’ve argued that you should always try to make it as easy as possible for people to understand the data but you should not go crazy trying to make the “perfect dashboard.”

I also argue that that while you should understand the skillset and mindset of your audience, you should not be afraid to educate them on new chart types, especially if it’s a “learn once, use over and over” type of situation.

But what about aesthetics, engagement, and interactivity? What roles do these play?  Is there a set of guidelines or framework we should follow in crafting visualizations?

Alberto Cairo, in his book The Truthful Art, suggests such a framework based on five key qualities.

I plan to write about these qualities (and the book) soon.

May 102016
 

Note: Special thanks to Joe Mako who came up with a considerably more elegant and extensible solution than I had first built. I describe Joe’s approach in this post.

Overview

I’m fortunate that when I have to work with survey data that is not well-coded I can take out my trusty copy of Alteryx Designer and get my data “just so.”

But what if you don’t have a copy of Alteryx? Or what if you want to use Qualtrics’ new Web Data Connector and don’t have the ability to insert a data prep tool within your workflow?

In this blog post we’ll see how you can get Tableau to address the problem.  Specifically, we’ll see how to work with check-all-that-apply questions where the data has been coded as 1s and blanks instead of 1s and 0s.

Note: While the example shown focuses on Qualtrics data the approach will work with any tool that leaves blanks for respondents that did not select an item (e.g., Survey Gizmo).

So, What’s Wrong with Blanks?

Consider the data snippet below that shows responses to a group of check-all-that-apply questions. This is what I consider to be badly-coded data.

Figure 1 -- Check all that apply questions as they might appear in a CSV file downloaded from Qualtrics

Figure 1 — Check all that apply questions as they might appear in a CSV file downloaded from Qualtrics

Now here’s how I would prefer the data to be coded.

Figure 2 -- Correctly-coded check all that apply questions

Figure 2 — Correctly-coded check all that apply questions

So, why is this better?

By having 1s and 0s we can see, easily, if some respondents didn’t answer the question group at all. In the figure below we can see that respondents 1002, 1006, 1011, and 1016 did not make any selection for any of the check-all-that-apply questions.  More importantly, with data coded with 1s and 0s, Tableau we can just filter out the Nulls and we’ll get accurate results.

 Figure 3 -- Respondents that didn't answer this question group are highlighted in yellow


Figure 3 — Respondents that didn’t answer this question group are highlighted in yellow

IMPORTANT: Before going any further, make note that respondents 1002, 1006, 1011, and 1016 did not make any selections and that of the 13 people that actually responded to Q1_4,  12 selected that option.  There will be a test.

So, given our constraints, how are we going to come up with a way to distinguish between a blank that means a respondent didn’t select a particular option and a blank that, when combined with three other blanks, means the person didn’t respond to the group of questions at all?

Before we plow into this I want to make sure people realize that we will soon pivot / reshape the data so that it is tall. Specifically, instead of dealing with this…

Figure 4 -- Data before reshaping (as you would get it if you downloaded a .CSV file.)

Figure 4 — Data before reshaping (as you would get it if you downloaded a .CSV file.)

… we’re going to be working with this:

Figure 5 -- Reshaped data using Tableau's built-in Pivoting feature. If you're using the Qualtrics web data connector you don't need to use this.

Figure 5 — Reshaped data using Tableau’s built-in Pivoting feature. If you’re using the Qualtrics web data connector you don’t need to use this.

Once we pivot the data we’ll see the following collection of dimensions and measures in Tableau.

Figure 6 -- Reshaped data as it appears in Tableau

Figure 6 — Reshaped data as it appears in Tableau

Note: For this example I’m not going to the trouble of creating a data set that has the survey responses in both numeric and text format. The technique we explore will work equally well for when you have both as when you just have numeric values.

Grouping the related questions

To be able to come up with an elegant, extensible way to tell Tableau “don’t take into account questions where people didn’t make any selections” we’re going to need a way to group related questions together.

Note that the technique I’m about to propose is different from the ad-hoc grouping approach I describe in Using Qualtrics’ Web Data Connector and Reshaping Survey Data with Tableau 9.0.  The programmatic “group” we create here needs to be referenced in a calculated field, and you can’t do that with an ad-hoc group.

Here we’re going to take advantage of Qualtrics’ Question ID naming convention and how related questions all look like this:

Q(N)_

That is, a group of related check-all-that-apply questions might be named like this:

Q1_1
Q1_2

Q1_8

And a group of related Likert scale questions might be named like this:

Q7_1
Q7_2

Q7_13

Given this standardization, we need a mechanism that will group everything to the left of the underscore. There are several ways we can achieve this, but for this example we’ll create a new field called “Question Prefix” and define it as follows:

Figure 7 -- Calculated field to that will group related questions based on whatever comes before the underscore in the Question ID field.

Figure 7 — Calculated field to that will group related questions based on whatever comes before the underscore in the Question ID field.

Now let’s use this field and the Resp ID field to see who did not respond at all to a particular group of questions.

Figure 8 -- This collection of fields show us who did not respond at all to a particular group of questions.

Figure 8 — This collection of fields show us who did not respond at all to a particular group of questions.

So, have a look at results in the image above where we place MAX(Value) on the text button. What’s happening is that for each respondent we’re looking at the answers to Q1_1, Q1_2, Q1_3, and Q1_4 and taking the maximum value across all those questions.  Notice that there are four blanks / Nulls for 1002, 1006, 1011, and 1016 — these are the four respondents that didn’t answer the question group at all (see, I told you there would be a test.)

We do NOT want to include these people when we calculate the percentage of people that selected a particular choice, so we can place MAX(Value) on the Filters shelf and indicate that we should not include Null values, as shown below.

Figure 9 -- Filter that will exclude Null values.

Figure 9 — Filter that will exclude Null values.

Applying this filter will show us responses from people that did NOT skip the question, as shown below.

Figure 10 -- Respondents that selected at least one option.

Figure 10 — Respondents that selected at least one option.

Okay, this is great, but the visualization we want to create shouldn’t look anything like this. Indeed, we have too many things on the level of detail at this point.

So what are we going to do?

Level-of-Detail Calculations to the Rescue

Anytime you think “I need Tableau to consider some dimensions when making a calculation, but those dimensions can’t be in the visualization” a giant alarm bell should go off in your head — this is precisely the use case for a LoD calculation!

Here’s the LoD calc, that when placed on the Filters shelf, will do the trick:

Figure 11 -- LoD calc that allows us to exclude folks that skipped answering a question group.

Figure 11 — LoD calc that allows us to exclude folks that skipped answering a question group.

A translation into English would be:

Even though [Question Prefix] and [Resp ID] may not be on the viz detail, take that combination of these two fields and find the MAX([Value]). Then find only the Null values from this and exclude them.

Note that we are using FIXED instead of INCLUDE as the former will automatically create a dimension and will work no matter what other filter settings may be in place.

Putting This Altogether

First we need to create a calculation that gives us the percentage of people that selected an option from the group of check-all-that-apply questions. For this instance the following calculation will work well.

Figure 12 -- How to determine the percentage of people that selected an option.

Figure 12 — How to determine the percentage of people that selected an option.

A quick translation into English would be:

Add up everybody that answered “1” (meaning they selected an option) and divide by the number of people that answered the question.

Let’s see if this works. Consider the visualization shown below.

Figure 13 -- First attempt at showing the percentage of respondents that selected an option. This is NOT showing the correct percentage.

Figure 13 — First attempt at showing the percentage of respondents that selected an option. This is NOT showing the correct percentage.

Does this pass the “smell” test? Look at Q1_4. You may recall that we saw that 12 out of 13 people selected this option — but 12 out of 13 is 92% not, 71%.  Indeed, if we place SUM([Number of Records]) as a discrete measure on the rows shelf we see that Tableau believes 17 people responded to the question.

Figure 14 -- Without a filter Tableau believes 17 people responded to this group of questions

Figure 14 — Without a filter Tableau believes 17 people responded to this group of questions

We need to put our LoD calc on the Filters shelf and set it to True.

Figure 15 -- Removing "All of these are Null" from the respondent pool gives us the correct count of 13 and the correct percentage (12 out of 13).

Figure 15 — Removing “All of these are Null” from the respondent pool gives us the correct count of 13 and the correct percentage (12 out of 13).

Conclusion

How check-all-that-apply questions are coded is a problem with many survey tools including Qualtrics and Survey Gizmo. If you have a good data preparation tools like Alteryx you can fix the problem, but if you don’t have such a tool or don’t want to insert something into your workflow then you can get Tableau to address the problem with a Level-of-Detail calculation.

Apr 112016
 

Overview

This past week I enjoyed looking at and interacting with Matt Chambers’ car color popularity bump chart.

 Figure 1 -- Matt Chambers' car color popularity bump chart.

Figure 1 — Matt Chambers’ car color popularity bump chart.  You can find the original Datagraver visualization upon which this was based here.

The key to this dashboard is interactivity as it’s hard to parse all the car colors at once. If you hover over one at a time it’s easy follow the trends, as shown here.

Figure 2 -- Hovering over a color shows you that color’s ranking over time

Figure 2 — Hovering over a color shows you that color’s ranking over time

Showing Rank Only

Over the past few months I’ve seen a lot of people making bump charts (myself included). As much as I like them I fear that people are leaving some critical insights out of the discussion as bump charts only show ordinal information and not cardinal information. That is, they show rank but not magnitude.

Consider the bump chart above.  In 2009 White was the number one color, Black was number two, and Red was a distant sixth.

Figure 3 -- Red appears to be a distant sixth

Figure 3 — Red appears to be a distant sixth

But was Red in fact “distant” or its popularity closer than it would appear?  When you just show rank there’s no easy way to tell.

Showing Rank and Magnitude

Consider the dashboard below that shows the overall ranking and percentage popularity for car colors over the last ten years.

Figure 4 -- Ranked Bar Chart dashboard with no colors selected

Figure 4 — Ranked Bar Chart dashboard with no colors selected

Right now we can see that over the last ten years white came in first place with 22% and Red came in fifth place with 11%.  Now let’s see what happens if we select red and white, as shown below.

Figure 5 -- Comparing popularity of white and red car over the last ten years.

Figure 5 — Comparing popularity of white and red car over the last ten years.

Here we can see everything that the bump chart had plus so much more. Specifically, we can see that White was in first place for the past ten years and that Red was as high as fourth place in 2007 and as low as sixth place in 2008 and 2009. But we can also see that in 2009 White was only 50% larger than Red while in 2015 it was almost 150% larger!

Try it yourself

Click here to interact with the color popularity ranked bar chart.

Ranked Bars are Versatile

The ranked bar approach works well showing rank and magnitude over time and across different categories.

Consider the dashboard below that shows the sales for the top 20 products overall and then a ranked breakdown by one of three possible categories (Customer Segment, Region, and Year)

Figure 6 -- Overall sales / rank and sales / rank broken down by Customer Segment.

Figure 6 — Overall sales / rank and sales / rank broken down by Customer Segment.

Here we can see not only how the Bosch Full Integrated Dishwasher is ranked overall and within the four Customer Segments, but we can also see how much more and less the other products’ sales were.

Here’s the same dashboard showing a breakdown by Region.

Figure 7 -- Overall sales / rank and sales / rank broken down by Region.

Figure 7 — Overall sales / rank and sales / rank broken down by Region.

The Bosch Dishwasher is fifth overall but it isn’t even in the Top 20 in the South.  We can also see that it is Second in the East, ever-so-slightly behind the first ranked product (the Whirlpool Upright Freezer.  You can see for yourself when you interact with the dashboard that’s at the end of the post).

Here’s the same data but presented using a bump chart.

Figure 8 -- Overall sales / rank and just rank by Region.

Figure 8 — Overall sales / rank and just rank by Region.

The bump chart looks cool but we only get part of the story as I can only glean rank.

Conclusion

The bump chart is a great choice if you want to show “soft” rankings, such as what place a team came in over time, but if you want to show rank and magnitude, consider the ranked bar chart instead.

Note: for step-by-step instructions on how to build a dashboard like the one below, see Visual Ranking within a Category.

The Ranked Bar Dashboard — Kick The Tires