
March 14, 2019

Overview

Since 2011, a highlight of Tableau’s annual conference has been “Iron Viz,” a data visualization competition loosely based on “Iron Chef.” The premise of the Tableau contest is that three contestants who have won feeder competitions have 20 minutes to build a fully functional, interactive dashboard from scratch, with each contestant using the same data set. The winner is determined by tallying results from a panel of four judges (who account for 90 of the points awarded) and the audience (who account for the remaining 10 points). The victor is crowned Iron Viz Champion and wins $2,000. (The winner used to get an iPad, too, but now it’s just cash and bragging rights.)

Figure 1 — The audience watches as the 2018 Iron Viz champion is crowned.

I won the first Iron Viz competition in Las Vegas. Back then, and for the first four years, the competition was a “breakout” session and competed for attention with other sessions. My guess is that one-quarter of the entire conference attended the first Iron Viz competition, meaning there were at most 350 people there. Contrast that with the 2018 competition, where there were over 10,000 people watching.

Iron Viz is now huge, with production values that rival any major network TV competition (think “America’s Got Talent” or “American Idol”). If you’ve not seen the competition or don’t have a sense of how it works and how big it is, have a look at this video from last year.

So, given its success and popularity, why do I think anything is wrong with it, let alone suggest there’s a way to fix it?

I will try to articulate where the competition has lost some of its helpfulness to the community and how I think it can be fixed (and I will do my best to avoid too many “back in my day” pinings.)

So, what has changed for the worse?

Here are the five areas where I think there are problems and how I think they can be improved.

  • Data sets are no longer identical
  • Competitors can build too many things ahead of time
  • All the competitors’ entries should be downloadable from Tableau Public
  • Audience voting needs to be transparent and fair
  • Establish a code of conduct to eliminate what I think is cheating
Data sets are no longer identical

In 2018 Tableau added the requirement that contestants use Tableau Prep Builder to prepare and blend different data sets of their own choosing. Yes, all the competitors received the same proprietary IBM weather data, but each could add his or her own data to the mix and even ignore the main data set.

I think this “visualize whatever you want” approach hurts the competition. There was something amazing about “wow… look what *that* person found in the data” that made the competition special.

How might we fix this? Maybe a Tableau officiant can use Tableau Prep Builder to blend multiple data sets ahead of time and give this same, blended data set to each competitor? Perhaps have the show hosts demonstrate Tableau Prep Builder during the show preamble? I think the audience would appreciate seeing how the data was prepared and would come away with a good understanding of just what is in the blended data set. You don’t get this if you allow competitors to bring in any data set.

Competitors can build too many things ahead of time

By requiring Tableau Prep Builder you also encourage people to build calculations ahead of time. I was surprised to see how much stuff – calculations, images, digital assets – was either already in the database (thanks to Tableau Prep Builder) or sitting in a Word document, ripe for copying and pasting. I’d like to see Iron Viz return to its roots of “you, the data, Tableau. Start from scratch. 20 minutes.”

All the competitors’ entries should be downloadable from Tableau Public

All the Iron Viz dashboards are amazing and making them readily downloadable is a great way to educate the Tableau community. I know I’ve learned a ton by downloading Iron Viz dashboards and opening them in Tableau to see how they were built. For example, check out Curtis Harris’ winning dashboard from the 2016 competition, or Tristan Guillen’s winning dashboard from the 2017 competition.

Figure 2 — I can download this dashboard from Tristan Guillen.  That’s a good thing.

What about the dashboards from the 2018 competition? They were built using proprietary weather data from IBM and cannot be downloaded. What a shame!

Figure 3 — I can’t download this dashboard from Timothy Vermeiren. That’s a bad thing.
Audience voting needs to be transparent and fair

If I understand correctly, the winner is decided based on the findings from four judges and audience voting.  The judges account for 90 possible points and the audience accounts for the remaining 10 points. Each contestant can earn a maximum of 100 points.

And just how does the audience get counted? Through Twitter hashtags. What is unclear is whether the audience vote is a winner-take-all thing, or if the 10 points are awarded proportionately. The 2018 winner, Timothy Vermeiren, wowed the audience by making a motion chart using Tableau’s pages shelf.

I won’t get into a discussion as to whether the animation was helpful or gratuitous, but I believe it did earn Timothy more tweets. My question is whether that earned him 10 points or some fraction of that. It would be wrong for the tweet winner to get all ten points.

Note: Several people with insider knowledge have told me the Twitter votes decided the winner on more than one occasion.

Establish a code of conduct to discourage cheating

Each contestant gets to work with a “sous vizzer.” This is a Tableau employee who is available to coach, cajole, calm, and assist the competitor during what is a very stressful few days. When I was competing my “sous vizzer” was Dustin Smith and I found it very helpful to be able to bounce ideas (and tater tots) off of him.

For me, it was implied but not explicitly stated that figuring out what to build was between me and my sous vizzer. It didn’t occur to me to look for additional help as that just seemed wrong.

Apparently not everyone shares my view on this. There was at least one year where the winning contestant received more than a little input from several of his work colleagues on analyzing the data and coming up with the best way to present the results. If this is true, I think it completely violates the spirit of the competition. I encourage Tableau to establish a code of conduct where each contestant pledges to only get help and feedback from his / her sous vizzer.

Conclusion

I am grateful for Tableau and Iron Viz as winning the competition helped establish my data visualization practice. I also enjoy attending the competitions as they are enormously entertaining – just look at host Andy Cotgreave’s tuxedo!

Figure 4 — A resplendent Andy Cotgreave.

I very much want to see the competition thrive, so I encourage Tableau to have contestants use the exact same data, reduce the pre-built stuff, make sure all entries are downloadable from Tableau Public, ensure the audience voting is transparent and fair, and establish a code of conduct to eliminate cheating.

Want to learn more about Iron Viz? Click here.

Thinking of participating, but afraid to try? Check out Sarah Bartlett’s Iron Quest initiative.


February 6, 2019

My earlier post on why I don’t like Population Pyramids generated a lot of discussion on Twitter, and that discussion led to what I think is an elegant and analytically solid offering from Chris Love. His contribution was the result of iterating and collaborating with several people, including Dorian Banutoiu and Daniel Zvinca.

I think what transpired over just a few hours shows the data visualization community at its finest.

Here’s a condensed view of what transpired.

First Issue – What I showed isn’t a Population Pyramid

Several people commented that a true Population Pyramid should show age (and only age) along the y-axis and gender on the x-axis. My example revolved around gender and fields of study, but not age. Here is a typical population pyramid.

Figure 1 — Population Pyramid of China in 2017. Source: PopulationPyramid.Net

Second Issue – An assertion from others that Population Pyramids are useful when comparing more than one population

Both Chris Love and Bridget Cogley chimed in stating that while my criticism of a Population Pyramid is valid for a single group, Population Pyramids can be very useful when you have multiple populations to compare. Chris offered this small multiples example.

Figure 2 — Small multiples Population Pyramid from Chris Love.

Many people commented on this. I particularly liked this observation from Lindsey Betzendahl:

“I like this because I can quickly, at a high-level glance, get perspective on age by various countries and see outliers or variations. Like Kosovo. Clearly, it’s not to see exact values. The side by side with detailed set actions is a nice alternative.”

I had some back and forth with Chris as I was troubled that there was no way in his small multiple to see the gaps easily. He countered with a couple of other approaches, one of which is shown below.

Figure 3 — Another approach using bars. Blue indicates more males, gold more females.

I liked this quite a bit because I could compare shapes and see where there were gaps. In the meantime, Dorian Banutoiu proposed this hybrid approach where you could drill down on a particular small multiple and see its related gap chart.

Figure 4 — Hybrid example where we can see the big picture and drill down to see details using the gap chart. Notice that Romania is selected in the chart on the left.

This is when Daniel Zvinca got involved and he suggested using small multiples around a polygon chart, like the one shown below.

Figure 5 — Daniel Zvinca’s suggested approach.

This resonated with me and I shared as much with Dan and others:

“I was just thinking of doing the same thing — filling in the gap between the two lines and making that filled gap a different color based on which group is larger. I’d love to see how this would work with comparing shapes for many data pairs.”

And before I could post it, Dorian shared this:

Figure 6 — William Playfair’s famous timeline from 1786. Notice how the space between the two lines is filled with different colors to show the different gaps.

Here’s what happened next…

Figure 7 — A small portion of the twitter thread.

And fortunately for me, Chris Love took the bait! Here’s his wonderful rendering (click to see it in Tableau Public.)

Figure 8 — Chris Love’s population shapes chart showing gender gaps using an area chart (no polygons needed).

Observations from me and others

There were a lot of other people who added very useful commentary along the way. Jorge Camões expressed concern that professional demographers would protest about presenting age along the x-axis rather than the y-axis. Dan Zvinca said he would try to persuade the demographers to accept the horizontal layout. I decided to amplify Dan’s recommendation.

Figure 9 — I know some very persuasive people

Here’s a snippet of our discussion. Note in particular our agreement about the area charts for the small multiples and the gap chart for the details.

Figure 10 — A small piece of what was a very engaging discussion.

Care to see the entire discussion? Click here.

Conclusion

The few hours between my initial tweet and Chris Love’s alternative approach showed the data visualization community at its best. As Chris put it, [amazing things] “can happen when you collaborate and challenge (or are challenged) on an established chart type like population pyramids.”

I’m very happy to have been the grain of sand that acted as the irritant in the oyster that produced this pearl.


February 4, 2019

A special thanks to Chris Love who provided an elegant way to build the dashboard I show at the end of this post.

Note: Make sure to read the follow-up article here. Some amazing stuff happened in just a couple of hours after I first posted this article.

Background

I’ve seen two population pyramids over the last few weeks and thought that both examples made the audience work considerably harder than necessary to make comparisons. While both graphics were attractive, they both fell short in making it as easy as possible for people to “get it.”

In this article I’ll look at one of these graphics and show a better way to present the same data.

A Population Pyramid example

Here’s the graphic that showed up in my twitter feed a few weeks ago. It’s from the Global Gender Gap Index report from the World Economic Forum.

Figure 1 — Gender gap population pyramid from World Economic Forum.

So, what’s wrong with this chart?

Before digging in, let me render this using Tableau.

Figure 2 – Population pyramid rendered in Tableau, sorted alphabetically.

The goal of this type of chart is to make it easy to see the population distribution differences between men and women. But just how easy is that? Let’s see if you can answer these questions quickly.

  1. Which field had the overall largest percentage of people?
  2. Which field had the third largest?
  3. In which field is the gap between men and women the largest?
  4. In which field is the gap between men and women second largest?
  5. In which fields does the percentage of women exceed the percentage of men?

None of these questions are easy to answer with the population pyramid sorted alphabetically. If we sort the fields from highest overall percentage to lowest, we can answer the first two questions easily.

Figure 3 — Population pyramid sorted from most to least popular fields of study.

Now it’s easy to see that “Business, administration and Law” is first and “Engineering, Manufacturing, and Construction” is third.

But why are the other three questions more difficult? Sure, you can figure them out, but you have to work hard; they are far from an “instant” read.

The main reason is that while the bars have a common baseline, they extend in opposite directions. This means it’s easy to compare the green bars with the green bars, and the purple bars with the purple bars, but it’s very hard to compare a green bar with a purple bar.

Let me try to accentuate this by removing the percentages and just comparing the bars themselves.

Consider the example below where I want to compare how much larger the field of study for Education is compared with Engineering, Manufacturing and Construction, for just men.

Figure 4 — Focusing on one comparison

That looks like a very fast read. Even without the numbers present, it’s easy to see that the second green bar is about twice as large as the first green bar.

Let’s try another comparison where we look at the difference between men and women for Health and Welfare.

Figure 5 — Comparing percentage of men vs. women in Health and Welfare.

I’m a professional chart looker-atter and I find this a difficult comparison. If, however, I move all the bars to one side it becomes a much easier comparison, as shown below.

Figure 6 — Comparing bar length from a common baseline.

Now that’s a quick read! It’s readily apparent that the purple bar is around twice as large as the green bar.

But… this is a pretty cumbersome visualization.

Is there a better way to show this? A way that is easy to understand, compact, and attractive?

Of course!

How Pew Research handles this type of thing

I’m a big fan of the Pew Research Center and think their research and visualizations are exemplary. Consider this graph that shows the percentage of Democrats vs. Republicans who believe it is essential for someone in high office to have certain personality traits.

Figure 7 — Pew Research poll comparing Democratic and Republican views.

Pew Research has crafted a very elegant connected dot plot (aka, “barbell chart”, aka “dumbbell chart”, aka “gap chart.”) Look how easy it is to see where the gaps are big and where the gaps are small, as well as just how large the percentages are for each personality trait.

Notice, too, how there is a thin line that guides the eye for each attribute, and that there is a thicker gray line that connects the dots for each trait, emphasizing the gap.

So, what does our “Fields of Study” chart look like when rendered using this type of chart?

Figure 8 — Fields of Study poll rendered as a connected dot plot.

Look how easy it is to answer the five questions we posed earlier! The gaps really pop out, but we can also make all the other comparisons very easily.

Conclusion

While population pyramids are attractive and compact, they make it difficult to compare across the diverging categories. I think you’ll do better with a connected dot plot.

Note: Make sure to read the follow-up article here. Some amazing stuff happened in just a couple of hours after I first posted this article.


Some thoughts on why pursuing the perfect chart is not a great use of your time

January 22, 2019

Background

During Chart Chat (Round Two) I discussed my affection for the lollipop chart and why I thought it was an acceptable alternative to the bar chart. My friend and colleague, Chris Love, was watching the event live and was so inspired / incensed that he wrote this blog post.

That blog post in turn led to the longest Twitter thread ever (I will not subject you to it). Some great ideas from some terrific people, but I couldn’t help thinking that we were engaging in a little bit of intellectual self-gratification. After reviewing the discussion I wondered if we should save our energy for visualization battles that are worth fighting (more on this in a moment).

Before I go further, I realize I should have provided the background for my support of the lollipop chart. To be clear, I was not suggesting that the chart on the right was “better” than the chart on the left.

Figure 1 — Bar chart vs. Lollipop chart.

My advocating its use came while trying to dissuade a client from abandoning bar charts in favor of this, er, interesting dashboard.

Figure 2 — A “screaming cat” dashboard where the appeal of the visuals compromises the analysis.

My point here—and in the “Succeeding in the Real World” section from The Big Book of Dashboards—is that if your client or stakeholders pine for “cool”-looking charts because they are, well, cool, you may be able to use a lollipop chart to placate their “that looks cool” needs while not compromising the analytic integrity of the dashboard.

Fighting the battles that are worth fighting

I’ve written previously that there is no perfect chart nor perfect dashboard, and trying to find the perfect solution is not a great use of your time. Let me give an example of this using a real-world data set.

Here’s an example from my Building world-class business dashboards workshop where I ask attendees to visualize this data set:

Figure 3 — How would you visualize this?

The catalyst for this exercise was this graphic from the Wall Street Journal.

Figure 4 — Paired bar chart from the Wall Street Journal

I’m not a big fan of paired bar charts as they require a lot of ink and take up a lot of space (and while I very much like Jamaica, I’m not a fan of the colors in this context). But, if the client were dead-set on using this chart I think it is “good enough” as it makes it easy to compare the companies to themselves and with each other.

Here  are some alternative approaches from workshop attendees.

Here’s the data rendered as a gap chart.

Figure 5 — Same data rendered as a gap chart (aka, “connected dot plot”, aka “barbell chart”)

Below please find the data as a bar-in-bar chart. I’m not a big fan of this, but it’s certainly “analytically valid” as you can compare the length of bars from a common baseline. I’ve heard some arguments that the area of the wider bars may suggest more “heft” than exists, so I tend to avoid this approach (but I do think it is “good enough.”)

Figure 6 — Bar-in-bar chart

Here’s the same data rendered as a line chart. This is certainly easy to understand and isn’t weighted down by lots of ink.

Figure 7 — Line chart

There are at least three other ways to display the data that I would consider analytically sound. A special “shoutout” to those workshop attendees who went beyond the scope of the original exercise and plotted “other” as a third brand, as in this slope graph.

Figure 8 — Slope graph showing three brands.

So, I would say that all of these “work” and that they are all “good enough.”

Here’s an example that I don’t think is good enough.

Figure 9 — Divergent bar chart (this one is NOT good enough.)

While it’s easy to compare a green bar with the other green bars and a yellow bar with the other yellow bars, it’s very hard to compare a yellow bar with a green bar because those bars don’t share a common baseline and extend in the same direction. This is the reason I don’t like population pyramids.

Figure 10 — Population pyramid.

Pop quiz: Looking at the chart above, in which fields of study is the gap between men and women greatest? Where is the next largest gap?  This is very hard to tell with this type of chart, but very easy with a gap chart (blog post on this coming soon).

Going to the mat

I would fight hard to convince my stakeholders not to adopt the divergent bar / population pyramid approach as it only does one thing well (comparing the yellows with the yellows and the greens with the greens) while the other charts allow both intra and inter-company comparisons. And while I would dissuade my clients from using the paired bar / bar-in-bar approach I wouldn’t “go to the mat” about it.

But how do we decide among all of the “good enough” approaches?

I’m going to save that discussion for another blog post but will say that it involves iteration and getting your stakeholders involved in the design process.

And that process will lead to both better visualizations and greater adoption by your stakeholders.

January 3, 2019

Overview

A number of people have asked me how to use Tableau to visualize survey data that was captured using Google Forms. I decided to try this for myself and built a pre-course survey where I asked folks that were going to attend one of my classes about their experience with survey tools and Tableau. A snippet of that survey is shown below.

Figure 1 — A piece of a survey built using Google Forms.

The good news is that with Google Forms you can get the data mostly “just so.” The bad news is that the data is only recorded as text responses and that check-all-that-apply questions are not placed in separate columns, but are instead smashed together in a single column with a semi-colon as a delimiter.

In this blog post we will explore how to convert the check-all-that-apply responses into separate columns that have 1 for respondents that selected an item and 0 for those that did not.  Note that this technique will work for any tool that doesn’t place the responses in separate columns.

Inspecting the data

Note that the survey I constructed only had a few questions so I’m not bothering to create a “helper” file and I’m not wringing my hands that I can only get the data as text responses instead of both text and numerical responses. (If you’re not sure what those things are, see this blog post.)

Here’s the CSV file I downloaded from Google Forms. I’ve highlighted the check-all-that-apply responses.

Figure 2 — The CSV file from Google Forms.

Notice that we can’t count on certain selections being in the same place within the cell. For example, in row two, the first tool selected is “Qualtrics” while in rows 3 through 5 the first is SurveyMonkey.

Figure 3 — We can’t count on there being a uniform number of elements or on the first element always being the same.

So, how can we take that “mess” in column D and create separate columns for the eight possible options?

It turns out this is easy to do with Tableau Prep.

Don’t know how to use Tableau Prep for survey data? Click here.

Converting the single columns into many columns

Here’s what the data looks like inside Tableau Prep.

Figure 4 — Our Google Forms data inside Tableau Prep.

We need to create several new calculated fields based on the field [What survey tool(s) do you use?]. To do this, we right-click that field and select Create Calculated Field from the context menu.

Figure 5 — Creating a new field based on another field.

Below we see a new field called [_Qualtrics] that is defined as follows.

Figure 6 — Creating a new field that counts how many people selected Qualtrics.

Here’s the English translation:

If the field [What survey tool(s) do you use?] contains the 
word “Qualtrics”, put a 1 in the new field called [_Qualtrics];
otherwise, put a 0.
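For reference, here’s a minimal sketch of what that calculation might look like (the field name comes from the survey itself; the exact expression in Figure 6 may differ slightly):

IF CONTAINS([What survey tool(s) do you use?], "Qualtrics") THEN 1 ELSE 0 END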

And guess what? We get to do this again for each of the eight different options for the check-all-that-apply question. Here’s what the [_SurveyMonkey] field looks like:

Figure 7 — Creating a new field that counts how many people selected SurveyMonkey.

Here’s what the step looks like with the 8 new calculated fields.

Figure 8 — Creating eight check-all-that-apply fields from Google Forms

Note: I’m using the underscore in the field name to make it easier to find and group all the check-all-that-apply questions.

Reshaping and outputting the data

Now that we’ve converted that one column into eight columns, we need to take all the non-demographic questions and reshape them so that we have two columns (one named Questions and the other Answers).

Figure 9 — Reshaping (pivoting) the data.

We leave the Timestamp and Username intact (1) and pivot everything else, renaming Pivot field names as Questions (2) and the Pivot field values as Answers (3).

We can now output this to a .hyper file and visualize our check-all-that-apply question in Tableau.

Figure 10 — Visualizing a Google Forms check-all-that-apply question in Tableau.

Want the source data as well as the Tableau Prep flow file? Click here to download the packaged Tableau Prep file.


And a great way to visualize Pareto analysis

December 2, 2018

Special thanks to Lindsey Poulter, Andy Cotgreave, Jeffrey Shaffer, Cole Nussbaumer Knaflic, Robert Kosara, Adam Crahen, and Joey Cherdarchuk.

Overview

While at this year’s Tapestry conference (by far my favorite conference) I had the good fortune to literally bump into Joey Cherdarchuk from Darkhorse Analytics. Holy crap! Joey is responsible for this brilliant and charming redesign of a pie chart.  I encourage you to step through “Salvaging the Pie” now.

Delighted to have met the author of one of my favorite presentations, and aware that Tapestry was looking to fill some mini-presentation slots for the second day of the event, I excitedly shared Joey and his work with Cole Nussbaumer Knaflic, Robert Kosara, and Adam Crahen. As is the case whenever opinionated data visualization enthusiasts gather, a discussion transpired as to whether the “sleight of hand” magic was a fair makeover. Robert, Cole, and Adam pointed out that a bar chart allows for accurate comparisons among different members of a category, but a pie chart, when done right (which is rare), is supposed to show a part-to-whole relationship. You’re not supposed to use a pie chart to compare slices.

In any case, this discussion got me thinking…

Where a Pie Chart Works

Please note the singular: “pie chart,” not “pie charts.”

There are some cases where a pie chart works well, particularly in tandem with a bar chart.  Consider Jeffrey Shaffer’s pie chart makeover from The Big Book of Dashboards.

Here’s the original monstrosity.

Figure 1 — A “screaming cat” of a pie chart.

And here’s Jeff’s makeover.

Figure 2 — Jeff Shaffer combines a simplified pie with a bar chart into a clear story that expresses two views: comparison and part-to-whole.

We’ve got the best of both worlds here.  We can make an accurate comparison of all 17 categories and we can see how big the largest slice is compared with the whole.

But what if someone wants to look at something besides the top category, or what if they want to look at two or three categories combined?

Why not just highlight more slices? The problem is that pie charts are only easy to understand if the slice or slices start at 0, 90, 180, or 270 degrees, with starting at 0 degrees and moving counterclockwise being “best practice.”

Consider the three pies below that Andy Cotgreave created based on Stephen Kosslyn’s work from Graph Design for the Eye and Mind (Oxford University Press, 2006). What percentage of each circle is shaded blue?

Figure 3 — Three pies. Guessing the percentage of the blue segment is easy in the first example but difficult in the other two.

Most people get the first example without having to think hard. It’s 90 degrees / 25%.

Now look at the second pie chart. This one is harder because the slice doesn’t start at the 0-degree position. It’s still a 90-degree / 25% slice but it’s much harder to guess the size.

Change the degrees / percentage to some random value and most people are lost. They just can’t accurately guess that percentage at all.

So, how can we create something so that whatever we select becomes a slice whose size we can easily interpret?

Time for some interactivity and Tableau set actions.

Bars, Pie Slices, and Set Actions

Here’s a version of the Pig Meat Preferences visualization with nothing selected.

Figure 4 — Pig Meat Preferences with nothing selected

Let’s see what happens when we select Bacon.

Figure 5 — Dashboard with first item selected. Notice that the selection is reflected in the pie as well.

Now let’s see what happens when we select “Ham” and “Other.”

Figure 6 — Dashboard with two items selected.  Notice that the red slice in the pie reflects what has been selected.

We’re using Tableau Set Actions to color the slice red by what is in the set (whatever is selected, in this case “Ham” and “Other”) and gray by what is not in the set (Bacon and Ribs.) The pie is “sorted” so that the items in the set start at 0 degrees and move counterclockwise.
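If you want to build something similar, here is a hedged sketch of the kind of calculation that can drive the coloring, assuming a set named [Selected Items] that the set action populates (the actual workbook may do this differently):

IF [Selected Items] THEN "In selection" ELSE "Everything else" END

Putting this field (or the set itself, which Tableau treats as In/Out) on Color, and sorting the pie by the same field, is one way to get the selected items to form a single contiguous slice starting at 0 degrees.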

Pareto Analysis

We can also apply this technique to Pareto analysis. Consider the example below where we see the sales from 49 different states in the US; the top states dominate.

Figure 7 — Sales by state.  No pie chart, yet.

Will the 80/20 rule apply here?  That is, will 20% of the states (around 10) be responsible for 80% of the sales?

Figure 8 — The top 10 states contribute to 71% of the sales.

Not quite, but pretty close. The top ten states contribute just under 3/4 of the sales, and the pie chart makes it easy to see that we’re not quite at 75%. Incidentally, you can select any combination of bars and see the part-to-whole relationship represented in the pie, as shown here.

Figure 9 — You can see the part-to-whole relationship for any selection.

Conclusion

Pie charts can, when used in conjunction with bar charts, provide a part-to-whole perspective that the bar charts by themselves cannot.

Want to learn more about Tableau set actions? This post from Bethany Lyons contains many examples, including specific instructions on how to use set actions to create part-to-whole views.

You can experiment with both examples below.


November 28, 2018

Thank you to everyone that joined our first Chart Chat Live webinar.

If you were not able to join us for our first episode of Chart Chat Live, then check out the video on YouTube. You can also download Steve’s slides here and download Jeff’s slides here.

Recap

After brief introductions, Steve discussed color and the use of BANs, including the “Lonely Numbers” idea discussed in the new book, Factfulness: Ten Reasons We’re Wrong About the World–and Why Things Are Better Than You Think by Hans Rosling.

Figure 1 — Factfulness by Hans Rosling

He enjoyed the book so much, he’s read it twice!

Steve also added BANs to the Churn dashboard from Chapter 24. This updated dashboard can be downloaded from www.BigBookOfDashboards.com/dashboards.html.

Steve also encouraged viewers to review Adam McCann’s blog post on 20 ways to visualize KPIs.

Jeff discussed some examples of “Comparing Individual Performance with Peers” from Chapter 3 of The Big Book of Dashboards. This dashboard is available for download here.

Steve discussed “visualizations we shouldn’t like, but do” and showed Jeff’s Name Dropping visualization and its redesign with the “fun removed.”

Steve also showed a beautifully-designed health care visualization by Katie McCurdy from Pictal Health.

Figure 2 — Patient history dashboard by Katie McCurdy.

Jeff closed the webinar with a discussion about “The Screaming Cat” and discussed various blog posts related to radar charts, dual-metric donut charts, and the various responses from within the data visualization community.

Use radar charts to compare dimensions over several metrics by Jonathan Trajkovic

Dual-Metric Donut Chart Tutorial by Toan Hoang

Radial Bar Chart Tutorial by Toan Hoang

Jeff discussed how these tutorials can be useful in real-world cases, even though they may not be best practices in other cases.

The highlight of this section was Jeff telling the story of the famous Florence Nightingale Rose Chart and how it was tremendously effective in delivering a very important message at the time. Jeff also took on the daunting task of redesigning this very famous visualization.

Figure 3 — Original “rose” graphic

Figure 4 — Jeff’s redesign.

[Note from Steve: Jeff’s redesign is terrific.]

Additional References

Better than Jitter Plot by Steve Wexler

In the pursuit of diversity in data visualization. Jittering data to access details by Daniel Zvinca

Speaking of Graphics (2006), An Essay on Graphicacy in Science, Technology and Business by Paul J. Lewi, Chapter 5, Florence Nightingale and Polar Area Diagrams

Florence Nightingale’s Hockey Stick, The Real Message of her Rose Diagram, by Hugh Small

Next Installment

Be sure to register here for the next installment of Chart Chat on January 8th, 2019 at 11am.

We hope you find this information useful.

Jeff and Steve

www.BigBookOfDashboards.com

 

Follow Jeff on Twitter at @HighVizAbility

Follow Steve on Twitter at @vizBizWiz and sign up for his newsletter here.

 


September 12, 2018

Overview

I’ve recently seen some questionable approaches to visualizing check-all-that-apply questions (CATA questions) in Tableau. I’m concerned because many of these approaches won’t work if you filter the data to only show some options or if some of your survey participants skip the CATA questions.

It’s the latter reason that really gets me as I rarely see a case where everyone who participates in a survey answers every question. If you count everyone that participated in a survey, and not just the folks that answered the CATA questions, your calculations will be off (and they could be off by A LOT.)

In this blog post I’ll show a simple technique for displaying CATA percentages as well as how to make a minor modification so the technique will work with weighted survey data, too.

Getting your data set up correctly

Before going any further, you’ll need to make sure your data is tall and not wide. That is, instead of the data looking like this…

Figure 1 — Wide survey data. Check-all-that-apply questions are highlighted in yellow.

… the data should look like this:

Figure 2 — A snippet of the data showing some responses for Resp ID 2. Note that we can see that this respondent selected “Yes” (1) for Q2_9, Q2_3 and Q2_5.

If this is all new to you, please see Getting Survey Data “Just So.”

I also want to make sure the data is coded as 1s and 0s (1=Yes, 0=No, Blank=Didn’t answer). If your data is not set up this way, make sure to check out Dealing with Survey Tools that Don’t Code Check-all-that-apply Questions Correctly or Using Tableau Prep to Fix Problems with Check-All-That-Apply Questions.

Creating the Check-all-that-apply percentage calculation

Consider the screenshot below where we have a filter that allows us to only focus on the CATA questions in our survey data. I’ve added Question ID and Wording to the Rows shelf.

Figure 3 — Just looking at the group of related CATA questions.

Now we just need to fashion a calculation that will show us the percentage of people that selected one of the nine items; that is, we need to know how many people have a “1” associated with Q2_1, Q2_2, etc.

Here’s the calculation that will do the trick for us.

Figure 4 — How to determine the percentage of people that selected an option.

A quick translation into English would be:

Add up everybody that answered “1” (meaning they selected an option) and divide by the number of people that answered the question.
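As a sketch, assuming the 1 / 0 / blank responses live in a field called [Value] after reshaping (the field in the figure may be named differently), the calculation could be written as:

SUM([Value]) / COUNT([Value])

Because the blanks are nulls, COUNT([Value]) only counts the people who actually answered the question, which is exactly the denominator we want.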

Let’s see if this works. We can change the default numeric format to percentage, drag the field onto columns, and then sort in descending order.

Figure 5 — Completed check-all-that-apply arrangement in Tableau.

Well, just look at that! We didn’t need COUNTD(), table calcs, LoD expressions, etc., and we can filter this to our heart’s content and everything will work perfectly.

Also, if we want to know the number of responses, we can use this calculation…

Figure 6 — String calculation to show count of people that selected an item and the total number of people that responded to the question group.

… and get something that looks a bit fancier.

Figure 7 — Percentage check-all-that-apply with response count.
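As a rough sketch (again assuming the responses live in a field called [Value]), the label shown in Figure 6 could be built with a string calculation along these lines:

STR(SUM([Value])) + " of " + STR(COUNT([Value])) + " respondents"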

But “weight”… there’s more!

In addition to this being a really easy calculation that will work when you filter, add additional dimensions, etc., we can modify it so that it works great with weighted data.

And just what do I mean by “weighted” data? We use weighting to adjust the results of a study so that the results better reflect what is known about the population. For example, if the subscribers to your magazine are 60% female but the people that take your survey are only 45% female, you should weight the responses from females more heavily than those from males.

You may recall that our data has an additional weighting variable in it called Q0_Weight (see Figures 1 and 2).  Here’s what the weighted percentage check-all-that-apply calculated field looks like.

Figure 8 — Percent check-all-that-apply for weighted data.
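Here is a hedged sketch of how the weighted version could be written, assuming [Q0_Weight] holds each respondent’s weight as shown in Figures 1 and 2 (the exact field in Figure 8 may differ):

SUM([Value] * [Q0_Weight]) / SUM(IF NOT ISNULL([Value]) THEN [Q0_Weight] END)

The numerator adds up the weights of everyone who selected the item, and the denominator adds up the weights of everyone who answered the question, so the weighting carries through both parts of the percentage.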

Conclusion

If your data is setup correctly, it’s easy to visualize check-all-that-apply questions with a simple calculated field that will work with both regular and weighted data.

Related posts

Dealing with Survey Tools that Don’t Code Check-all-that-apply Questions Correctly

Working with Weighted Survey Data

 

 


Visual representation of error bars using Tableau

August 20, 2018

Much thanks to Ben Jones, whose book Communicating Data with Tableau provides the blueprint for the calculations I use, Jeffrey Shaffer for providing feedback on my prototypes and sharing research papers from Sönning, Cleveland, and McGill, and Daniel Zvinca for his thoughtful and always invaluable feedback on the article.

Introduction

You’ve just fashioned a dashboard showing the percentages of people that responded to a check-all-that-apply question and you’re quite pleased with how clear and functional it is. You even have an on-demand parameter that will allow you to break down the results by gender, generation, and so on.

In your preliminary write-up you note that Adrenaline Production is ranked highest with 77% and Metabolism is second with 72%. Before presenting your findings, you ask one of your colleagues to review your work and she asks you, “What is the margin of error for these results?”

Not sure what she means, you ask her to clarify. “If you were to conduct this survey again with a similarly-sized group of people, how confident are you that you would get the same result?”

Since you’re clueless on how to determine this, your colleague explains some very useful statistical methods built around the central limit theorem and how these simple formulas can help you state a range of values for the survey results.

You thank her profusely, do a little research, then come back and tell her that, with a confidence interval of 95%, the range of values for Adrenaline Production is 72% to 82% and the range for Metabolism is 67% to 77%.

She applauds your work but asks if there’s a compelling way to show these ranges and not just present numbers. (She also asks if you can add a 99% confidence interval because folks like to know both the 95% and 99% confidence intervals.)

In this post we’ll look at

 

 

  • How to show confidence intervals / error bars
  • How to build the visualization in Tableau
  • Some alternative visualization approaches
  • How to deal with a low number of responses

Displaying actual and error bar ranges

Rather than have you wait until the end of the blog post to interact with the dashboard, I’ll present it here.

Use the “Show” parameter to switch among four views.

 

What’s behind the scenes in Tableau

Note: the following fields are based on Chapter Seven from Ben Jones’ book Communicating Data with Tableau. It helped me get to where I needed to be very quickly, and I recommend it highly.

For a more thorough explanation of the central limit theorem, polling, and why all these formulas work I recommend Naked Statistics by Charles Wheelan and Statistics Unplugged by Sally Caldwell.

Here are the fields we’re going to need to fashion the visualization. Yes, there are a lot of them but many are variations on the same theme, so don’t be put off.

Field: CATA (Check all that apply)
Definition: SUM([Value]) / SUM([Number of Records])
Comments: Determines the percentage of people selecting an item, where [Value] equals 1 when selected and 0 when not selected. Note that (1 - CATA) will be the percentage of people who did not select the item. Both this and CATA are used in the Standard Error formula below.

Field: CATA_Standard Error
Definition: SQRT(([CATA] * (1 - [CATA])) / SUM([Number of Records]))
Comments: We’re using the “confidence intervals for proportions” methodology. See http://davidmlane.com/hyperstat/B9168.html

Field: Z upper 95%
Definition: 1.959964
Comments: Multiplier to make sure we are within 1.96 standard deviations, or 95% of possible responses.

Field: Z upper 99%
Definition: 2.575829
Comments: Multiplier to make sure we are within 2.58 standard deviations, or 99% of possible responses.

Field: CATA_Margin of Error 95%
Definition: [CATA_Standard Error] * [Z upper 95%]

Field: CATA_Margin of Error 99%
Definition: [CATA_Standard Error] * [Z upper 99%]

Field: CATA_Lower Limit 95%
Definition: [CATA] - [CATA_Margin of Error 95%]
Comments: Lower limit for the 95% error bar.

Field: CATA_Upper Limit 95%
Definition: [CATA] + [CATA_Margin of Error 95%]
Comments: Upper limit for the 95% error bar.

Field: CATA_Lower Limit 99%
Definition: [CATA] - [CATA_Margin of Error 99%]
Comments: Lower limit for the 99% error bar.

Field: CATA_Upper Limit 99%
Definition: [CATA] + [CATA_Margin of Error 99%]
Comments: Upper limit for the 99% error bar.
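To see how these pieces fit together, here is a worked example using the Adrenaline Production figure from the introduction (the response count is assumed purely for illustration): with CATA = 0.77 and roughly 270 responses, CATA_Standard Error = SQRT((0.77 * 0.23) / 270) ≈ 0.026, the 95% margin of error is 1.959964 * 0.026 ≈ 0.05, and the lower and upper limits are 0.77 ± 0.05, that is, the 72% to 82% range quoted earlier.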

Understanding the chart

Here is the pill arrangement for a simplified view where we see the actual value and the 95% error bars (you can download the workbook to see how the 95% and 99% bars are combined and how the parameter controls which error bars, if any, get displayed).

Figure 1 — How the combination circle chart / line chart showing error bars is built.  Given the peculiarities of this data set and the response size the Confidence Intervals are similar but not identical for the options respondents selected.

On columns we combine (1) AGG(CATA) as a blue circle and (2) Measure Values into a dual axis chart.  You may recall that the field [CATA] determines the percentage of people that selected an item in a check-all-that-apply question.

Measure Values refers to [CATA_Lower Limit 95%] and [CATA_Upper Limit 95%] (3).

Note that these two measures are displayed as a line chart (4) and that Measure Names is on the path (5). This tells Tableau to draw a line from the lower part of the error bar to the upper part of the error bar. The dual axis chart is synchronized so the blue dot is centered within the line.

Some alternative views that didn’t make the cut

The circle with dual error bars came after several iterations and some feedback from Jeff Shaffer (more on the feedback in a moment).

I started with a dot and Tableau reference lines connected by a line.

Figure 2 — A dot and a line connecting reference lines. This is easy to build but gives too much attention to the vertical error bars.

This is very easy to build in Tableau, but I found the height of the reference bars distracting. Indeed, this led to the creation (and abandonment) of the Imperial TIE Fighter chart.

Figure 3 — Showing error bars with Imperial TIE Fighters. This may work if you’re creating a visualization about Star Wars.  Note that to present this accurately the length of the wings would have to expand and contract based on the Confidence Interval.

Here’s another version where the ends of the error bars are shorter.

Figure 4 — Dot with less pronounced error bars.

This is a combination line chart and shape chart where the whiskers at the end are in fact shapes (as is the dot) and the line simply connects the shapes.

I call this next one a “Fingernail” chart.

Figure 5 — “Fingernail” chart where “fingers” show survey responses and the “nails” show the range of values within a 95% confidence interval.

The bars or “fingers” show the survey responses and the “nails” show the range of values within a 95% confidence interval. I’ll confess that the thing I liked most about this chart was the name I gave it.

Here’s a chart that uses gradients.

Figure 6 — Gradient bars.

The gradient bars passed the “looks cool” test but not so much the “more useful than distracting” test.

Figure 7 — Dot with Gantt error bars.  With red dots this would look like the flag of Japan.

This one was a strong contender but “lost” after getting feedback and reading some research on visual cognition (see below).

Why the chart that “won” won

Anyone who has attended my workshops knows that I’ve become a big proponent of iterate, feedback, iterate, feedback, etc. I don’t publish anything that is reasonably “high stakes” without having colleagues review it, and the feedback I get always results in more insightful visualizations.

Jeff Shaffer and Andy Cotgreave have been great collaborators, and both gave me very good feedback on my approaches.  Jeff also encouraged me to read William Cleveland and Robert McGill’s seminal research on graphical perception and especially Lukas Sönning’s paper on the dot plot.

It was Sönning’s advocating for salience (the notion that prominent elements should receive more attention), along with the ease with which we can show the actual values and more than one confidence interval at the same time, that won me over. With the other approaches, showing the survey percentages along with both 95% and 99% confidence interval error bars became unwieldy.

What happens when response count is low

You may not really appreciate just how useless survey results are with low (n) counts until you see how wide the error bars become when the number of respondents is below a certain threshold. Consider the interactive dashboard below where we break down the results by Generation.

Notice the results for Traditionalists for Adrenaline Production (fourth row). Only 14 respondents fall into this group, so for the 57% of that group that selected this item, the margin of error is plus or minus 25.9 percentage points!
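(That margin comes straight from the formulas above: CATA_Standard Error = SQRT((0.57 * 0.43) / 14) ≈ 0.132, and 1.959964 * 0.132 ≈ 0.259, i.e., plus or minus 25.9 percentage points.)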

This is why I advocate having a mechanism to either display a polite error message or simply remove findings when the response count is so low as to make the results meaningless.

Note that you can find lots of discussion about just what the criteria should be for removing questionable results (e.g., n < 30, np < 5, np(1−p) < 10, etc.). All of these can be programmed into Tableau easily, so just keep in mind that as your n gets smaller the error margin gets larger.

Versatile approach

While the example I use for this article uses check-all-that-apply questions, the dot with bi-modal error bars will work for Likert data as well. The packaged workbook embedded in the dashboards on this page also contains a visualization that shows the percentage of people that selected the top two boxes for Likert-scale data.

Figure 8 — Error bars with Likert data. Notice that the response count is higher in this example, so the error bars are considerably narrower.

There are also formulas and a simple visualization for single-punch questions (radio-button questions where respondents can only select one item):

Figure 9 — Actual results and 95% error bars for a single-punch question. The embedded workbook contains this example and the calculations you’ll need.

Conclusion

I’ve made my case for the dot and the bi-modal error bars for showing response percentages and confidence intervals. Whether you use this or one of the other approaches is up to you, but please do yourself and your stakeholders a favor and make sure they realize that there is always a margin of error with survey results.


July 22, 2018

Overview

After writing my blog post about why I still love the Jitterplot, I got some thoughtful feedback from Adam McCann, Jeffrey Shaffer, and the always provocative Daniel Zvinca.

Dan had written a wonderful article suggesting a Stingray plot as an alternative, and Jeff and Adam’s comments encouraged me to revisit the unit histogram featured in Chapter 3 of the Big Book of Dashboards.

A makeover using a Unit Histogram chart

As you may recall, the goal of the Jitterplot is to allow an individual to see his / her place in the salary “universe” as well as get a good sense of how many people fall into different categories and how the different salaries cluster. With these things in mind, here’s the makeover using a Unit Histogram (also known as a statistical dot plot).  Note that you can find an interactive version at the end of this post.

Figure 1 — Unit Histogram showing individual salary, quartiles, and distribution.

For comparison, here’s the Jitterplot version.

Figure 2 — Jitterplot showing individual salary and quartiles.

What’s the downside of the Unit Histogram?

To create the equivalent of histogram bars, each of the dots must fall into one of several equally-spaced bins. The example in Figure 1 above has bins spaced every $2,500. This means a dot representing a value of $60,375 will have the same vertical location as a dot representing $62,290; they will both get placed in the $60,000 bin. In the Jitterplot, none of the dots has to be displaced from its actual value.

How to get the bins and the reference bands

If you use Tableau’s built-in binning feature you won’t be able to add reference lines or reference bands, even if you convert the Tableau-created bins from discrete to continuous.

You can, however, roll your own continuous bin using a technique Joe Mako suggested in this forum post from 2013. Here’s a look under the hood.

Figure 3 — What drives the unit histogram.

How it works

Continuous Bin (1) is on the Columns shelf and creates the axis along the left side. It’s defined as

INT( [Value] / [Bin Size]) * [Bin Size]

Where [Value] is the salary of a respondent and [Bin Size] is a parameter controlling the size of the bin.
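To make the arithmetic concrete, with a $2,500 bin the $62,290 salary mentioned earlier becomes INT(62290 / 2500) * 2500 = 24 * 2500 = 60,000, which is why it piles up in the same $60,000 bin as the $60,375 dot.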

Notice that there’s also a discrete version of this on the Level of Detail (2). This will allow Tableau to divide the visualization into discrete chunks. We’ll need this so we can pile the dots on top of each other. Note that I acknowledge that having a field called Continuous Bin (discrete) is an oxymoron.

Resp ID (3) is also on Level of Detail. This will force Tableau to draw a separate dot for each Resp ID.

INDEX() (4) is Tableau’s built-in Index function. It essentially answers the question “in what row within the partition can I find this dot?”

The key thing is how we set up the addressing and partitioning so that Tableau will pile the dots within each bin. Here’s how Compute Using needs to be set up in the Table Calculation dialog box.

Figure 4 — Defining Compute Using.

The critical elements are highlighted in orange. By selecting Resp ID we’re telling Tableau “within the current partition (which is the discrete salary bin) determine what row a dot is in, then place the next dot one higher, then the next, until you are done with all the dots that are within this partition / bin.  When you get to a new partition / bin, start the process over again.”

Other approaches

Move the box plot / distribution bands out of the way

Jeff Shaffer provided feedback, and one of his biggest gripes about box plots and quartile bands in general is that they obscure the marks underneath. Here are two alternatives Jeff made that address this.

Figure 5 — Jeff Shaffer’s makeover where he places a box plot to the left of each Unit Histogram.

Figure 6 — Jeff Shaffer’s second makeover where the box plot covers half of the Unit Histogram.

I admit that I find it easier to explore the dots and see the distribution when the box plot does not occlude the unit histogram. My two problems with Jeff’s alternatives are that the charts take up more screen real estate and there’s a fair amount of extra work to build the visualizations (you can read about how to do that here.)

My compromise was to add a toggle that allows people to turn the distribution bands on and off at will. (See the dashboard at the end of this post.)

Why not a regular histogram with a dot?

Adam McCann sent me some alternative views one of which was like the chart shown below.

Figure 7 — Dot with simple histogram

Indeed, seeing this makes me think: why bother to have all the little dots? Why not just show the “my salary” dot compared with a simple histogram?

There are a few reasons why I prefer both the unit histogram and the Jitterplot:

  • The standard histogram above lacks the visceral impact of the Unit Histogram and Jitterplot. I admit, that is a completely subjective stance.
  • The unit histogram allows me to inspect individual dots, as in “what’s associated with that dot way down there? I’d like to know more about that dot!”
  • We can resize the unit histogram dots based on some other measure (e.g., years with company.)

Note that none of this works if you have tens of thousands of dots. Indeed, at that point I would just use a histogram and show my audience which histogram pertains to them (see Are you over the hill in the USA for an example.)

Dan Zvinca’s KDE Piled Dot Plot and Stingray Plot

As I mentioned at the beginning of this article, one of the catalysts for me reconsidering the jitterplot was Dan’s article about the KDE Piled Dot Plots / Stingray Plots.

Figure 8 — KDE Piled Dot Plot encoding 2400 elements

Figure 9 — Vertical Stingray Plot with colors given by the categories and overall quartiles.

I like Dan’s approach a great deal in that it takes the smooth-curve distribution of a violin plot and fills it with granular details that can be highlighted and inspected. Dan is also applying strong statistical algorithms to determine the best curve versus the “pick a bin size and see if it works” approach in my dashboard.

One downside of Dan’s approach is that, like the Unit Histogram, the values must be altered so they fit within the curve (this is not the case with the Jitterplot). The other downside is that I’ve not yet figured out how to build this in Tableau, and I don’t think it will be simple, at least for me. The Jitterplot and Unit Histograms are relatively easy to render in Tableau and do not require any special data preparation.

Conclusion

I’ve had success getting people to use dashboards by doing whatever I can to “insert” the audience into the dashboard itself. Until recently my favored approach was the Jitterplot, but I think I will at least explore using the Unit Histogram and, if it proves relatively easy to render in Tableau, the KDE Piled Dot Plot.

Note: when you hover over a dot you can see both its binned value and the actual value. The quartiles are derived based on binned values but one can modify this so that they are based on actual values. The difference is minor.