Dec 092015
 

Overview

In an October 2015 meeting of the Tableau Zen Masters each Zen Master was asked to name his / her favorite thing about Tableau.  Most people started by saying “well, besides the community, my favorite thing is…”

At the time I said “undo”.  Don’t laugh.  Let’s not take it for granted that with Tableau we can try things, fail, and go back to where we were before we failed… gracefully.

After the meeting I thought more about this question and how there are unheralded aspects of the tool and the ecosystem that I count on but that don’t get the attention they deserve.  I realized that there is one thing in particular that I use almost every day and without it I, and scores of others, would be at a major loss.

Tableau Public.

What is Tableau Public

Tableau Public consists of a free downloadable version of Tableau desktop to explore and visualize data, and a free cloud platform to host, share and embed interactive visualizations.

While I use my regular copy of Tableau desktop to explore data and fashion visualizations, it’s the free hosted version of Tableau Server (the cloud platform) that allows me to do so much and to learn so much.

Sharing

My website consists mostly of examples and blog posts and most of those posts contain Tableau dashboards that are embedded right inside the post.  I don’t have to ask people to download a reader and then download my dashboards.  The dashboards are right there.

Learning

But it gets better. Anyone with a copy of Tableau who is curious about how the viz works can just download the workbook, open it up, and see how it’s put together.

For example, a couple of years ago I wanted to see if anyone had recreated Hans Rosling’s famous Gap Minder demo in Tableau.  A quick Google search lead me to this example from Jeffrey Shaffer.

Figure 1 -- Jeffrey Shaffer recreates Rosling's Gap Minder.  See http://public.tableau.com/profile/jeffs8297#!/vizhome/shared/FQWYZ95DJ

Figure 1 — Jeffrey Shaffer recreates Rosling’s Gap Minder.  Click here.

I was curious to see how Jeff had gotten the year to display as big block letters in the middle of the viz, so I downloaded the workbook and “looked under the hood.”

Over the years I’ve downloaded hundreds of workbooks and have analyzed the individual brush strokes of Kelly Martin, Andy Cotgreave, Ben Jones, Anya A’Hearn, and dozens of others.  That I can do this so easily  is nothing short of amazing.  I don’t just get to interact with cool vizzes; I can download them and see how they work.

Adding to the conversation

Because the vast majority of Tableau Public authors do make their work downloadable, you’ll see people modify and repost the work they’ve downloaded. Andy Cotgreave wrote about this in late 2014 where one way of visualizing something begat other ways of visualizing the same data.  Here’s a particularly beautiful example from Michal Mixon.

Figure 2 -- A stunning dashboard from Michael Mixon that you can download from here.

Figure 2 — A stunning dashboard from Michael Mixon that you can download from here.

It Isn’t Perfect

Before you accuse me of writing a hagiography, there are several shortcomings with Tableau Public.

Row limit

You can only have 100,000 rowsNo, now you can only have 1,000,000 rows.  Okay, as of May of 2015 you can have 10 million rows of data.  Not really a shortcoming.

Locking down your data

Your workbooks and the underlying data can be downloaded and examined.  As of May of 2015 you can “lock down” your workbooks and your data.  This is huge as it allows public-serving organizations with proprietary data to publish interactive workbooks without fear that somebody will download and examine the underlying data.

That said…

Unless you indeed have proprietary data please, please, please don’t stop your workbooks from being downloaded.

Figure 3 -- This setting is on by default.  Please don’t turn it off unless you have a good reason (e.g., the underlying data is proprietary).

Figure 3 — This setting is on by default.  Please don’t turn it off unless you have a good reason (e.g., the underlying data is proprietary).

Tableau and the community that supports it are giving you this amazing free platform for you to showcase your work.  Please allow others to benefit by making your work downloadable.

Just because something is anointed a Tableau Public “Viz of the Day” does not mean it is a good viz

Many Viz of the Day selections are examples you should emulate, but I’ve seen some really bad vizzes that made the cut because the subject matter was “discussion-worthy”.  The problem is that those who are new to data visualization won’t know that these vizzes aren’t worth emulating.  They’ll just think “hmm, that viz with the word cloud and donut charts was a ‘viz of the day.’  I guess it’s a good thing to make vizzes with word clouds and donut charts.”

As Tableau comes out with new versions they tinker and break things

I understand that Tableau Public is an evolving platform and that Tableau wants to improve that platform by adding new features. The problem is that I’ve had many dashboards that suddenly stops working. Tableau has been great at responding to notices that things are broken, but sometimes it can take hours, if not days, to fix. So…

Favor: Please add my website to your test suite. In other works, do not implement any new features until you’re sure everything on my site works perfectly.

Okay, okay, okay… but it doesn’t hurt to ask.

Things I cannot do without

Here’s a summary of the things I cannot do without in my practice.

I would not be able to do what I do without Tableau Desktop.

I would not be able to do what I do without the community that supports Tableau.

And I would not be able to do what I do without Tableau Public.

My sincere thanks to the Tableau Public team.

 Posted by on December 9, 2015 1) General Discussions, Blog Tagged with:  4 Responses »
Nov 222015
 

Overview

I received an e-mail inquiry about weighted data recently and realized that while I cover this in my survey data class I had not yet posted anything about it here.  Time to remedy that.

The good news is that it is not at all difficult to work with weighted survey data in Tableau.  And just what do I mean by weighted data? We use weighting to adjust the results of a study so that the results better reflect what is known about the population. For example, if the subscribers to your magazine are 60% female but the people that take your survey are only 45% female you should weigh the responses from females more heavily than males.

To do this each survey respondent should have a weighting amount associated with their respondent ID, as shown here.

Figure 1 – A snippet of survey data showing a separate column for Weight.

Figure 1 – A snippet of survey data showing a separate column for Weight.

When pivoting / reshaping the data make sure that [Weight] does not get reshaped.  It should remain in its own column like the other demographic data.

Once this is in place we’ll need to modify the formulas for the following questions types:

  • Yes / No / Maybe (single punch)
  • Check-all-that-apply (multi-punch)
  • Sentiment / Likert Scale (simple stacked bar)
  • Sentiment / Likert Scale (divergent stacked bar)

Yes / No / Maybe (single punch)

With this type of question you usually want to determine the percentage of the total.

02_YesNoMaybe

Figure 2 — Visualization of a single-punch question

Unweighted calculation

The table calculation to determine the percentage of total using unweighted data is

   SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

Weighted calculation

The table calculation to determine the percentage of total using weighted data is

   SUM([Weight]) / TOTAL(SUM([Weight]))

Check-all-that-apply (multi punch)

With this type of question you usually want to determine the percentage of people that selected an item.  The total will almost always add up to more than 100% as you are allowing people to select multiple items.

Figure 3 -- Visualization of a multi-punch question

Figure 3 — Visualization of a multi-punch question

Most surveys will code the items that are checked with a “1” and those that are not checked with a “0”.

Unweighted calculation

The calculation to determine the percentage of people selecting an item using unweighted data is

   SUM([Value]) / SUM([Number of Records])

where [Value] is the name of the measure that contains the survey responses.  If the survey responses are coded as labels instead of numbers you can use this formula instead.

   SUM(IF [Label]="Yes" then 1 ELSE 0 END) / SUM([Number of Records])

Weighted calculation

The calculation to determine the percentage of people selecting an item using weighted data is

   SUM(IF [Value]=1 then [Weight] ELSE 0 END) / SUM([Weight])

Sentiment / Likert Scale (simple stacked bar)

This is very similar to the single-punch question but instead we have several questions and compare them using a stacked bar chart.  I am not a big fan of this approach but it can be useful when you superimpose some type of score (e.g., average Likert value, percent top 2 boxes, etc.).

Figure 4 -- Simple Likert Scale visualization

Figure 4 — Simple Likert Scale visualization

Figure 5 -- Simple Likert Scale visualization with Percent Top 2 Boxes combo chart

Figure 5 — Simple Likert Scale visualization with Percent Top 2 Boxes combo chart

Unweighted calculation – Stacked Bar

The table calculation to determine the percentage of total using unweighted data is

   SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

Weighted calculation – Stacked Bar

The table calculation to determine the percentage of total using weighted data is

   SUM([Weight]) / TOTAL(SUM([Weight]))

Unweighted calculation – Percent Top 2 Boxes

Assuming a 1 through 5 Likert scale, the calculation to determine the percentage of people selecting either Very high degree or High Degree (top 2 boxes) using unweighted data is

   SUM(IF [Value]>=4 then 1 ELSE 0) / SUM([Number of Records])

Weighted calculation – Percent Top 2 Boxes

Assuming a 1 through 5 Likert scale, The calculation to determine the percentage of people selecting either Very high degree or High Degree (top 2 boxes) using weighted data is

   SUM(IF [Value]>=4 then [Weight] ELSE 0) / SUM([Weight])

Sentiment / Likert Scale (divergent stacked bar)

Here is what I believe is a preferable way to show how sentiment skews across different questions.

Figure 6 -- A divergent stacked bar chart

Figure 6 — A divergent stacked bar chart

I’ve covered how to build this type of chart using unweighted values here.

There are six fields we need to fashion the visualization, three of which need to be modified to make the visualization work with weighted data.

  • Count Negative
  • Gantt Percent
  • Gantt Start
  • Percentage
  • Total Count
  • Total Count Negative

Count Negative – Unweighted

Assuming a 1 – 5 Likert scale, the calculation to determine the number of negative sentiment responses using unweighted data is

   IF [Score]<3 THEN 1
   ELSEIF [Score]=3 THEN .5
   ELSE 0 END

Count Negative – Weighted

Assuming a 1 – 5 Likert scale, the calculation to determine the number of negative sentiment responses using weighted data is

   IF [Score]<3 THEN [Weight]
   ELSEIF [Score]=3 THEN .5 * [Weight]
   ELSE 0 END

Percentage – Unweighted

The calculation that determines both the size of the Gantt bar and the label for the bar using unweighted data is

   SUM([Number of Records])/[Total Count]

Percentage – Weighted

The calculation that determines both the size of the Gantt bar and the label for the bar using weighted data is

   SUM([Weight])/[Total Count]

Total Count – Unweighted

The calculation that determines the total number of responses for a particular question for unweighted data is

   TOTAL(SUM([Number of Records]))

Total Count – Weighted

The calculation that determines the total number of responses for a particular question for weighted data is

   TOTAL(SUM([Weight]))

Summary

Here’s a summary of all the unweighted calculations and their weighted equivalents

Unweighted Weighted
SUM([Number of Records]) / TOTAL(SUM([Number of Records])) SUM([Weight]) / TOTAL(SUM([Weight]))
SUM([Value]) / SUM([Number of Records]) SUM(IF [Value]=1 then [Weight] ELSE 0 END) / SUM([Weight])
SUM(IF [Value]>=4 then 1 ELSE 0) / SUM([Number of Records]) SUM(IF [Value]>=4 then [Weight] ELSE 0) / SUM([Weight])
IF [Score]<3 THEN 1 ELSEIF [Score]=3 THEN .5 ELSE 0 END IF [Score]<3 THEN [Weight] ELSEIF [Score]=3 THEN .5 * [Weight] ELSE 0 END
SUM([Number of Records])/[Total Count] SUM([Weight])/[Total Count]
TOTAL(SUM([Number of Records])) TOTAL(SUM([Weight]))

 

Nov 102015
 

Overview

Several weeks ago the data visualization community broke into justified outrage over an inexcusably misleading dual-axis chart from Americans United for Life.  I plan to write an article about this and other “ethically wrong” visualizations in a few weeks but in the meantime I encourage you to read these excellent posts from Alberto Cairo and Emily Schuch, as well as this discussion from Politifact.

Around the same time these posts appeared I came across a “Viz of the Day” dashboard from Emily Le Coz that accompanied a lengthy article in the Daytona Beach News-Journal.  The dashboard contained several visualizations but the one that caught my eye was this dual axis chart.

Figure 1 -- Infographic showing that as the number of firefighters has increased over the past 30 years, the number of fire-related deaths has decreased.

Figure 1 — Infographic showing that as the number of firefighters has increased over the past 30 years, the number of fire-related deaths has decreased.

I engaged in an interesting Twitter discussion about this graphic with Alberto Cairo, Jorge Camoes, and Noah Illinsky. I’ll get into that discussion in a bit (and point out some troubling problems with the visualization) but first want to discuss the use case for dual axis charts.

Why use dual axis charts

There are several reasons to use a dual axis chart (e.g., a Pareto chart that shows individual values along with the cumulative percent) but the primary use case is when you want to compare two completely different measures and see if there is any noteworthy relationship between the two measures.  Consider the example below that shows cyclical sales data for a retail store (bars) and the number of orders placed each month (line).

Figure 2 -- Dual axis chart comparing sales and orders by month.

Figure 2 — Dual axis chart comparing sales and orders by month.

The surprising result is that while November is historically the strongest month for sales ($5M from 2010 to 2013) the total number of orders placed in November is the lowest of any month. And yes, I checked to make sure that this was true of all years and not one crazy blowout year.

I think this dual axis combination chart (where we show bars and a line) makes it easy to see there is something very interesting about November. The low number of orders combined with the high sales – something that is easy to see – means that we either sold more items per order or more expensive items per order.

So, what’s wrong with the firefighter example?

Given that dual axis charts can be so useful I wondered why I had problems with the Firefighter example.  Fortunately, the author made the dashboard downloadable from Tableau public so I was able to see how it was put together.

Cutesy icons set the wrong tone for the piece

My first problem was with the firefighter hat and skull-and-crossbones icons.

Figure 3 -- Icons representing firefighters and civilian deaths.

Figure 3 — Icons representing firefighters and civilian deaths.

In my opinion (and it is just an opinion) I thought this “cartoonified” the visualization. I would much prefer to see either a simple color legend or a label next to both lines.

The author exaggerates the changes over time

A much more troubling issue is that the author uses a fixed Y-axis that exaggerates the changes over time.  The author also fails to show the axis labels so we can’t see that the axis doesn’t start at zero.

Consider the dashboard below that shows the original visualization on the left with an accurate visualization on the right.

Figure 4 -- Comparison of fixed axis vs. automatic axis charts.  Note that the axis uses a SUM() function while the label is using AVERAGE(). The data is repeated three times in the data source which is why the author needs to use AVERAGE(). Yes, the axis should use AVERAGE() as well but the relative positioning of the elements is the same with SUM() so this causes no harm.

Figure 4 — Comparison of fixed axis vs. automatic axis charts.  Note that the axis uses a SUM() function while the label is using AVERAGE(). The data is repeated three times in the data source which is why the author needs to use AVERAGE(). Yes, the axis should use AVERAGE() as well but the relative positioning of the elements is the same with SUM() so this causes no harm.

Because the author fixed the Y-axis rather than starting from zero, the slope of the lines is exaggerated. While this does not alter what is in fact a noteworthy observation, whenever I see this type of “rigging” it makes me question the validity of any and all parts of the story.  That is, even though I don’t think the exaggeration was an intentional attempt to dramatize the difference, seeing this in play will make me question everything that the author and the publication now publishes.

Am I being too hard on the author? I don’t think so as anything that’s published as a “viz of the day” and accompanies a high-profile news article should get a lot more scrutiny than just any old Tableau Public visualization.  While I don’t feel mislead by the overstated changes, I do wonder at what point does a viz cross the line into TURD territory (Truly Unfortunate Representation of Data)? We’ll save that discussion for a later post.

Different approaches

Combination area and line chart

After adjusting the axis I still wondered if having two line charts was causing unnecessary confusion. In my first makeover attempt I tried combining an area graph with a line chart, as shown here.

Figure 5 -- First makeover attempt.  A dual axis chart using an area chart for firefighters and a line chart for civilian deaths.

Figure 5 — First makeover attempt.  A dual axis chart using an area chart for firefighters and a line chart for civilian deaths.

While using two different chart types made it easier to see that I was comparing two different measures, I didn’t love the chart and sought alternatives.

Connected Scatterplots

On Twitter Jorge Camoes offered this connected scatterplot.

Figure 6 -- Jorge Camoes’ connected scatterplot.  Notice that the axes do not start at zero but that the axes labels are at least visible.

Figure 6 — Jorge Camoes’ connected scatterplot.  Notice that the axes do not start at zero but that the axes labels are at least visible.

In a connected scatterplot the path the line takes represents the year.  This is why the line folds back on itself from time to time (more on this in a moment).  Camoes also “normalized” the data using an index so that both civilian deaths and number of firefighters start at a value of 100.

I like this visualization very much but fear that many people won’t understand the index value of 100 so I tried my own connected scatterplot, shown below.

Figure 7 -- Connected scatterplot with regular vs. normalized values.  Notice that the X-axis does not start at zero but that the axes labels are visible.

Figure 7 — Connected scatterplot with regular vs. normalized values.  Notice that the X-axis does not start at zero but that the axes labels are visible.

Before anyone cries foul about the X-axis, here’s a version with the axis starting at zero.

Figure 8 -- Connected scatterplot with both axes starting at zero.  This may be why Camoes normalized the data although his chart doesn’t start at zero, either.

Figure 8 — Connected scatterplot with both axes starting at zero.  This may be why Camoes normalized the data although his chart doesn’t start at zero, either.

I think starting the x-axis at zero obscures the relationship but that’s not what makes me question using this approach.  My problem is that many people will have a hard time understanding how the line “works”, as it were.  This is because whenever we see a line chart that involves time we come to expect marks on the left of the chart to show older dates and marks on the right to show newer dates.  In other words, we expect the chart to behave like this.

Figure 9 – Since grade school we’ve been indoctrinated to expect earlier dates to the left and later dates to the right.

Figure 9 – Since grade school we’ve been indoctrinated to expect earlier dates to the left and later dates to the right.

With a connected scatterplot the X-axis is “owned” by an independent measure so we have to adjust our perception to see that sometimes a later year will appear to the left of an earlier year, as shown below.

Figure 10 -- Connected scatterplot with marks showing all years.

Figure 10 — Connected scatterplot with marks showing all years.

Notice how 1986 appears to the left of 1985 and 1989 appears to the left of 1988.  Unless you are used to this type of approach this can look very strange.

Keep it simple

After experimenting a bit more I decided to forgo the dual axis and connected scatterplots and fashioned this simpler narrative.

Figure 11 -- Two separate charts yielding a simple and easy-to-follow narrative.

Figure 11 — Two separate charts yielding a simple and easy-to-follow narrative.

If you have what you think is a better approach I would love to see it.  If you’re using Tableau you can download the packaged workbook with the original dashboard and various makeover attempts here.

Oct 132015
 

Overview

You can download a trial version of Alteryx here.

You can find the Excel source data here.

You can download the completed Alteryx module from here.

Understanding the workflow

Here’s the Alteryx workflow that will get the data in the format we want.

Figure 9 -- Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

Figure 9 — Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

If this is the first time you’ve seen an Alteryx workflow it may look a little complicated (shades of Rube Goldberg) but think about how complicated the flow and decision tree is for what you do when you get up in the morning and get ready for work – it’s way more complicated.

Let’s start by focusing on the three data sources that we need (labeled 1, 2, and 3 in the diagram above).

  1. Data with numeric responses (we’ll use this for our demographic data, too)
  2. Data with text responses
  3. The Question Helper file (contains our meta data).

A note about SPSS files

For this example we’re using Excel data but the process would be identical for survey data saved in an SPSS format (.SAV file).  SPSS encodes both the text and numeric data in a single file, but you need to input values from that same data source twice – once for text and once for numeric values.

When you specify an SPSS file as a data source you’ll see the following check box option.

Figure 10 -- Options that appear when you bring in data from an SPSS file.

Figure 10 — Options that appear when you bring in data from an SPSS file.

Selecting “Expand Factors” will bring in the data as labels.  Unchecking this box will bring in the data as numbers.

Demographics

Figure 11 -- Selecting the demographic components from the data source

Figure 11 — Selecting the demographic components from the data source

Clicking the Select tool reveals the following settings.

Figure 12 -- Specifying which fields we want to keep

Figure 12 — Specifying which fields we want to keep

A translation into English would be “open this Excel file and select these fields”.  Here we keep RespID, Gender, Location, Generation, and Weight.

Numeric values

Figure 13 -- Numeric values section of the Alteryx workflow with the Reshape tool selected

Figure 13 — Numeric values section of the Alteryx workflow with the Transpose tool selected

Using the same data source, selecting the Transpose tool reveals the following settings.

Figure 14 – Transpose tool settings

Figure 14 – Transpose tool settings

Here we indicate that we want to keep RespID and pivot all the other checked fields.  Notice that the demographic components are not checked.

Looking back to the workflow diagram, we next follow this pivot with a Select tool which we use to rename the pivoted fields as shown here.

Figure 15 -- The Select tool allows us to name the pivoted fields

Figure 15 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second pivoted field Numeric value.

Dealing with nulls

Figure 16 -- The Filter tool allows us to eliminate null values before they hit Tableau.

Figure 16 — The Filter tool allows us to eliminate null values before they hit Tableau.

While not critical, Alteryx allows us to eliminate null values before they come into Tableau.  Here’s the setting for this tool.

Figure 17 -- Setting to eliminate nulls from the Numeric value field

Figure 17 — Setting to eliminate nulls from the Numeric value field

Joining the demographic with the transposed (pivoted) numeric data

Figure 18 -- Joining the demographic and transposed numeric data

Figure 18 — Joining the demographic and transposed numeric data

We’re now ready to join the demographic data with the pivoted numeric survey data. We do this with Alteryx’s Join tool.  Here are the settings.

Figure 19 -- Joining demographics and numeric data using the common field, RespID. Notice that we don't include the redundant version of RespID that is coming from the right table.

Figure 19 — Joining demographics and numeric data using the common field, RespID. Notice that we don’t include the redundant version of RespID that is coming from the right table.

Checking our progress

If we add a Browse tool after the Join we’ll see the following results.

Figure 20 -- Results of the initial Join

Figure 20 — Results of the initial Join

Note that we could, in fact, get by with using just this in Tableau, but the whole point is to make things faster and easier, so we’re going to add text responses and the “Helper” data.

Text Values

Figure 21 -- Workflow for setting the text values

Figure 21 — Workflow for setting the text values

This is identical to the numeric values except that we specify a different data source (one with label responses instead of numeric) and we rename the second pivoted field differently, as shown here.

Figure 22 -- The Select tool allows us to name the pivoted fields

Figure 22 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second Text value.

Tying Demographic, Numeric and Text Values Together

We need to combine the demographic / numeric values with the text values.  We will again use Alteryx’s Join tool, but we’ll need to join on both RespID and Question ID.

Figure 23 -- Joining demographic, numeric, and text data together

Figure 23 — Joining demographic, numeric, and text data together

The setting for the Join is shown below.

Figure 24 -- Join settings for combining demographics / numeric with text data

Figure 24 — Join settings for combining demographics / numeric with text data

Adding the question meta data and outputting the results

We’re almost done; we just need to add the information that tells us about each Question ID so we can map the ID to its human-readable form, group related IDs together, and so on.

Figure 25 -- Joining the "Helper" file data

Figure 25 — Joining the “Helper” file data

The setting for this Join is shown below.

Figure 26 -- Marrying the question meta data to the demographic and survey data

Figure 26 — Marrying the question meta data to the demographic and survey data

The only thing left to do is output this to a new data source.  In the sample workflow I output to an Excel file but you can just as easily write directly to a Tableau Data Extract file.

But Wait!  There’s More!

I intentionally used data that was well behaved, but what happens if the data is “messy”?  For example, suppose there’s an extra row below the field names?  Or suppose that for a check-all-that-apply question, instead of 1s and 0s we instead have 1s and blanks (which would mess things up in Tableau)?  Alteryx handles both of these scenarios (and dozens more) easily.

Is there a Downside?

If there’s any downside it would be the cost as Alteryx starts at around $4,000 per year.  If all you are doing is some occasional survey data reshaping then this fee would probably be prohibitively expensive. If, however, you need to corral a lot of data – or if you need advanced data blending, geospatial analysis, or predictive modeling – then Alteryx is a huge bargain.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , , ,  10 Responses »
Oct 132015
 

Overview

In writing about visualizing survey data using Tableau I’ve found that the number one impediment to success is getting the data in the right format. In accompanying posts I’ll explain how to get this done using Alteryx, Tableau 10.x, the Tableau Excel add-in, and Tableau 9.0 pivot feature (you can come close with 9.x, but can’t get it perfect).

What do I mean by “just so”?

When I deal with survey data there are usually four different elements that need to fit together:

  1. The demographic information (e.g., age of respondents, gender, etc.)
  2. Survey responses in text format
  3. Survey responses in numeric format
  4. Meta data that describes the survey data.

Let’s see what the four elements look like using an Excel sample data set (click here to download).

Demographic data

Here’s what the demographic data looks like.

Figure 1 -- Demographic data

Figure 1 — Demographic data

Survey responses in text format

Here are several columns of survey responses in text format.  Column F contains data for a Yes / No / Don’t know question.  Column G contains responses for a question about salary.  Columns H through P are responses for check-all-that apply questions and columns Q and R contain Likert scale responses.

Figure 2 -- Survey responses in text format

Figure 2 — Survey responses in text format

Survey responses in numeric format

Here are the same responses but in numeric format.

Figure 3 -- Survey responses in numeric format

Figure 3 — Survey responses in numeric format

I’ll explain why it’s so useful to have the survey responses in both text and numeric format in a bit.

Meta Data (the “helper” file)

Here’s some data that I usually prepare by hand as most survey tools won’t produce it for me automatically.  Having this helps me understand the data and will  greatly streamline my work in Tableau.

Figure 4 -- Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

Figure 4 — Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

What does “just so” look like?

Our goal is to combine and reshape the various elements so that they look like this.

Figure 5 -- Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

Figure 5 — Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

As I’ve written previously, the key thing is that I no longer have a separate column for each survey response.  Indeed, I’ve reduced the number of columns from 45 to just 11, but I’ve also increased the number of rows from 845 to over 25,000. That is a good thing.

Why this works so well with Tableau

Our goal is to see how to get Alteryx to get the data in this format, not to actually use the data, but if you need convincing on why the meta data is so helpful, consider the following example.

Let’s say that in your survey you ask people to indicate the importance and satisfaction about certain services, as shown here.

Figure 6 -- Question comparing importance with satisfaction

Figure 6 — Question comparing importance with satisfaction

With the data set up “just so” conducting this comparison in Tableau becomes easy.  First we can drag Question Grouping into Filters and indicate that we just want to look at Importance and Satisfaction questions.

Figure 7 -- Using the Question Grouping field to just focus on Importance and Satisfaction questions

Figure 7 — Using the Question Grouping field to just focus on Importance and Satisfaction questions

Then we can drag Wording and Question Grouping onto the Rows shelf which gives us the framework for comparing importance and satisfaction across ten different questions.  No more having to “look up” which questions we want to explore and no more having to alias question IDs.  I love this!

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Why do we need both text and numeric results?

We don’t really need them, but I know I certainly want them.

Consider all of the Likert scale question results.  The universe of possible values are

1
2
3
4
5

Suppose we want to know just what each of the values (1, 2, 3, 4 and 5) stand for?  The problem is that it depends on the question being asked as sometimes a 5 means “Strongly agree”, for other questions it  means “Critical” and for others it means “Extremely satisfied”.

Without having both numeric and text results we will have to write A LOT of IF / CASE statements and I, for one, do not want to do that.

So, now that we understand how and why we want the data “just so” we’ll see how to get it that way using Alteryx, Tableau 10.x, the Tableau-add-in for Excel, and Tableau 9.x.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , ,  6 Responses »
Sep 232015
 

Overview

I recently wrote about emotional vs. accurate comparisons and several people questioned whether the word “emotional” was appropriate.  (Several people questioned my assertions, too.  You can read their comments here.)

For this discussion I’ll use the term “engagement” in place of “emotion” and we’ll look into the challenges of creating public-facing visualizations that attract and engage, are clear and accurate, and do these things without “dumbing down” the subject matter.

Time Magazine and a cumbersome infographic

Stephen Few recently wrote a great post about the following infographic that appeared in Time Magazine in August, 2015.

Figure 1 -- Time Magazine's "Why we still need women's equality day" infographic. See http://time.com/4010645/womens-equality-day/.

Figure 1 — Time Magazine’s “Why we still need women’s equality day” infographic. See http://time.com/4010645/womens-equality-day/.

I have three major problems with this treatment.

  1. This is an important subject but the cutesy approach trivializes it.
  2. With so many chart types I have to work very hard to make comparisons among the different areas (Federal, Congressional, etc.). In addition, the chart is very long and requires a lot of scrolling.
  3. I strongly suspect that most people thought this was a dashboard having to do with Republicans and Democrats. I know that for me, whenever I see red and blue in a political context I think Republicans and Democrats and I had to fight this expectation to see that this was about men and women.

Stephen Few’s redesign

Here is Few’s redesign.

Figure 2 – Stephen Few’s clear and compact redesign.

Figure 2 – Stephen Few’s clear and compact redesign.

The collection of stacked bars makes it very simple to compare across the various categories and treats an important subject with the seriousness that is warranted.

But…

Few’s treatment is rather clinical and may be a little too dry for Time Magazine.

So, is there a way to fashion a graphic that is clear and accurate, like Few’s, but does more to draw the reader in?

Alberto Cairo’s redesign

Stephen Few asked Alberto Cairo to have a look at the source graphic and Cairo was able to turn out the following in a matter of minutes.

Figure 3 -- Cairo's redesign of Few's redesign.

Figure 3 — Cairo’s redesign of Few’s redesign.

Here are Stephen Few’s comments upon seeing the redesign:

“Alberto,

You’re the man! I love your improvements to the graphic.

You described your version as middle ground between my position and that of the embellishers, but I don’t see it that way. I’m an advocate of the kinds of embellishments that you added to the graphic for journalistic purposes, for they don’t detract from the information in any way. I’ve always said that journalistic infographics can be both informative and beautiful without compromising either. Doing this takes skill, however, that relatively few of the folks producing infographics possess. It also takes graphic design skill that I don’t possess, which is why I don’t design journalistic infographics. You’ve illustrated what it takes to do this well. As I said, you’re the man.”

I think Cairo would be the first to agree that there are many shortcomings to his rendering (e.g., colors, the guy on right looks like he’s holding a boomerang and not reading a book, etc.) but remember, Cairo put this together in a few minutes simply to show that it is in fact possible to create something that is beautiful and emotionally engaging without sacrificing one pixel of analytic integrity.

 

Sep 212015
 

Overview

I’ve conducted a lot of Tableau training classes and have found three things that confuse students simply because of the nomenclature Tableau uses for these things.  These three terms are

  • Headers
  • Table Calculations
  • Quick Filters

Headers

Consider the chart below that has both mark labels and an axis along the bottom.

Figure 1 -- Bar chart with visible axis.

Figure 1 — Bar chart with visible axis.

Because each bar has a label we don’t need to see the axis.  We can hide the axis by right-clicking it and selecting…

Figure 2 -- Turning off the header turns off the... footer.

Figure 2 — Turning off the header turns off the… footer.

… Show Header.

Yes, indicating that we don’t want to display a header will make Tableau hide…

the footer!

As I explain to students, in Tableau anything that surrounds a chart is called a Header.  If it’s along the top of a chart, it’s a Header.  Left side of the chart?  Header?  Bottom?  Header.  Right side?

Header.

Table Calculations

I know the first time I saw this I thought “Table Calculations” pertained to a visualization that used text tables. As I explain to students, I think of table calculations as Tableau having the ability to do math in its head.

Consider the example below where we show the raw vote count for each candidate from the 2012 US presidential election.

Bar chart based on query to the back-end database

Figure 3 — Bar chart based on query to the source database

Here, Tableau has queried the underlying database and is displaying the results based on that query.

With a table calculation, Tableau looks at the results that are already on display, as it were, and then does some additional internal calculations.  In the case of asking Tableau to show the percent of total, Tableau adds up the total for all three candidates and then divides the tally for each candidate by that total.

As I said, I find it helpful to think of Tableau Calculations as Tableau doing math in its head.

Quick Filters

To filter results in Tableau, you drag dimensions and measures from the Data window to the Filters card and then apply the settings you want for the various filters.

If you want easier access to the filter settings you can right-click a filter and select Show Quick Filter.

The problem with this term is that people new to Tableau think this pertains to speeding up the filter when it in fact means that you just want the filter control to be visible on a worksheet or a dashboard.  It has nothing to do with making filters quick.  In fact, having lots of quick filters on a worksheet can slow Tableau down because Tableau has to calculate what selections should appear in each of the quick filters.

The only rationale I can see for the name is that it allows you to access the settings quickly rather than having to go through the Filters dialog box.  Still, it’s quite confusing for those first learning Tableau.

Summary of confusing terms

Here’s a summary of the terms that often confuse people new to Tableau.

Term What students think it means What it actually means
Header Something at the top of a chart Anything that surrounds a chart
Table Calculation Something having to do with text tables / cross tabs The ability for Tableau to do math “in its head”
Quick Filters Some setting that makes filters work faster Make the filter control visible

What should we call these things and should Tableau rename them?

Given just how entrenched Tableau is it may be too late to change these terms, but if it’s not too late…

In the case of Show Quick Filters I would change it to Show Filter Control.

What about Table Calculations and Headers?  Got any ideas?

 

Sep 152015
 

Overview

Figure 1 – Bar charts are better than pie charts are better than donut charts.  Most of the time.

Figure 1 – Bar charts are better than pie charts are better than donut charts.  Most of the time.

As anyone who has read this blog knows I’m definitely a “bar charts are better than pie charts are better than donut charts” kind of guy, at least when you need to make an accurate comparison.

But in my classes, as I rearticulate the case against pies and donuts, I find myself wondering if there are in fact times when a pie chart might be a better choice.

Most of my data visualization work is for internal purposes so I focus on making it easy for people to make an accurate comparison.

But as my clients and I make occasional forays into public-facing visualizations I think about how to make it easy for people to make an emotional comparison.  By this I mean that I want people viewing the visualization to just “get it”.

Better yet, I want people to get it, be engaged by it, and in some cases, “feel” it.

With this in mind, in this post we’ll explore cases where

  • a pie chart is in fact as good, if not better, than a bar chart.
  • circles and spheres do a better job conveying magnitude than do bars.
  • a waffle chart produces an emotional wallop without compromising analytic integrity.

Where a pie chart trumps a bar chart

So, it’s the year 2034 and in this somewhat dystopian future there’s a movement afoot to add an amendment to the US constitution banning the use of pie charts.

Those of you familiar with the United States Constitution know that three-quarters of the states need to approve an amendment for said amendment to become law.  In 2034 it turns out the 39 of 50 states will in fact ratify the amendment.

Does that get us the needed 75%?  Here’s a simple, compact chart that lets us know immediately.

Figure 2 -- The amendment banning pie charts passes as I can see that the "Yes" votes fill more than three quarters of the circle.

Figure 2 — The amendment banning pie charts passes as I can see that the “Yes” votes fill more than three quarters of the circle.

It’s so easy to see that the “Yes” votes fill more than three-quarters of the pie that I don’t need labels indicating the large slice is 78% and small slice is 22%.

Compare this with a bar chart.

Figure 3 -- Did the "Yes" exceed 75%?  Without labels it's very hard to tell.

Figure 3 — Did the “Yes” exceed 75%?  Without labels it’s very hard to tell.

Without labels showing the percentages I cannot tell for sure if the “Yes” bar is more than three times larger than the “No” bar.

Okay, Okay, Okay!  I know that a simplified bullet chart would work, too.

Figure 4 -- A bullet chart shows that we've exceeded the goal.

Figure 4 — A bullet chart shows that we’ve exceeded the goal.

Yes, the bullet chart makes it clear that I’ve exceeded my goal but I need to know that the goal was 75%.  I don’t need the goal line with the pie chart.

So, does this mean that it’s okay to use pie charts instead of bar charts?

No.  Based on this example it’s only okay to use a pie chart (singular).  In addition, your pie chart (singular) needs to meet the following conditions:

  • One of the slices has to make up at least 50% of the pie.
  • If you’re pie has more than two slices you don’t ask people to compare the smaller slices.

Where circles and sphere’s do better than bars

As we all know Jupiter is big, really big.

Just how much bigger is it than Earth?

Should I create a bar chart to show this? If I were to create one should I compare the radius or the surface area of each planet?

Or should I really go nuts and compare the volume of the planets?

I don’t think the dashboard shown above is nearly as effective as the visualization shown below.

Figure 5  -- "Size planets comparison" by Lsmpascal - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:Size_planets_comparison.jpg#/media/File:Size_planets_comparison.jpg

Figure 5  — “Size planets comparison” by Lsmpascal – Own work. Licensed under CC BY-SA 3.0 via Commons – https://commons.wikimedia.org/wiki/File:Size_planets_comparison.jpg#/media/File:Size_planets_comparison.jpg

Jupiter and Saturn – and even Neptune and Uranus – really dwarf earth and the other planets and with this visualization I feel it.

Even the simple chart comparing the area of the cross section of the planets gives me a better feel for the data than does the bar chart.

Figure 6 -- Circles comparing cross-section area of the planets.  Yup, I can tell that Jupiter is way bigger than Earth.

Figure 6 — Circles comparing cross-section area of the planets.  Yup, I can tell that Jupiter is way bigger than Earth.

Is it essential that I can tell exactly how much larger one planet is than another?  I don’t think it is and I much prefer the emotional pull of the circles and the spheres.

A Fun Tangent

One thing that’s very hard to express in a static chart is how much space there is between the sun and the planets.  To get a sense of just how incredibly vast the distances are check out this fascinating, albeit somewhat tedious, interactive visualization from Josh Worth.

Getting an emotional wallop with waffles

A few weeks ago Cole Nussbaumer posted a tweet asking people what they thought of this chart from The Economist:

Figure 7 – A waffle chart from the article "Teens in Syria".  See http://www.economist.com/blogs/graphicdetail/2015/08/daily-chart-6?fsrc=rss.

Figure 7 – A waffle chart from the article “Teens in Syria”.  See http://www.economist.com/blogs/graphicdetail/2015/08/daily-chart-6?fsrc=rss.

The first thing that surprises me about this is that The Economist went with a waffle chart and not a bar chart, like the one below.

Figure 8 -- The type of chart I would have expected to see in The Economist.

Figure 8 — The type of chart I would have expected to see in The Economist.

The second thing that surprised me was that I preferred the waffle chart.  Yes, as Jeffrey Shaffer correctly points out, the dots are so tightly packed that you literally see stars between the circles, but  this can easily be remedied.  The question on my mind is why do I prefer waffles?

My answer is that the having each dot represent one of the 120 people surveyed connected with me in a way that the bar chart did not. Combined with the percentage labels (which are critical to the success of the visualization) the waffle chart hit me hard and it did so without dumbing down the importance of the discussion one bit.

So, are bars charts always boring?

No!  In my next blog post I’ll show you an example of a bar chart embedded inside a “come hither” graphic that

  • attracts and engages
  • does not trivialize an important issue
  • represents the data clearly and accurately

Stay tuned.

Sep 012015
 

Overview

I’ll admit that I have a problem with treemaps in Tableau, but it’s not because the chart type is in some way inferior. My problem is with how people use – and misuse – treemaps.

Here’s a good example of misuse.  Instead of displaying something straightforward that looks like this…

Figure 1 -- The humble, but accurate bar chart

Figure 1 — The humble, but accurate bar chart

… some people feel compelled to add “visual variety” to their dashboards and instead create something that looks like this.

Figure 2 -- Look , Ma! I made a Mondrian!

Figure 2 — Look , Ma! I made a Mondrian!

Except for the “it looks cool” factor there’s no good reason to use a treemap in this situation.

So, when should you use a treemap?

What’s in a treemap and why it can be useful

With a treemap you have two attributes at your disposal:

  1. The size (area) of rectangles, and
  2. The color of the rectangles

A treemap consists of packed rectangles where the area of a rectangle corresponds to the size of a particular measure.  In the example above the size of the rectangle is based on the number of people that come from a particular region.  North America has the largest value so it’s represented with the largest rectangle. Europe has a smaller value to its rectangle is proportionally smaller.

Treemaps really come in handy is when you have A LOT of marks to plot and you need to show all of the marks in a compact area.

So, this sounds like a great chart – we’ve got rectangles to show how big and small stuff is, color to group related rectangles intelligently, and we can fit a lot of stuff in small space.  Why not use this chart all the time?

The downside is that we are comparing the area of rectangles and with rectangles it is difficult to make an accurate comparison. People may be very good at comparing the length of bars but as a species we are not particularly good at comparing the area of rectangles (and we’re downright awful at comparing the area of circles.)

So, given the advantages and shortcomings, just when should you use it?  Let’s look at a particular scenario.

Showing Presidential Electoral Results

A Filled Map

Consider the electoral map below showing electoral votes by state for Barack Obama and Mitt Romney in 2012.

Electoral Map Filled

Figure 3 — Filled map showing electoral votes for the 2012 presidential election (displaying 48 out of 50 states)

Our Electoral College system is fairly confusing and I can only imagine how somebody from outside the US would look at this as there appears to be more red on the map than blue… but the blue guy won!

This discrepancy becomes even more pronounced when we include Alaska and Hawaii in the map.

Figure 4 -- Filled map showing electoral vote winners for the 2012 presidential election (displaying 50 states)

Figure 4 — Filled map showing electoral vote winners for the 2012 presidential election (displaying 50 states)

Clearly, a map designed to show how much area there is in a state fails with Electoral College results where the numbers are based on population not land mass.  In the example above there’s A LOT more red then blue, but again, the blue guy won the election.

Perhaps a different type of chart will do a better job?

Symbol Map

Here’s a symbol map of the same data.

Figure 5 -- Symbol map showing electoral vote winners for the 2012 presidential election (displaying 48 out of 50 states)

Figure 5 — Symbol map showing electoral vote winners for the 2012 presidential election (displaying 48 out of 50 states)

I think this is more accurate as there’s clearly more blue than red, but it’s still a tough read.  What else might work?

Cartogram

Here’s a cartogram from Professor Mark Newman of the University of Michigan showing the same data, except the polygons for each state has been adjusted to reflect the population of the state.

Figure 6 -- Cartogram showing election results where the shape of the state is based on its population and not land mass.

Figure 6 — Cartogram showing election results where the shape of the state is based on its population and not land mass.

While it’s very clear that there is more blue than red on this map there are two problems with this approach:

  1. There aren’t many tools that will support this type of distortion; and,
  2. This map will frighten small children.

Summary Bar Chart

Why not just display a simple bar chart showing the total number of electoral votes, like the one shown here?

Figure 7 -- Electoral vote count by candidate

Figure 7 — Electoral vote count by candidate

This is certainly very clear and we can see easily by how much Obama won, but we’re missing an important part of the story.

In US presidential elections a winner is chosen by tallying the electoral votes from each state and the summary bar chart doesn’t show us how each state contributes to the total for each candidate.

And the Winner is… ? The Treemap!

Here’s a treemap showing the exact same data.

Figure 8 -- Treemap showing 2012 electoral vote results

Figure 8 — Treemap showing 2012 electoral vote results

Of all the single visualizations I think this treemap tells the most complete story.  We can see just how much states like California, Texas, Florida, and New York contribute to the total as well as gauge —  to some degree  — just how many more electoral votes Obama received than did Romney.

One shortcoming, however, is that we can’t see the names of all the states as some of the rectangles are too small.

One way to address this is by adding a tool tip, as shown here.

Figure 9 -- Hovering over a mark allows me to see the name of the state and number of electoral votes.

Figure 9 — Hovering over a mark allows me to see the name of the state and number of electoral votes.

While this works, a problem we should address is that the small states are not easily searchable.  That is, if I want to know the results for Alaska, Hawaii, Delaware, etc., I have to go hunting for them.

At this point we’ve gotten about as far as we can get with a single chart.  To tell the complete story – and to make it easy for people to find results for a particular state – we should create a dashboard.

The Electoral Vote Dashboard

Here’s a dashboard that puts two of the views together and that allows the user to find a particular state’s rectangle by selecting the state from a list.

Figure 10 -- Electoral votes dashboard.  Selecting a state from the list will display that state’s rectangle in the treemap.

Figure 10 — Electoral votes dashboard.  Selecting a state from the list will display that state’s rectangle in the treemap.

While the “star” of the dashboard is the treemap, the summary bar chart and the selectable list make the story complete and we get a solid understanding of the 2012 Electoral College results.

And we achieved this without using an actual map.

Click here to interact with dashboard.

Aug 112015
 

Overview

So, here’s why until recently I’ve recommended that my clients avoid large dashboards.

We’ve been working on a collection of killer dashboards and we’re all set to make a big presentation to the CEO. This thing is so high profile we get to use the executive conference room with the super bright projector and the 120-inch screen.

Our dashboards are all 1,325 x 1,000 pixels, but they’re going to look fantastic on that giant screen.

We’re incredibly well prepared.

At least we think we’re incredibly well prepared because when we arrive an hour early we discover the top resolution of that ever-so-fancy projector is 1,280 x 800 and our ever-so-well-crafted dashboards won’t fit on the screen.

Tableau Desktop and Reader will not scale the dashboard intelligently.

It doesn’t fit! Tableau Desktop and Reader will not scale the dashboard intelligently and we end up with the dreaded scroll bars.

Yikes, we have scroll bars! What are we going to do?

And don’t suggest using Tableau’s “Automatic” dashboard setting as it will just squish the different visualizations and won’t scale the fonts.

Let your browser scale the dashboard

While Tableau Desktop and Reader cannot scale your dashboard, Tableau Public, Tableau Online, and Tableau Server — with the help of your browser — can scale the dashboard, and scale it intelligently.

For example, using Tableau Public with the  “Zoom” feature in Google Chrome…

Using Google Chrome's "Zoom" Setting

Using Google Chrome’s “Zoom” Setting

… allows us to “fit” the dashboard on our large, but relatively low-resolution, screen.

It fits!  Thank you, browser.

It fits! Thank you, browser.

Conclusion

If you are presenting your work using Tableau Desktop or Tableau Reader then you either have to compose for the lowest-common-denominator screen or live with scroll bars.

If, however, Tableau Public, Tableau Online, or Tableau Server are an option, you should be able to use your browser’s zoom feature to make sure your dashboards fit on the screen.