Jan 112012
 

Note: This post is very dated. Please see the YouTube video from the Tableau 2014 conference and the Visualing Survey Data page for more up-to-date information.

Overview

In my previous blog post I discussed how to reshape your survey data so that it plays nicely with Tableau.  In this post we’ll explore how to visualize three types of survey questions:

  1. Yes / No / Maybe
  2. Check-All-That-Apply
  3. Likert Scale

Note: A fully working collection of interactive visualization may be found at the end of this blog post.

Connecting to the Data

We need to connect to the reshaped data source we created in Part 1.  For those of you that are following along and downloaded the sample data, there are several tabs in the source Excel file.  Make sure you connect to the one marked “Use This One” as shown below.

Connecting to the right data source

When asked how you want to connect to the data, select “Import all data”.

You’ll be happier if you import all the data and let Tableau create an extract

There are two reasons for doing this:

  1. With larger data sets your performance will be much faster; and,
  2. You will need Tableau’s COUNTD() function and that is not available when you connect directly to Excel (or Access, for that matter.)

Your dimensions and measures shelves should look like the one shown here.

Default Dimensions and Measures

Note that the ID “Measure” should in fact be a Dimension, so when you get a chance, drag the field up to the Dimensions shelf.

So, just who are these people?

Before visualizing any questions I like to get a sense of who participated in the survey.  I’ll start by looking at Gender.

Start by getting a feel for who answered the survey questions

Notice that I place COUNTD(ID) on the columns shelf indicating that I want to count the number of unique.  Had I used the regular COUNT() function I would count all of the rows that contain a respondent’s ID, and that is a large number given how we’ve reshaped the data.

The “Null” value indicates that some people chose not to respond to this demographic question.  I suggest creating an alias called “Did not respond” to handle these non-responses.

From here I will create views for any other demographic groups (e.g., Generation, Location, etc.).

Visualizing a Yes / No / Maybe question

Let’s start by seeing what questions are available by placing Question on the Rows shelf.

With reshaped data there’s only one field, but many questions

As you may recall from the previous blog post, our sample survey contained the question “Do you plan to vote in the upcoming election?” for which there were three possible responses (“Yes”, “No”, and “I don’t know.”)

As I’m not interested in all the questions in the survey I need to add a filter that just keeps the first question in the list.

To see preliminary results for this question I will place “Response” on the rows shelf and COUNTD(ID) on the columns shelf.  While not particularly pretty, it does show us something useful.

Preliminary results

I will next alias the Question (and Responses), sort the Responses, and place Response on the color shelf.  I will also add a table calculation to the Columns shelf so that instead of showing a count we show the percentage of the total.

A simple, clear visualization

Now we’re talking!

The nice thing about the way we have this set up is that it’s easy to break down the results by our demographic segments simply by dragging a dimension to either the rows or column shelf, as shown below.

Results broken down by generation

Check-All-That-Apply Questions

To get a sense of what we have going on for this question I duplicated the sheet from the previous visualization and then changed the Question filter so that I’m looking at the check-all-that-apply questions.  Here are the filter settings.

Selecting questions using a filter

Do you now see the major advantage of having reshaped the data?  Had I kept the one column / one question approach I would have to use Tableau’s Measure Name / Measure Names feature and I would likely have to build a separate calculated field for every question I want to analyze.  With reshaped data I just put Question on the Rows shelf and filter by the questions I want to analyze.

Tip: I will usually create a set for each question grouping I want to review, sort of a “named filter”, as it were. In the accompanying workbook you will find two sets, one called “Check All That Apply” and the other called “LikertQuestions”.  You can then drag either of these sets to the filter shelf.

Here’s the resulting visualization.

A bad way to show check-all-that-apply results

Well, it’s a start, but it’s a not a a very good visualization.  For one thing, I only care about the percentage of people that selected “yes”; I don’t even need to consider the “no” folks.

I need to fashion a formula that counts of how many “yes” responses there were for each question, then divide by the number of responses.  This would be simple of the responses for these questions were 1 and 0, but unfortunately the responses are labels that look like this:

1: Yes
0: No

So, I need to perform a little string arithmetic.  The calculated field that does what we want is shown here.

SUM(
IIF (Left([Response],1)="1" ,1,0)
)
/COUNTD([ID])

And a visualization that employs this formula is shown here.

A better way to show check-all-that-apply results

That’s much better.  And of course, I can add a demographic dimension to glean additional insights.  In the screen below I show a comparison between male and female respondents, using an overlapping bar chart.

Check-all-that-apply question, broken down by gender

Likert Scale Questions

Note: Before continuing, you must promise that upon reading the rest of this blog post you read the following two blog posts:

http://www.datarevelations.com/the-likert-question-question.html

http://www.datarevelations.com/likert-scales-the-final-word.html

You promise?

Okay, since you promised…

As with the previous two question types, the first things I need to do is place “Question” on the filters shelf and select the group of questions we want to analyze in this visualization:

Using a filter to select the Likert scale questions

Now I’ll create a stacked bar chart, by placing “Question” on the Rows shelf, CNTD(ID) on the Columns shelf, and response on the Color shelf.

First pass at Likert Scale viz

Next I need to replace the CNTD(ID) measure with a table calculation that shows the percent of the total based on Response.

Adding percent of total to the viz

Since the questions are sorted in alphabetical order it’s hard to gauge which questions garner the greatest amount of agreement.  This is where using the Likert scores (4 for “Very high degree”, 3 for “High degree”, etc.) can help add clarity.

While I go into much greater detail in the two blog posts YOU PROMISED TO READ, I’ll walk you through how to calculate the average Likert score and superimpose if atop the stacked bars.

In our example, the value I need is the first character in the Response string, so we just need to grab the left-most character and convert it into a number.  I can do that using the following calculated field called LikertValue.

Float(Left([Response],1))

Next, I’ll place the Average of this calculated field on the Columns shelf next to the existing table calculation (don’t worry that the result will not look good.)  I now need to indicate that we have multiple marks and make the first chart a Bar chart and the second a Circle chart, as shown here.

As is often the case, as you first make improvements you can get some wacky results

This still looks pretty dumb, but if I remove “Response” from the Color shelf, then right-click AVG(LikertValue) on the columns shelf and select “Dual Axis” we get something that starts to be useful.

Getting close…

Finally, if I show labels for the circle chart and sort the questions by Average LikertValue we get something that tells a clear story.

Not too shabby

Conclusion

In this and the accompanying blog post we’ve explored how to reshape survey data and visualize three different types of survey question.  Granted, there are other question types (e.g., “add up to 100”, “rank these in order”, etc.) as well as intra-question analysis (e.g., create a scatterplot showing respondent’s age vs. annual income) that are a bit trickier to implement, but what we’ve delved into here should cover a wide variety of needs.

Question – any interest in a one or two day course devoted to visualizing survey data in Tableau?

 Posted by on January 11, 2012 2) Visualizing Survey Data, Blog 52 Responses »
Jan 032012
 

Note: This blog post is rather dated. While it does stress the importance of reshaping the data it only shows how to do this using the Tableau add-in for Excel and doesn’t discuss the optimal format for survey data. The post also doesn’t address how to combine survey results downloaded in a numeric format with survey results in a text format. 

Please visit the main Visualizing Survey Data page for more up-to-date information.

Overview

In my teaching and consulting practice I’ve fielded a lot of inquiries on how to use Tableau to analyze and visualize survey data so I thought it would make sense to write a blog post or two on the subject.

The good news is that Tableau is a fantastic tool for gleaning insights into survey data.  The bad news – and it really isn’t that bad – is that it may not be obvious as to how you go about setting up your survey data so that you can do the cool stuff with Tableau.

In this first blog post we’ll discuss reshaping “typical” survey data so that it gets along nicely with Tableau.  In the accompany post we’ll see how to visualize different question types (e.g., check-all-that apply, Likert scale, etc).

Click here to download the sample survey data.

Some typical survey data

The screen shot below shows several rows and columns from a CSV file that might have come from one of several different survey tools (e.g., “Survey Monkey”, “Zoomerang”, etc.)

Raw survey data in Excel

You’ll notice that in addition to an ID column there’s one column for each question and one row for each response.  We’re going to need to change this is moment, but first let’s understand some of the different types of questions we’ll be visualizing.

Demographic Questions

Columns B through D contain what I call “demographic” questions.  These are the questions that usually come at the beginning of the survey and let us know something about the survey respondents.  In this example we’ve just gathered gender, country of origin, and generation, but in your surveys you may see columns for ethnicity, education, marriage status, income, etc.

“Demographic” questions

Note: If you are using Excel and the Tableau reshaping add-in (which we will explore in a moment) then you should make sure the demographic questions are listed first in your spreadsheet, even if they were presented to survey takers at the end of the survey.

Yes / No / Maybe question

The data in column E, below…

Simple Yes/No/Maybe question

… represents the data that was gathered from this survey question.

Note that the cells in the Excel spreadsheet combine both numeric values (0, 1, or 2) and labels (“No”, “Yes”, “Don’t know”.)  Once we get the data into Tableau we’ll need to do some string arithmetic to address this.

Check All That Apply questions

The data in columns F through N shown below…

Check-all-that-apply question data

… represents the data that was gathered from this “check all that apply” survey question.

Notice that the only options for these columns are “1: Yes” and  “0: No”. A blank indicates that the survey respondent did not answer the question.

Likert Scale Questions

Columns O through Z …

Likert scale question data

… contain Likert Scale data that was gathered from the following survey question.

Here each response receives both a label “Very high degree”, “High degree”, etc., and a number rating from 1 to 5.  It was Rensis Likert (pronounced “Lick-ert”) that developed the idea of assigning quantitative values to qualitative measures.  If you’re creating surveys, chances are you will be doing a lot of work with Likert scale questions.

What’s wrong with the way the data is configured?

There are two very compelling reasons to reshape the data before connecting to it with Tableau:

  1. It’s way easier to explore the data and build visualizations with reshaped data (trust me); and,
  2. You won’t be stymied by a 255 column limit when connecting to an Excel or Access data source (and with long surveys it’s very easy to exceed 255 columns worth of data).

Reshaping the Data

There are a variety of data transformation / extract-transform-load (ETL) tools available including a free and promising system from Stanford University called Data Wrangler.  As our data is in Excel, I’m going to use Tableau’s free reshaping add-in to transform the data into the format we need it to be in.

Note: You can find this add-in at http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel.

Let’s walk through the process using the sample data in Excel 2010.  Note that I have already downloaded and installed the Tableau reshaping add-in.

  1. From the Add-In menu, select Reshape. The Reshape Data dialog box will appear.
  2. Select the cell that demarks where the data should be reshaped in this case the first row of the first column that contains question, rather than demographic, data.

  3. Click OK.  The add-in will generate a new tab in your Excel workbook that contains data that looks like this:

    Reshaped data

    Notice that there is a separate row for each question in the survey and that the ID, Gender, Country, and Generation data is repeated for each survey respondent.  This is what is called “normalized” data and it is a good thing.

  4. Change the field names for Col5 to “Question” and Col6 to “Response”.
  5. Save your work and note the name of the tab that contains the reshaped data as this is the source we will use in Tableau.

So, now the data is all dressed up.  In the next blog post we’ll give it somewhere to go.

Note: When I prepare survey data for Tableau I will usually employ a small variation to the approach discussed above.  Specifically, I will create two separate tables, one called “Demographics” and the other called “Questions” and connect them with an inner join (I do this whether the source is Excel or a “real” database).  In this case, the “Demographics” table just contains one row for each respondent and the “Questions” table contains one row for each question.  I join the tables on the respondent ID.

Here’s what the Demographics table looks like:

Demographics table

And this is what the reshaped Questions table looks like:

Reshaped Questions table

Note that both this approach and the one we looked at earlier produce almost identical results in Tableau.

Coming soon: how to visualize the reshaped data.

 Posted by on January 3, 2012 2) Visualizing Survey Data, Blog 33 Responses »