Note: This blog post is rather dated. While it does stress the importance of reshaping the data it only shows how to do this using the Tableau add-in for Excel and doesn’t discuss the optimal format for survey data. The post also doesn’t address how to combine survey results downloaded in a numeric format with survey results in a text format.
Please visit the main Visualizing Survey Data page for more up-to-date information.
In my teaching and consulting practice I’ve fielded a lot of inquiries on how to use Tableau to analyze and visualize survey data so I thought it would make sense to write a blog post or two on the subject.
The good news is that Tableau is a fantastic tool for gleaning insights into survey data. The bad news – and it really isn’t that bad – is that it may not be obvious as to how you go about setting up your survey data so that you can do the cool stuff with Tableau.
In this first blog post we’ll discuss reshaping “typical” survey data so that it gets along nicely with Tableau. In the accompany post we’ll see how to visualize different question types (e.g., check-all-that apply, Likert scale, etc).
Click here to download the sample survey data.
Some typical survey data
The screen shot below shows several rows and columns from a CSV file that might have come from one of several different survey tools (e.g., “Survey Monkey”, “Zoomerang”, etc.)
You’ll notice that in addition to an ID column there’s one column for each question and one row for each response. We’re going to need to change this is moment, but first let’s understand some of the different types of questions we’ll be visualizing.
Columns B through D contain what I call “demographic” questions. These are the questions that usually come at the beginning of the survey and let us know something about the survey respondents. In this example we’ve just gathered gender, country of origin, and generation, but in your surveys you may see columns for ethnicity, education, marriage status, income, etc.
Note: If you are using Excel and the Tableau reshaping add-in (which we will explore in a moment) then you should make sure the demographic questions are listed first in your spreadsheet, even if they were presented to survey takers at the end of the survey.
Yes / No / Maybe question
The data in column E, below…
… represents the data that was gathered from this survey question.
Note that the cells in the Excel spreadsheet combine both numeric values (0, 1, or 2) and labels (“No”, “Yes”, “Don’t know”.) Once we get the data into Tableau we’ll need to do some string arithmetic to address this.
Check All That Apply questions
The data in columns F through N shown below…
… represents the data that was gathered from this “check all that apply” survey question.
Notice that the only options for these columns are “1: Yes” and “0: No”. A blank indicates that the survey respondent did not answer the question.
Likert Scale Questions
Columns O through Z …
… contain Likert Scale data that was gathered from the following survey question.
Here each response receives both a label “Very high degree”, “High degree”, etc., and a number rating from 1 to 5. It was Rensis Likert (pronounced “Lick-ert”) that developed the idea of assigning quantitative values to qualitative measures. If you’re creating surveys, chances are you will be doing a lot of work with Likert scale questions.
What’s wrong with the way the data is configured?
There are two very compelling reasons to reshape the data before connecting to it with Tableau:
- It’s way easier to explore the data and build visualizations with reshaped data (trust me); and,
- You won’t be stymied by a 255 column limit when connecting to an Excel or Access data source (and with long surveys it’s very easy to exceed 255 columns worth of data).
Reshaping the Data
There are a variety of data transformation / extract-transform-load (ETL) tools available including a free and promising system from Stanford University called Data Wrangler. As our data is in Excel, I’m going to use Tableau’s free reshaping add-in to transform the data into the format we need it to be in.
Note: You can find this add-in at http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel.
Let’s walk through the process using the sample data in Excel 2010. Note that I have already downloaded and installed the Tableau reshaping add-in.
- From the Add-In menu, select Reshape. The Reshape Data dialog box will appear.
- Select the cell that demarks where the data should be reshaped in this case the first row of the first column that contains question, rather than demographic, data.
- Click OK. The add-in will generate a new tab in your Excel workbook that contains data that looks like this:
Notice that there is a separate row for each question in the survey and that the ID, Gender, Country, and Generation data is repeated for each survey respondent. This is what is called “normalized” data and it is a good thing.
- Change the field names for Col5 to “Question” and Col6 to “Response”.
- Save your work and note the name of the tab that contains the reshaped data as this is the source we will use in Tableau.
So, now the data is all dressed up. In the next blog post we’ll give it somewhere to go.
Note: When I prepare survey data for Tableau I will usually employ a small variation to the approach discussed above. Specifically, I will create two separate tables, one called “Demographics” and the other called “Questions” and connect them with an inner join (I do this whether the source is Excel or a “real” database). In this case, the “Demographics” table just contains one row for each respondent and the “Questions” table contains one row for each question. I join the tables on the respondent ID.
Here’s what the Demographics table looks like:
And this is what the reshaped Questions table looks like:
Note that both this approach and the one we looked at earlier produce almost identical results in Tableau.
Coming soon: how to visualize the reshaped data.