Jan 032012
 

Note: This blog post is rather dated. While it does stress the importance of reshaping the data it only shows how to do this using the Tableau add-in for Excel and doesn’t discuss the optimal format for survey data. The post also doesn’t address how to combine survey results downloaded in a numeric format with survey results in a text format. 

Please visit the main Visualizing Survey Data page for more up-to-date information.

Overview

In my teaching and consulting practice I’ve fielded a lot of inquiries on how to use Tableau to analyze and visualize survey data so I thought it would make sense to write a blog post or two on the subject.

The good news is that Tableau is a fantastic tool for gleaning insights into survey data.  The bad news – and it really isn’t that bad – is that it may not be obvious as to how you go about setting up your survey data so that you can do the cool stuff with Tableau.

In this first blog post we’ll discuss reshaping “typical” survey data so that it gets along nicely with Tableau.  In the accompany post we’ll see how to visualize different question types (e.g., check-all-that apply, Likert scale, etc).

Click here to download the sample survey data.

Some typical survey data

The screen shot below shows several rows and columns from a CSV file that might have come from one of several different survey tools (e.g., “Survey Monkey”, “Zoomerang”, etc.)

Raw survey data in Excel

You’ll notice that in addition to an ID column there’s one column for each question and one row for each response.  We’re going to need to change this is moment, but first let’s understand some of the different types of questions we’ll be visualizing.

Demographic Questions

Columns B through D contain what I call “demographic” questions.  These are the questions that usually come at the beginning of the survey and let us know something about the survey respondents.  In this example we’ve just gathered gender, country of origin, and generation, but in your surveys you may see columns for ethnicity, education, marriage status, income, etc.

“Demographic” questions

Note: If you are using Excel and the Tableau reshaping add-in (which we will explore in a moment) then you should make sure the demographic questions are listed first in your spreadsheet, even if they were presented to survey takers at the end of the survey.

Yes / No / Maybe question

The data in column E, below…

Simple Yes/No/Maybe question

… represents the data that was gathered from this survey question.

Note that the cells in the Excel spreadsheet combine both numeric values (0, 1, or 2) and labels (“No”, “Yes”, “Don’t know”.)  Once we get the data into Tableau we’ll need to do some string arithmetic to address this.

Check All That Apply questions

The data in columns F through N shown below…

Check-all-that-apply question data

… represents the data that was gathered from this “check all that apply” survey question.

Notice that the only options for these columns are “1: Yes” and  “0: No”. A blank indicates that the survey respondent did not answer the question.

Likert Scale Questions

Columns O through Z …

Likert scale question data

… contain Likert Scale data that was gathered from the following survey question.

Here each response receives both a label “Very high degree”, “High degree”, etc., and a number rating from 1 to 5.  It was Rensis Likert (pronounced “Lick-ert”) that developed the idea of assigning quantitative values to qualitative measures.  If you’re creating surveys, chances are you will be doing a lot of work with Likert scale questions.

What’s wrong with the way the data is configured?

There are two very compelling reasons to reshape the data before connecting to it with Tableau:

  1. It’s way easier to explore the data and build visualizations with reshaped data (trust me); and,
  2. You won’t be stymied by a 255 column limit when connecting to an Excel or Access data source (and with long surveys it’s very easy to exceed 255 columns worth of data).

Reshaping the Data

There are a variety of data transformation / extract-transform-load (ETL) tools available including a free and promising system from Stanford University called Data Wrangler.  As our data is in Excel, I’m going to use Tableau’s free reshaping add-in to transform the data into the format we need it to be in.

Note: You can find this add-in at http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel.

Let’s walk through the process using the sample data in Excel 2010.  Note that I have already downloaded and installed the Tableau reshaping add-in.

  1. From the Add-In menu, select Reshape. The Reshape Data dialog box will appear.
  2. Select the cell that demarks where the data should be reshaped in this case the first row of the first column that contains question, rather than demographic, data.

  3. Click OK.  The add-in will generate a new tab in your Excel workbook that contains data that looks like this:

    Reshaped data

    Notice that there is a separate row for each question in the survey and that the ID, Gender, Country, and Generation data is repeated for each survey respondent.  This is what is called “normalized” data and it is a good thing.

  4. Change the field names for Col5 to “Question” and Col6 to “Response”.
  5. Save your work and note the name of the tab that contains the reshaped data as this is the source we will use in Tableau.

So, now the data is all dressed up.  In the next blog post we’ll give it somewhere to go.

Note: When I prepare survey data for Tableau I will usually employ a small variation to the approach discussed above.  Specifically, I will create two separate tables, one called “Demographics” and the other called “Questions” and connect them with an inner join (I do this whether the source is Excel or a “real” database).  In this case, the “Demographics” table just contains one row for each respondent and the “Questions” table contains one row for each question.  I join the tables on the respondent ID.

Here’s what the Demographics table looks like:

Demographics table

And this is what the reshaped Questions table looks like:

Reshaped Questions table

Note that both this approach and the one we looked at earlier produce almost identical results in Tableau.

Coming soon: how to visualize the reshaped data.

[suffusion-the-author]

[suffusion-the-author display='description']
 Posted by on January 3, 2012 2) Visualizing Survey Data, Blog  Add comments

  33 Responses to “Using Tableau to Visualize Survey Data — Part 1”

Comments (28) Pingbacks (5)
  1. Very excited for part 2!

  2. Thank you so much! I, too, am a consultant specializing in marketing research. Trying to decide if Tableau is right for my firm. I really appreciate the help!

  3. Thanks for the great article series.  One question though:  What’s the benefit of using a separate demographics and data tables connected with an inner join?

    • Derek,

      With survey data you want some of the data to be reshaped (the questions) but the demographics portion should remain in the one row for each respondent That said, you could just put all the demographic stuff in the first few columns and then use Tableau’s reshaping tool and reshape starting at the first question. This produces the same end result of joining the demographics and question tables. Also, with a more sophisticated type of system you may have a panel of respondents and you already know their demographic data (verses having to ask them every time they take the survey). This is another case where you will want to do the join.

      Steve

      • Hi Steve! I discovered your blog posts and wisdom a few months ago and started applying your methods to my survey. It has worked wonder so far; however, I’m at a point where I need to decided whether to reshape all data (including demo) or keeping demo stuff separate. From your answer above, it doesn’t sound like there’s an apparent benefits to keeping demographic stuff separate, does it? If I’m wrong, please educate me on the pros & cons of having a separate demo table versus reshaping all questions.

  4. If I have questions which are likert/yes-no-maybe and ones where I am asking for a number – “What is the average hours per week spent fishing?” for example, would your preference to treat the latter as demographic and not include them in reshaping, or to reshape them?

    • Alex, I would reshape these questions and not treat them as so-called demographic questions. My guess is that you’ll end up binning the responses in some way.

      • Hi Steve-
        I have similar type of questions as Alex mentions above where people respond with a # (# of people affected, # of employees involved). Can you elaborate on how you would use bins with the pivoted data?

  5. Thanks for the great posts. I’ve been using Tableau (on a Trial) to analyse online survey data. I quickly produced visualisations from the demographic questions but was stumped by questions using a Likert scale (and the Net Promoter Score question).

    So thanks for helping me to understand what will be involved when we buy the product and I get to work for real.
     
    So just a quick check…has the release of Tableau 8 resulted in any significant change to the approach?  

    • Helen,

      No significant change with Tableau 8. It’s a matter of getting the data set up so that you can do cool things (e.g., Likert, Net Promotoer, etc.)

      Steve

  6. Steve,
    Thanks very much for your prompt reply. We finally have our licence and are ready to go.

    I’m a Newbie and I want to start out using best practice. My colleague has used Tableau before and I can’t convince him about re-shaping.  He says ‘it creates a lot of noise inside the exel workbook and it will become very hard to maintain” (e.g. find errors).

    The argument that I’m finding it hardest to counter is volume. He says “just 3-4 separate questions reshaped, generated 20+ rows for each ID/respondent, so imagine how many rows it would result in if we’re to reshape our monthly 1,000-3,000 responses for each client…. 400,000 rows for one month of survey respondents for each client…  So to me, Reshaping would work well smaller data sets and is intended for more in-depth analysis.”

    I accept that we have a large volume of data – the surveys run continuously and we also aggregate client data to create a benchmark group. 

    I thought I saw a comment from you about Tableau’s ability to handle huge volumes of data. Could you please comment or point me to the relevant post please?  

    • Helen,

      < <400,000 rows for one month of survey respondents for each client…>>

      And what is wrong with that? Even 4M rows is not a problem.

      Tableau will fly through this. You will have a hard time maintaining a database with volume like that in Excel, but if you have dozens of clients and thousands of respondents then you really shouldn’t be maintaining your data in Excel spreadsheets.

      BTW, if you end up generating more than 1.6M rows of data using the reshaping tool, Excel / Tableau will automatically generate a CSV file (and Tableau can connect to the resulting CSV file without a problem). That said, I would highly recommend that you maintain your data in a database at this point (even Microsoft Access will do).

      One of my clients is using Alteryx with success to reshape / transform the data and simultaneously output to both a TDE file (Tableau Data Extract file) and to SQL Server. I plan to devote some time to working with this tool to isolate the best practices in visualizing survey data.

      Steve

      • We just picked up an Alteryx license, and I am attempting to use it to reshape the data (survey responses from QuestionPro), and have it output as reshaped data for Tableau. While helpful for us, QuestionPro raw data output isn’t exactly the cleanest from a database perspective, so a lot of work to do to get it in a decent format.
        If there is a way to perform a reshape on the data in Alteryx, similar to how you would do so in the Tableau Excel Add-In, so that we could then go directly from Alteryx to Tableau, that would be awesome. If you have any insights on this, they would of course be appreciated.
        Devin

        • Devin,

          I’m very impressed with both Alteryx and the people behind it (I’ve recommended it to my clients). I do not yet use the product myself (I hope the remedy this) but my impression is that it should be quite simple to do the ETL stuff you would like to do.

          Steve

  7. Thanks for excellent ammunition Steve!
    Very clear advice.
    I will fight on…

  8. Just a small comment: the data in the second table of the section “Reshaping the Data” is denormalized, not normalized as you say. Normalized data would be stored non-redundantly, i.e. survey respondent gender would be stored once in a separate table instead of being repeated on each row.

    However, the principle is right, denormalizing data for analysis and visualization purposes as you do is the way to go.

  9. Thank you for great article.
    Unfortunately, I encountered problems while using your method in case of big amount of data and more than one wave of measurement.

    I have to deal with quite large amount of data (now I have 804 observations and 207 questions (if we count each part of multi answer question as a separate question). Vast majority of my qestions are multianswerquestions and I need to reshape them. I cannot use your method (reshaping all MA questions at one ) because Excel hangs. Thus, I need to divide them into seperate spreadsheet and then connect them in Tableau by ID. Unfortunately when I try to upload all my data to Tableau the program also hangs. Furthermore, I will have next waves of the research and I think it will be very difficult to match them after reshaping with previous data.
    Do you have any idea how can I solve these problems? I’ve been looking for the solution but I cannot find any.

    • Agnieska,

      I wonder if the problem may be something with your computer as that’s really not that much data. What version of Windows / Excel are you using? Do realize that if in your reshaping you exceed 1M rows the reshaping tool will generate a .CSV file. A few possible approaches:

      1) Send me the source data and I will try to replicate the problem.
      2) Divide your survey into two parts but do NOT join them. Just have one data source that contains the first half of your questions and a second data source that comprises the second half of the questions.
      3) Use a *real* ETL tool, such as Alteryx (which is what I now use) to reshape the data.

      Steve

  10. Steve,
    thank you very much for quick answer and good solution.
    Unfortunately, I cannot send you my data because it is confidential but I have tried solution nr 2 and it works . When I was joining sheets by id, Tableau multiplied each question by each question and generated million rows what caused problems with loading data.
    However my problem with data is not totally solved yet – I’m afraid that when I will try to reshape data from next wave it could be difficult to achieve perfect match with previously prepared data sheet. I understand that I should use ETL tool – could you recommend some free ETL tool that would be convienient for Tableau data?
    Agnieszka

  11. Sorry, I’ve just noticed that you have recommend one already

  12. I believe Tableau 9 will have reshaping ability built-in.

    another very nice tool is R’s tidyr package and it’s gather() method.

    Super easy and totally configurable to reshape any columns in your data frame:
    gather()
    Collapses multiple columns into two columns:
    1. a key column that contains the former column names
    2. a value column that contains the former column cells

    example:
    gather(cases, “year”, “n”, 2:4)

    this takes your cases data frame, and converts the contents of columns 2 through 4 into 2 new columns, “year” gets the old data frame’s column’s 2 through 4’s headings, and “n” gets the content that were in columns 2 through 4.

    blog post on it http://www.r-bloggers.com/introducing-tidyr/

  13. Hello Steve,

    Seems to be a perfect solution, guess I am on the right page. I am working on similar kind of project at present, I am using Alteryx to reshape my survey data before moving it to Tableau for visualisation. But the problem here is, my data is very big that it has 9000 columns in it(response for the questions) with 70k rows in it. It also includes few key columns in the starting which is gender,country,demographics,ethnicity,education,marriage status,etc. I did a transpose in alteryx by keeping the key columns as it is and transpose the response columns alone. But it became huge that the 2GB file has been converted to 22GB file. I guess its because of the number of columns i am handling.

    Could you throw some lights on it to handle these kind of files.

    Thanks

    • 9,000 columns? That is A LOT of questions for people to respond to!

      This is unchartered territory for me, but why not break this up into multiple data sources? That is keep the same demographic data, but break the survey up into different parts. If you need to do any intra-question analysis I think there’s a good chance that a blend will work.

      BTW, I now use Alteryx for all of my reshaping and joining needs.

  14. Hi
    Thanks for sharing the great posts. We have been using Tableau (on a Trial) to review online survey data. We quickly produced visualisations from the demographic questions but was stumped by questions by using a Likert scale (and the Net Promoter Rating question).

    So thanks for helping me to understand what will be included once we buy the product and I get to work for real.

  15. Hi Steve,
    I am using the tutorial to analyze the survey data of my organization. This has proven really useful to me. However, I ran into a problem. I was not able to get accurate results when I used weights for a grouped data. If I use unweighted results, it works fine. This is the formula i use to create groups:

    if [Question]=’Sheepgoats’ then ‘Sheep/Goat’
    ELSEIF [Question] = ‘Yakcoebuffaloes’ then ‘Yak/Cow/Buffalo’
    ELSEIF [Question] = ‘Poultry’ then ‘Poultry’
    END

    Following is the formula to produce weighted outputs

    Sum([Responses_NUMERIC]*[weight_adj])/TOTAL(Sum([Responses_NUMERIC]*[weight_adj]))

    thanks
    Abesh

Leave a Reply to Kelsey Cancel reply

(required)

(required)