Nov 282016
 

Note: Major thanks to Nazirah Garrison and Christie Clark at Tableau for suggesting this approach.

Overview

With Tableau 10.x it is in fact possible to get your survey data, “just so” without having to invest in new tools and / or a engage in a time-consuming, error-prone procedure every time you need receive updated survey data.

There’s a lot of upside to this approach — everything is built into Tableau and you’ll just need to refresh the extracts (there will be several of them) as you get new data.  The downside is the setup is a little bit cumbersome and Tableau manifests some “hmm, that doesn’t seem right” behavior along the way.

Before embarking on this I encourage you read this post so you understand what I mean when I refer to the data being “just so.”

Also, if you would like to follow along you can download the source data here.

Three files and three extracts

Anyone that has read my posts or attended my classes know that I want survey responses in both text and numeric formats. Sure, you can in fact manage with one format or the other, but you’re just creating a LOT more work for yourself if you don’t have the data in both formats.

You will also need what I call a “Helper” file — this is just a separate file that maps each question ID into human readable form and groups related questions together. Again, you can certainly get by without it but you’ll be working much harder than you need to, especially if you ever compare “Importance” with “Satisfaction” Likert-scale questions.

For this example we will create a separate extract for each of these three files, then use Tableau 10.x’s ability to join three different data sources.

Let’s start by creating the extracts.

To pivot the data labels and create an extract

  1. Start Tableau 10.x and indicate you want to connect to Excel.
  2. Connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx and drag the sheet Data Labels into the Drag sheets here area as shown below.
    01_draglabels
  3. Leave the first five columns intact (the Resp ID, demographic stuff, and Weight) and select all the other columns.
  4. Click any of the selected columns and select Pivot from the context menu.
    02_firstpivot
  5. Rename the first pivoted column Question ID and the second column Label as shown below.
    03_rename
  6. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  7. When asked to save, name the file DR_JustSo_Labels.tde (make sure to note where you are saving the file).

One down, two to go.

To pivot the data numbers and create an extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Data Numbers to the Drag sheet here area.
  3. Hide the columns labeled Gender, Location, Generation, and Weight — we already have them in the other data source and don’t need them twice.
  4. Leaving Resp ID in place, select the second through the last columns.
  5. Click the down arrow on any of the selected columns and select Pivot from the context menu.
  6. Rename the first pivoted column Question ID and the second column Value, as shown below.
    04_secondpivot
  7. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  8. When asked to save, name the file DR_JustSo_Numbers.tde.

Two down, one to go.

To create the question helper extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Question Helper to the Drag sheet here area.
  3. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  4. When asked to save, name the file DR_QuestionHelper.tde.

All three data sources are now ready.

Joining the three data sources together

We now have our three data sources as separate Tableau extract files. We’ll combine these three files (and create an extract from the joined files) using Tableau 10.x’s ability to join files from different data sources.

Note: This is where we’ll encounter Tableau’s “head-scratching” behavior.

To join the three data sources

Note: In first trying this Tableau presented a lot of warning messages about not being able to materialize a temporary table. While I could ignore these warnings and muddle through, you may not be so lucky. It turns out the culprit was my anti-virus software. I temporarily disabled it and everything worked without a hitch.  See http://kb.tableau.com/articles/issue/error-unable-to-materialize-temporary-table-joining-data-sources.

  1. Click the New Data Source icon and indicate you want to connect to More, as shown below.
    05_more
  2. Select DR_JustSo_Labels.tde and click Open.
  3. Click Add, as shown below.
    06_add
  4. Click More and then select DR_JustSo_Numbers.tde. Do not be fooled, the correct fields have NOT yet been joined.
  5. Click the overlapping Venn diagram to display the Join dialog box, as shown below.
    07_join1
  6. Click Number of Records and then click the X that appears in the row to indicate you do NOT want to join these two data sources using this field.
  7. From the left data source indicate you want to join using Resp ID, as shown below.
    08_join2
  8. From the right data source indicate that you want to join using RespID (Extract1). I have no idea why the field is named this way. More on this in a moment.
  9. From the left data source indicate that you want to join using Question ID. Yes, you need to join on more than one field.
  10. From the right data source indicate you want to join using Pivot Field Names (Extract1). Okay… THIS is the thing that has me scratching my head and as of this writing (November 27, 2016) I have no idea why the field isn’t also called Question ID.

    January 4, 2017 — Now I know why this is happening. When using data extracts (.TDE files) Tableau is only able to keep track of the alias names for fields in the first .TDE file.  All of that info gets stripped out from the second .TDE file and Tableau just sees the original field name (Pivot Field Names). A little off-putting, yes, easy to address and it won’t matter a lick once we’re building our visualizations.

  11. Click Add again, select More, and select DR_QuestionHelper.tde.
  12. Click the second Venn overlapping circle and remove any of the joins that may be in there (most likely again using Number of Records).
  13. From the left data source select Question ID and from the right data source select QuestionID as shown below.
    09_join3
  14. Hide the fields RespID (Extract1), both Number of Records fields, the second Question ID, and Pivot Field Names, and rename Pivot Field Values to Value. Your screen should look like this.
    10_joinall
  15. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  16. When asked to save, name the file DR_JustSoAll.tde. After creating the extract Tableau will show all the field names grouped by data source, as shown below.
    11_datasource
  17. Click the down arrow and select Group by Folder.
    Okay, you don’t have to do this but I see no reason to group the fields by data source.

Congratulations, you now have your data “just so” and you did it all in Tableau.

So, how do you get the extracts to refresh?

Good question.  If you are using either Tableau Server or Online you can create shared data sources and program the extract to refresh on a regular or as-needed basis.

If you are using desktop — and you have all four data sources in in one workbook — you can just click the  Data menu and select Refresh All Extracts.

Conclusion

While I find the process outlined here both cumbersome and confusing (what is up with those renamed fields NOT staying renamed?) this approach does appear to work and you only need to set it up once. The same cannot be said of The Tableau add-in for Excel which requires a lot of manual intervention every time you want to update the data.

Will this replace Alteryx as my tool of choice? No, but it does work and you can’t beat the price.

[suffusion-the-author]

[suffusion-the-author display='description']
 Posted by on November 28, 2016 2) Visualizing Survey Data, Blog Tagged with: , , , ,  Add comments

  41 Responses to “Getting Survey Data “Just So” using Tableau 10.x”

Comments (41)
  1. Hey Steve, just discovering your special brand of genius and willingness to share – thanks!

    I have done all of the above and had a look around, but haven’t seen any detail on an importance v satisfaction scatterplot which maps two Likert scale questions together. I’d like to work with the data the way I have it, so whilst I have found some other links around Tableau resources was wondering if you had any guidance on this? Am relatively new to Tableau… but learning!

    • Andrew,

      I presented on this topic at the Tableau conference and hope to have something about it up on the site next week.

      Feel free to nag me if you don’t see it soon.

      Steve

  2. Hi Steve, thanks for explaining this so good that even a newbie in Tableau like me can understand it.
    I have followed your instructions using the file you posted and everything works as you say, then I have tried the same steps using my dummy data, ( from Confirmit) and it doesn’t group the question IDs with the wording at all.
    I can only think the problem relies on the format of my helper file, but I am lost trying to find out what is wrong with the format. Any suggestion from your side?

    • Patricia,

      One thing that may have happened is that Tableau’s data interpreter may have renamed some of the question IDs and they no longer map to the helper file. I’ve seen this with my own stuff where

      Q34_SAT

      gets changed to

      Q34_Sat

      Indeed, that happens in the example I have but Tableau is able to do the cross data source join correctly. If you want we can set up a screen-sharing session and I can try to diagnose what is going on.

      Steve

  3. OMG, you are right!, It was in front of my eyes all the time and I just didn’t see it.
    Tableau is changing the column names from something like this:
    “q4_1,q4_2,q4_3,q4_4, q4_5”
    to “Q4 1, Q4 2, Q4 3, Q4 5”
    but in my case it is not able to join correctly
    Any suggestion how to stop Tableau from doing this?
    I have Tableau Dektop 9 installed (upgrading to 10 in two weeks, but not earlier as need to align with the server version to publish..)
    Due to different time zones a screensharing session could be hard to arrange, though I would love it, would probably be faster..
    Thanks so much for your valuable feedback and help

    • Patricia,

      You can undo the pivot, select all the fields you planned to pivot, then tell Tableau to Reset Name. Then do the pivot.

      I have no idea why Tableau does this with the names. It’s tripped up a lot of people.

      Steve

  4. Peter, I have a question, I am stuck with this survey data, which actually is a company scorecard to meassure different certification level for a company program.
    The countries are answering the questions (which are milestones they need to do) via survey (Confirmit) and depending on the answers they give 1 or 0 (Y/N) they reach a specific level of certification.
    I have two files the raw data and the helper file.
    I need to build a condition that goes like this:

    if answer to Q1 1 and Q1 2 and Q1 3 and Q1 4 all equal Y then certification Silver
    if answer to Q1 1 and Q1 2 and Q1 3 and Q1 4 and Q2 1 and Q2 2 and Q2 3 all equal Y then certification Gold
    and so on..

    The thing is that if I separate the condition like this:
    if answer to Q1 1 equal Y then certification Silver
    it works, but when adding all conditions together with AND it is too restrictive and the calculated field is not working.
    If I add them together with OR, the rule works only partially, and of course it is not correct.

    Do you have a suggestion about how to work with theses conditions?
    I have tried to give the answers a number value, but that doesn’t work either..

    I appreciate your help, thanks so much
    Patricia

    • Patricia,

      First, it’s Steve, not Peter… 😉

      This is a cool question and I can’t really sit down with it until next week. If you need a response immediately, try posting to the Tableau forum (and in particular, the survey group on the Tableau forum).

      I can think of several ways to address this. I would almost certainly go with numeric responses and not text responses. For example, for the four yes/no questions that comprise silver status you could just have a calc such as

      SUM ([Value])

      And you would be looking for anything equal to 4.

      I will try to get to this, but not until next week. If you get it addressed before then, please let me know.

      Steve

  5. Thanks Steve (sorry about the name change, there is a Peter Wexler in the company I work for and got confused)
    I will do both things you suggested, post question on Tableau Forum and try the numeric responses.
    There has to be a solution for this.

    • Steve,

      I have a variation on this question. In my case we have three question categories used for scoring three separate metrics. Each metric considers the response to a subset of the questions. Each surveyed response then is given a score in each metric. Now suppose that each question and each metric also has a weight with it. The score for the metric is then given by the sum of each (metric specific) weight multiplied by corresponding response.

      What are your thoughts about how to set this up?

      Thanks,
      Steve

      • Steve,

        Are you trying to create a weighted index? That is, you might ask 8 questions and come up with an index based on how people responded to the 8 questions, where some of the 8 questions should be weighted more than others.

        If “yes” then I know how to do this (and just need to blog about it). If not, then I need some clarification of what you’re trying to do.

        Steve

  6. Hi Steve

    Can I ask a quick question please?

    My value field after creating the three blended datasources file is now a string variable. If I try to run a calculation such as ‘computing the percentage of people that selected an option’ then it doesn’t work. If I convert the variable to a decimal whole (no float option) then it replaces some with null values.

    Have you come across this issue and if so how did you deal with it please?

  7. The value is default as string and ‘other’ text responses show as 0 whereas when I change to whole decimal the others become nulls but if I scroll down the values are there! Phew.

  8. Hi Steve,

    Finally, after working at this place for over a year, they’ve given me some survey data. To get ready to work on my own data, I followed your tutorial several times, and when something didn’t work it was always because I didn’t follow the instructions closely enough. You anticipated everything.

    One thing I’m wondering: When I finish and try to refresh my extracts, I get “Please enter your password to connect to the data source.” I’m on Desktop, connecting to your spreadsheet that I downloaded, and I’ve never before been asked for a password to connect to an Excel file. Do you have a clue as to what might be behind this request?

    The refresh window is chugging away and doesn’t look like it will finish refreshing during my lifetime.

    Thanks, Steve.

    Dan

  9. Steve,

    I am new to Tableau and have enjoyed learning about survey analyses through your online material. I really like your divergent stacked bar chart for sentiment items (specifically on p. 6 of the “Visualizing Survey Data” whitepaper), but cannot find specific directions for how to set this up. Have you or someone else posted detailed instructions on how to create this?

    Thanks,
    Brandon

  10. Hi Steve

    Hope you had a good Christmas. I now have a survey set up as per the white paper. If I want to cross-tabulate more than one question then how do I do that please? Where those questions are set up as demographic dimensions then it is easy – but just in case I need to cross tab questions with each other rather than demographic dimensions.

    Any help much appreciated.

    Thanks
    Colin

    • Colin,

      Sorry for the late response.

      Intra-question analysis (where you cut or filter a question by another question) is something I’ve been looking to revisit since writing about it several years ago.

      Here’s a link the the older post. There is some useful stuff in it:
      http://www.datarevelations.com/using-tableau-to-visualize-survey-data-part-2-%C2%BD.html

      Also see http://www.datarevelations.com/likert-vs-likert-on-a-scatterplot.html

      Steve

      • Hi Steve

        Thanks again for posting this link. I had not seen it when I posted below. I have had some success with this approach. However as my survey is 44 questions then when I try to blend the re-shaped file it generated over 5m rows of data and after 30 minutes I gave up! I tried to filter the extract but it still brings all the data along.

        So I have simply blended the questions I may need rather than the whole dataset which works. The only tip for others there is to blend on a ‘left join’ (so will still show all the original records and only match re-shaped records if record ID & question ID match) as opposed to an ‘inner join’ as you would for blending two complete datasets.

        There are other ways to perform intra question analysis using calculated fields or visual cross tab using un-pivoted measures but this method seems most straight forward if you have already got your data into the pivoted shape.

        Kind regards
        Colin

  11. Hi Steve,

    Thanks for writing all of these great articles, they’ve been really helpful so far.

    I’m working with Survey Monkey data and I’m trying to get it “Just So”. The problem is the way it exports, it uses 2 rows for the questions and uses multiple columns for each possible answer. See below for an example. I was wondering if you had any tips for getting data like this just so?

    e.g.

    Row 1: Select your favourite drink
    Row 2. water | juice | milk | beer
    Row 3: | juice | |
    Row 4: | milk |
    Row 5: water | | |
    Row 6: | juice| |

    Thanks for you help,
    Jason

    • Jason,

      For this type of question is’s easy to have 1 and 0s (or 1s and blanks) then the term. I think SurveyMonkey will allow you to do this, but if it doesn’t you can just set up the data as follows.

      Have the the column for each of the choices in question look like this:

      Q_Water Q_Juice Q_Milk (etc.)

      Then underneath each of these columns have the entries from the survey. Then pivot the data. You’ll get something that looks like this:

      RespID QuestionID Response
      101 Q_Water Water
      101 Q_Juice [null]
      101 Q_Milk Milk
      105 Q_Water Water

      I hope this makes sense.

      Steve

  12. Steve, Thanks for the quick response!

    Sorry, but I don’t quite follow. To make it more clear I’ve uploaded a screenshot here: http://screencast.com/t/8FjWxHwFI

    You see the problem is that SurveyMonkey is exporting all questions in 2 rows, with the question in the first row and all possible answers for the questions in individual columns in the second row.
    I’ve also exported a labels and numeric output and in comparing the data some get a weird conversion. e.g. a label value of “walmart” gets converted to “15” on the numeric output.

    Q. If I now have the following columns: Q1_Water, Q1_Juice, Q1_Milk.
    Wouldn’t they be treated as separate questions? How would it know that these are all answers to the same question thus allowing me to report on the data? e.g. Q1 – What’s your favourite drink?

    The only thing I can think of is, grouping them in the helper

    QuestionID | Wording | Question Grouping
    Q1_Water | Water | Favourite Drink
    Q1_Juice | Juice | Favourite Drink
    Q1_Milk | Milk | Favourite Drink

    But even then I’m not sure how that would help.

    Thanks in advance for your help!

    • Sorry to spam this page, but in case other readers are interested or in case this is wrong and you want to correct me 😉 I’ve found the “Merge Mismatched Fields” function and am in the process of trying that. It’s taking a while because each action takes at least 2 minutes to execute…

      thanks again!

  13. Hi Steve and others

    I have finally got this to work and produced some hopefully insightful output! My main challenge was that I was only provided with a number file so had to create a text version manually due to limitations with the survey tool. This took me quite a while including validation using Excel for Mac. I think a tool like Alteryx would really help but as you say is prohibitively expensive – can’t wait to see what Tableau’s own Project Maestro can deliver. Top tip – I spent quite a bit of time making sure my number file matches my label file exactly.

    Once the files are set up ‘Just So’ and blended in Tableau 10 it works really well. I can drag and drop my dimensions onto the view and filter by question grouping. It did take me a while to work out the different calculations required for ‘check all versus’ please ‘tick one option’ (Likert scale) type questions. Once this was worked out then it is straight forward to duplicate. Top tip – download Steve’s survey workbook from 2014 and look at the first few tab captions.

    More tips – one neat trick is that if the data does not look right then to filter at datasource and look in detail and what the issue is likely to be. Quite often for me it was that those pesky labels were slightly wrong as I used some calculated fields to re-label them in Tableau.

    I am still not sure what to do if I get asked to produce off spec cross tabs though – I have some test calculated fields using contain function which Tableau recommended so may try those or maybe a visual cross tab using an un-pivoted version and question measures?

    Thanks again – although this first time it took me a while, once I had shaped my data it worked really well and saved me a lot of time.

    Regards
    Colin

    • Colin,

      I just published a post that explores a technique I use to see the universe of all questions and all possible responses. It makes it very easy to see where there is miscoded data.

      See http://www.datarevelations.com/secondthing.html

      Steve

      • Hi Steve

        Thanks for posting this link, it is a quick and easy way to double check the numbers and labels. It beats going through the excel version and running CountIF formulas to total the labels and comparing to Tableau charts!

        Kind regards
        Colin

  14. Steve,

    Thank you for the detailed, step-by-step directions!

    Question about the demographics: don’t we want to reshape those to the tall format as well so that they can be used in the same way as your Question Grouping and Question Wording fields? Rather than having four columns for four demographic questions, for example, would it be better to pivot the demographics with the survey questions in order to get a segmented view of results by demographics?

    Your thoughts would be much appreciated.

    Thanks,
    Katie

    • Katie,

      It depends. I have presented on doing what I call a “double-pivot” but have not blogged about it yet. See https://www.youtube.com/watch?v=nmr_-1aL1T4&t=205s

      where I discuss a way to combine all the demographics.

      Steve

      • Great, that is exactly what I needed! Specifically, the piece where you discuss filtering on one demographic to avoid over-counting the data for the overall results. I had done the double pivot and moved on to creating a divergent stacked bar chart but found that my Count Negative field was being multiplied by four and had no idea why – now I know that Tableau was multiplying by each of my four demographic variables. Thanks so much, Steve!!!

        Katie

  15. Hi Steve

    I have just realised something after checking all my base value labels.

    I have just realised my ‘please check all that apply’ questions are % of all respondents rather than those responding just to the question. Please select one only questions are fine once I filter out null values. I have found the following blog post on this matter here: http://www.datarevelations.com/tag/check-all-that-apply

    Am I right in saying we essentially need to apply a Level of Detail calculation to filter on the Max value and remove any nulls?

    I do not have Qualtrics or Alteryx formats but rather Excel with 0 or 1 for a response. So my questions are not in Q1_ format, do I still need a question prefix?

    Kind Regards
    Colin

  16. Hi Steve

    In reference to the ‘check all that apply’ questions, I tried the following methodology; http://www.datarevelations.com/tag/check-all-that-apply

    However I am using SelectSurveys software which outputs Yes = 1 and No response = 0 for this type of question.

    So for the Check All % (Sum[Value]/Sum[Number of Records]) then the 0 values are included in Number of Records. That is it is ‘% All Respondents’ rather than ‘% All Respondents who answered the question’.

    Is there an easy way to filter out those respondents who have not answered the question and are recorded as a row of 0 values from the Number of Records?

    Any thoughts appreciated please

    Kind Regards
    Colin

    • Colin,

      With respect to the Question Prefix, you can (in Tableau 10.1) just create a group that include all the related questions and refer to that group in the calculated field. Or if you are not using 10.1 you can just create a calculated field that reads something like this:

      IF QuestionID=”FirstCheckAllQuestion” or
      QuestionID=”SecondCheckAllQuestion” or
      QuestionID=”ThirdCheckAllQuestion” then
      QuestionID
      END

      Regarding your second question… So, the tool codes a zero if the person did not answer the question at all and a zero if the did not select an option? So, it’s essentially dealing with all zeros vs. some zeros (as opposed to all Nulls vs. some Nulls).

      I think something like this will do the trick:

      ({FIXED [Question Prefix], [Resp ID]: SUM([Value])})>0

      Steve

      • Hi Steve

        Thank you. That works a treat! I created a group called ‘Q12 group’ and dropped that into the LoD calc instead of the Question Prefix and it has filtered out the all zero responses.

        Regards
        Colin

  17. Hi Steve

    Me again! Additionally is there a way to filter on a question dependent upon a previous question being answered? My survey software has not had the relevant validation put in place. I have a feeling it will be more data blending for intra question analysis? You mention there are better ways to perform intra question analysis and to nag you – are there any handy resources for this please?

    Thanks
    Colin

  18. Your dedication to the comment sections in your blogs is admirable!

    • Thanks, Matt. I see responding as being part of the deal, so to speak.

      Also, readers often have some really good ideas and I don’t know if those would come about if I didn’t respond.

  19. Hi Steve,

    your explanation helped me a lot to create proper surveys. I have only one issue I am struggling with. The questions helper and survey data are joined via inner join based on Question ID.

    As I have several ‘check-all-that-apply questions’ I have for some questions the question ID Q16-1, Q16-2, Q16-3, etc. I tried it with the method you suggested in another post: SPLIT([Question ID], “-“, 1)

    But the problem is that these calculated fields are not an option while making the inner join.

    Do you have any suggestion how I could solve this? My current solution is in the Question Helper to also have Q16-1, Q16-2 etc. instead of just Q16.

    /Marc

    • Marc,

      I think the ONLY solution is to map each of the individual questions IDs (Q16-1, Q16-2, etc.) to a human readable form using the question helper file.

      Steve

      • Hi Steve,

        thank you for your quick reply. I found today a kind of workaround. I made a SPLIT using ‘-‘ in the data source, before creating the extract. Only for Q16_Other I had to create an extra row in the Question Helper. Everything else will be shown as Q16 and I only need one row.

        Regards from Berlin,

        Marc

  20. Hi Steve. Can you please explain the joins. I didn’t quite understand those. Thanks!!

    • Alex,

      So, our goal is to get the data “just so” as outlined in a post called “getting the data ‘just so'”. We need to have three separate data sources, or tables, of data. Two are identical except that one has label responses and one has numeric. The third table (the helper file) is just something that adds an explanation layer for the data. This blog post describes the steps needed to combine these files in the way that Tableau likes.

      In this case, we get the combination by performing two joins.

      Steve

 Leave a Reply

(required)

(required)