Nov 282016
 

Note: Major thanks to Nazirah Garrison and Christie Clark at Tableau for suggesting this approach.

Overview

With Tableau 10.x it is in fact possible to get your survey data, “just so” without having to invest in new tools and / or a engage in a time-consuming, error-prone procedure every time you need receive updated survey data.

There’s a lot of upside to this approach — everything is built into Tableau and you’ll just need to refresh the extracts (there will be several of them) as you get new data.  The downside is the setup is a little bit cumbersome and Tableau manifests some “hmm, that doesn’t seem right” behavior along the way.

Before embarking on this I encourage you read this post so you understand what I mean when I refer to the data being “just so.”

Also, if you would like to follow along you can download the source data here.

Three files and three extracts

Anyone that has read my posts or attended my classes know that I want survey responses in both text and numeric formats. Sure, you can in fact manage with one format or the other, but you’re just creating a LOT more work for yourself if you don’t have the data in both formats.

You will also need what I call a “Helper” file — this is just a separate file that maps each question ID into human readable form and groups related questions together. Again, you can certainly get by without it but you’ll be working much harder than you need to, especially if you ever compare “Importance” with “Satisfaction” Likert-scale questions.

For this example we will create a separate extract for each of these three files, then use Tableau 10.x’s ability to join three different data sources.

Let’s start by creating the extracts.

To pivot the data labels and create an extract

  1. Start Tableau 10.x and indicate you want to connect to Excel.
  2. Connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx and drag the sheet Data Labels into the Drag sheets here area as shown below.
    01_draglabels
  3. Leave the first five columns intact (the Resp ID, demographic stuff, and Weight) and select all the other columns.
  4. Click any of the selected columns and select Pivot from the context menu.
    02_firstpivot
  5. Rename the first pivoted column Question ID and the second column Label as shown below.
    03_rename
  6. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  7. When asked to save, name the file DR_JustSo_Labels.tde (make sure to note where you are saving the file).

One down, two to go.

To pivot the data numbers and create an extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Data Numbers to the Drag sheet here area.
  3. Hide the columns labeled Gender, Location, Generation, and Weight — we already have them in the other data source and don’t need them twice.
  4. Leaving Resp ID in place, select the second through the last columns.
  5. Click the down arrow on any of the selected columns and select Pivot from the context menu.
  6. Rename the first pivoted column Question ID and the second column Value, as shown below.
    04_secondpivot
  7. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  8. When asked to save, name the file DR_JustSo_Numbers.tde.

Two down, one to go.

To create the question helper extract

  1. Click the New Data Source icon and indicate you want to connect to an Excel file.
  2. As before, connect to DR_SurveySampleData_SourceFiles_Fall2016.xlsx, but this time drag Question Helper to the Drag sheet here area.
  3. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  4. When asked to save, name the file DR_QuestionHelper.tde.

All three data sources are now ready.

Joining the three data sources together

We now have our three data sources as separate Tableau extract files. We’ll combine these three files (and create an extract from the joined files) using Tableau 10.x’s ability to join files from different data sources.

Note: This is where we’ll encounter Tableau’s “head-scratching” behavior.

To join the three data sources

Note: In first trying this Tableau presented a lot of warning messages about not being able to materialize a temporary table. While I could ignore these warnings and muddle through, you may not be so lucky. It turns out the culprit was my anti-virus software. I temporarily disabled it and everything worked without a hitch.  See http://kb.tableau.com/articles/issue/error-unable-to-materialize-temporary-table-joining-data-sources.

  1. Click the New Data Source icon and indicate you want to connect to More, as shown below.
    05_more
  2. Select DR_JustSo_Labels.tde and click Open.
  3. Click Add, as shown below.
    06_add
  4. Click More and then select DR_JustSo_Numbers.tde. Do not be fooled, the correct fields have NOT yet been joined.
  5. Click the overlapping Venn diagram to display the Join dialog box, as shown below.
    07_join1
  6. Click Number of Records and then click the X that appears in the row to indicate you do NOT want to join these two data sources using this field.
  7. From the left data source indicate you want to join using Resp ID, as shown below.
    08_join2
  8. From the right data source indicate that you want to join using RespID (Extract1). I have no idea why the field is named this way. More on this in a moment.
  9. From the left data source indicate that you want to join using Question ID. Yes, you need to join on more than one field.
  10. From the right data source indicate you want to join using Pivot Field Names (Extract1). Okay… THIS is the thing that has me scratching my head and as of this writing (November 27, 2016) I have no idea why the field isn’t also called Question ID.

    January 4, 2017 — Now I know why this is happening. When using data extracts (.TDE files) Tableau is only able to keep track of the alias names for fields in the first .TDE file.  All of that info gets stripped out from the second .TDE file and Tableau just sees the original field name (Pivot Field Names). A little off-putting, yes, easy to address and it won’t matter a lick once we’re building our visualizations.

  11. Click Add again, select More, and select DR_QuestionHelper.tde.
  12. Click the second Venn overlapping circle and remove any of the joins that may be in there (most likely again using Number of Records).
  13. From the left data source select Question ID and from the right data source select QuestionID as shown below.
    09_join3
  14. Hide the fields RespID (Extract1), both Number of Records fields, the second Question ID, and Pivot Field Names, and rename Pivot Field Values to Value. Your screen should look like this.
    10_joinall
  15. Indicate that you want to create an Extract (look for the radio button towards the upper right of the screen) and then click a sheet in your workbook to generate the Extract.
  16. When asked to save, name the file DR_JustSoAll.tde. After creating the extract Tableau will show all the field names grouped by data source, as shown below.
    11_datasource
  17. Click the down arrow and select Group by Folder.
    Okay, you don’t have to do this but I see no reason to group the fields by data source.

Congratulations, you now have your data “just so” and you did it all in Tableau.

So, how do you get the extracts to refresh?

Good question.  If you are using either Tableau Server or Online you can create shared data sources and program the extract to refresh on a regular or as-needed basis.

If you are using desktop — and you have all four data sources in in one workbook — you can just click the  Data menu and select Refresh All Extracts.

Conclusion

While I find the process outlined here both cumbersome and confusing (what is up with those renamed fields NOT staying renamed?) this approach does appear to work and you only need to set it up once. The same cannot be said of The Tableau add-in for Excel which requires a lot of manual intervention every time you want to update the data.

Will this replace Alteryx as my tool of choice? No, but it does work and you can’t beat the price.

 Posted by on November 28, 2016 2) Visualizing Survey Data, Blog Tagged with: , , , ,  48 Responses »
Oct 132015
 

Overview

You can download a trial version of Alteryx here.

You can find the Excel source data here.

You can download the completed Alteryx module from here.

Understanding the workflow

Here’s the Alteryx workflow that will get the data in the format we want.

Figure 9 -- Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

Figure 9 — Alteryx workflow for getting the survey data in an optimal format for analysis in Tableau

If this is the first time you’ve seen an Alteryx workflow it may look a little complicated (shades of Rube Goldberg) but think about how complicated the flow and decision tree is for what you do when you get up in the morning and get ready for work – it’s way more complicated.

Let’s start by focusing on the three data sources that we need (labeled 1, 2, and 3 in the diagram above).

  1. Data with numeric responses (we’ll use this for our demographic data, too)
  2. Data with text responses
  3. The Question Helper file (contains our meta data).

A note about SPSS files

For this example we’re using Excel data but the process would be identical for survey data saved in an SPSS format (.SAV file).  SPSS encodes both the text and numeric data in a single file, but you need to input values from that same data source twice – once for text and once for numeric values.

When you specify an SPSS file as a data source you’ll see the following check box option.

Figure 10 -- Options that appear when you bring in data from an SPSS file.

Figure 10 — Options that appear when you bring in data from an SPSS file.

Selecting “Expand Factors” will bring in the data as labels.  Unchecking this box will bring in the data as numbers.

Demographics

Figure 11 -- Selecting the demographic components from the data source

Figure 11 — Selecting the demographic components from the data source

Clicking the Select tool reveals the following settings.

Figure 12 -- Specifying which fields we want to keep

Figure 12 — Specifying which fields we want to keep

A translation into English would be “open this Excel file and select these fields”.  Here we keep RespID, Gender, Location, Generation, and Weight.

Numeric values

Figure 13 -- Numeric values section of the Alteryx workflow with the Reshape tool selected

Figure 13 — Numeric values section of the Alteryx workflow with the Transpose tool selected

Using the same data source, selecting the Transpose tool reveals the following settings.

Figure 14 – Transpose tool settings

Figure 14 – Transpose tool settings

Here we indicate that we want to keep RespID and pivot all the other checked fields.  Notice that the demographic components are not checked.

Looking back to the workflow diagram, we next follow this pivot with a Select tool which we use to rename the pivoted fields as shown here.

Figure 15 -- The Select tool allows us to name the pivoted fields

Figure 15 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second pivoted field Numeric value.

Dealing with nulls

Figure 16 -- The Filter tool allows us to eliminate null values before they hit Tableau.

Figure 16 — The Filter tool allows us to eliminate null values before they hit Tableau.

While not critical, Alteryx allows us to eliminate null values before they come into Tableau.  Here’s the setting for this tool.

Figure 17 -- Setting to eliminate nulls from the Numeric value field

Figure 17 — Setting to eliminate nulls from the Numeric value field

Joining the demographic with the transposed (pivoted) numeric data

Figure 18 -- Joining the demographic and transposed numeric data

Figure 18 — Joining the demographic and transposed numeric data

We’re now ready to join the demographic data with the pivoted numeric survey data. We do this with Alteryx’s Join tool.  Here are the settings.

Figure 19 -- Joining demographics and numeric data using the common field, RespID. Notice that we don't include the redundant version of RespID that is coming from the right table.

Figure 19 — Joining demographics and numeric data using the common field, RespID. Notice that we don’t include the redundant version of RespID that is coming from the right table.

Checking our progress

If we add a Browse tool after the Join we’ll see the following results.

Figure 20 -- Results of the initial Join

Figure 20 — Results of the initial Join

Note that we could, in fact, get by with using just this in Tableau, but the whole point is to make things faster and easier, so we’re going to add text responses and the “Helper” data.

Text Values

Figure 21 -- Workflow for setting the text values

Figure 21 — Workflow for setting the text values

This is identical to the numeric values except that we specify a different data source (one with label responses instead of numeric) and we rename the second pivoted field differently, as shown here.

Figure 22 -- The Select tool allows us to name the pivoted fields

Figure 22 — The Select tool allows us to name the pivoted fields

Here we are naming the first pivoted field Question ID and the second Text value.

Tying Demographic, Numeric and Text Values Together

We need to combine the demographic / numeric values with the text values.  We will again use Alteryx’s Join tool, but we’ll need to join on both RespID and Question ID.

Figure 23 -- Joining demographic, numeric, and text data together

Figure 23 — Joining demographic, numeric, and text data together

The setting for the Join is shown below.

Figure 24 -- Join settings for combining demographics / numeric with text data

Figure 24 — Join settings for combining demographics / numeric with text data

Adding the question meta data and outputting the results

We’re almost done; we just need to add the information that tells us about each Question ID so we can map the ID to its human-readable form, group related IDs together, and so on.

Figure 25 -- Joining the "Helper" file data

Figure 25 — Joining the “Helper” file data

The setting for this Join is shown below.

Figure 26 -- Marrying the question meta data to the demographic and survey data

Figure 26 — Marrying the question meta data to the demographic and survey data

The only thing left to do is output this to a new data source.  In the sample workflow I output to an Excel file but you can just as easily write directly to a Tableau Data Extract file.

But Wait!  There’s More!

I intentionally used data that was well behaved, but what happens if the data is “messy”?  For example, suppose there’s an extra row below the field names?  Or suppose that for a check-all-that-apply question, instead of 1s and 0s we instead have 1s and blanks (which would mess things up in Tableau)?  Alteryx handles both of these scenarios (and dozens more) easily.

Is there a Downside?

If there’s any downside it would be the cost as Alteryx starts at around $4,000 per year.  If all you are doing is some occasional survey data reshaping then this fee would probably be prohibitively expensive. If, however, you need to corral a lot of data – or if you need advanced data blending, geospatial analysis, or predictive modeling – then Alteryx is a huge bargain.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , , ,  8 Responses »
Oct 132015
 

Overview

In writing about visualizing survey data using Tableau I’ve found that the number one impediment to success is getting the data in the right format. In accompanying posts I’ll explain how to get this done using Alteryx, Tableau 10.x, the Tableau Excel add-in, and Tableau 9.0 pivot feature (you can come close with 9.x, but can’t get it perfect).

What do I mean by “just so”?

When I deal with survey data there are usually four different elements that need to fit together:

  1. The demographic information (e.g., age of respondents, gender, etc.)
  2. Survey responses in text format
  3. Survey responses in numeric format
  4. Meta data that describes the survey data.

Let’s see what the four elements look like using an Excel sample data set (click here to download).

Demographic data

Here’s what the demographic data looks like.

Figure 1 -- Demographic data

Figure 1 — Demographic data

Survey responses in text format

Here are several columns of survey responses in text format.  Column F contains data for a Yes / No / Don’t know question.  Column G contains responses for a question about salary.  Columns H through P are responses for check-all-that apply questions and columns Q and R contain Likert scale responses.

Figure 2 -- Survey responses in text format

Figure 2 — Survey responses in text format

Survey responses in numeric format

Here are the same responses but in numeric format.

Figure 3 -- Survey responses in numeric format

Figure 3 — Survey responses in numeric format

I’ll explain why it’s so useful to have the survey responses in both text and numeric format in a bit.

Meta Data (the “helper” file)

Here’s some data that I usually prepare by hand as most survey tools won’t produce it for me automatically.  Having this helps me understand the data and will  greatly streamline my work in Tableau.

Figure 4 -- Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

Figure 4 — Survey data meta data. This doesn’t take long to create and will be a huge time saver once we get the data into Tableau.

What does “just so” look like?

Our goal is to combine and reshape the various elements so that they look like this.

Figure 5 -- Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

Figure 5 — Reshaped data joined with meta data. Survey data in this format is very easy to use with Tableau.

As I’ve written previously, the key thing is that I no longer have a separate column for each survey response.  Indeed, I’ve reduced the number of columns from 45 to just 11, but I’ve also increased the number of rows from 845 to over 25,000. That is a good thing.

Why this works so well with Tableau

Our goal is to see how to get Alteryx to get the data in this format, not to actually use the data, but if you need convincing on why the meta data is so helpful, consider the following example.

Let’s say that in your survey you ask people to indicate the importance and satisfaction about certain services, as shown here.

Figure 6 -- Question comparing importance with satisfaction

Figure 6 — Question comparing importance with satisfaction

With the data set up “just so” conducting this comparison in Tableau becomes easy.  First we can drag Question Grouping into Filters and indicate that we just want to look at Importance and Satisfaction questions.

Figure 7 -- Using the Question Grouping field to just focus on Importance and Satisfaction questions

Figure 7 — Using the Question Grouping field to just focus on Importance and Satisfaction questions

Then we can drag Wording and Question Grouping onto the Rows shelf which gives us the framework for comparing importance and satisfaction across ten different questions.  No more having to “look up” which questions we want to explore and no more having to alias question IDs.  I love this!

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Figure 8 – The helper file meta data provides the framework for comparing questions and building visualizations.

Why do we need both text and numeric results?

We don’t really need them, but I know I certainly want them.

Consider all of the Likert scale question results.  The universe of possible values are

1
2
3
4
5

Suppose we want to know just what each of the values (1, 2, 3, 4 and 5) stand for?  The problem is that it depends on the question being asked as sometimes a 5 means “Strongly agree”, for other questions it  means “Critical” and for others it means “Extremely satisfied”.

Without having both numeric and text results we will have to write A LOT of IF / CASE statements and I, for one, do not want to do that.

So, now that we understand how and why we want the data “just so” we’ll see how to get it that way using Alteryx, Tableau 10.x, the Tableau-add-in for Excel, and Tableau 9.x.

 Posted by on October 13, 2015 2) Visualizing Survey Data, Blog Tagged with: , , , ,  5 Responses »
Mar 302015
 

Overview

Tableau 9.0 includes a built-in data prepping tool that makes reshaping survey data so it plays nicely with Tableau a much smoother experience than using the Tableau Excel add-in.  While this new feature won’t replace by trusty copy of Alteryx (for reasons that I explain later in this post) there are many occasions where Tableau’s new pivot feature will be more than adequate.

In this post I will walk through using the new pivot feature along with “temporary” blending to create a solid framework for using survey data with Tableau.

Special thanks to Susan Baier for bringing this to my attention and Jonathan Drummey for showing me Tableau’s Create Primary Group feature.

So, what do we have here?

Note: if you want to follow along you can download the Excel file here.

Consider an Excel workbook that contains two sheets.  The first sheet has the survey results, a snippet of which is shown here.

Figure 1 -- Some raw survey data

Figure 1 — Some raw survey data

Notice the format: one row for each survey respondent and a separate column for each question in the survey where each question is identified with a Question ID (e.g.,  Q0, Q1, Q2, Q134a, etc.).

Column A contains a unique ID for each survey taker, Columns B through D contain demographic information, and Column E contains a weight for each survey respondent.

The second sheet maps each Question ID to a human-readable version of the question and groups related questions into logical buckets.

Figure 2 -- Helper file that maps each Question ID to the wording of the question from the survey

Figure 2 — Helper file that maps each Question ID to the wording of the question from the survey

Note that when I first blogged about survey stuff I didn’t use a helper file but now I won’t take on a project without creating one as I don’t want to spend time aliasing hundreds of question IDs.  The Grouping column also makes is much easier to select related questions and visualize them together.

The data wants to be tall and thin

Anyone who has read up on the subject know that life with survey data and Tableau is a lot easier when the data is reshaped so let’s see how to do this with Tableau 9.0.

  1. In Tableau, connect to the data source and the sheet that contains the data you want to reshape and visualize. This is what it looks like on my screen.

    Figure 3 -- Survey data prior to pivoting (reshaping)

    Figure 3 — Survey data prior to pivoting (reshaping)

  2. Select the fields you want to merge / pivot / reshape, in this case everything except the Resp ID, demographic fields, and Weight field.
  3. Click in any of the highlighted fields and select Pivot. Tableau will combine the 20+ fields into two fields, as shown here.

    Figure 4 -- Data after it has been pivoted

    Figure 4 — Data after it has been pivoted

  4. Rename the first field Question ID and the second field Value.

    Figure 5 – Pivoted fields renamed

    Figure 5 – Pivoted fields renamed

  5. Indicate whether you want an extract (a good idea when Excel is the data source) and go to the Tableau worksheet.
  6. Drag Question ID onto the rows shelf. Your screen should look like this.

    Figure 6 -- Reshaped data in Tableau.  Instead of 20 measures for each question we have only one measure.

    Figure 6 — Reshaped data in Tableau. Instead of 20 measures for each question we have only one measure.

Creating the temporary blend

Now we need to connect and relate the Helper file to our pivoted survey data.  We will do this with a blend, but then use a very slick feature of data blending that will allow us to ditch the secondary data source. Here are the steps.

  1. Click the Add New Data source tool.8_dataSource
  2. Connect to the Helper File sheet from the same Excel workbook and indicate whether or not you want to create an Extract (of course you do!)

    Figure 7 -- The secondary data source

    Figure 7 — The secondary data source

  3. Return to the Tableau worksheet.
  4. Drag the Grouping field to the left of Question ID on the rows shelf, and Wording to the right, as shown below.  Note that you don’t *have* to do this but it’s always useful to see if the hierarchy is working correctly.

    Figure 8 -- Blended Data

    Figure 8 — Blended Data

  5. Right-click the Grouping pill on the Rows shelf and select Create Primary Group.
  6. Rename the group Grouping as shown below.

    Figure 9 -- Leveraging the blend to create an ad-hoc group based on Question ID fields.

    Figure 9 — Leveraging the blend to create an ad-hoc group based on Question ID fields.

  7. Click OK.
  8. Right-click the Wording pill on the Rows shelf and select Create Primary Group.
  9. Rename the group Wording and click OK.
  10. Click the primary data source (the one from which we initially selected Question ID). Notice the two groups that Tableau generated for us.

    Figure 10 -- Tableau-generated groups

    Figure 10 — Tableau-generated groups

At this point we no longer need the secondary data source as the primary source now has groups that map and alias the Question IDs.  Very slick.

Seeing this in action

Now that we have the groups it’s easy for us to do some very quick analysis.  For example, let’s suppose we want to see the average Likert scale score for the collection of Likert scale questions.

  1. Create a new worksheet.
  2. Drag Grouping into the Filters shelf and select the collection of questions you want to view, in this case Likert Set 1.

    Figure 11 -- The Grouping group makes is easy to indicate which sets of related questions you want to examine.

    Figure 11 — The Grouping group makes is easy to indicate which sets of related questions you want to examine.

  3. Drag Wording to Rows.
  4. Right-Drag Value to Columns and select AVG(Value).
  5. Sort in descending order.

Isn’t this great?  We didn’t have to go groping around for the right Question IDs and we didn’t have to alias anything.

So, are there any shortcomings?  Is this blend approach as good as being able to join the pivoted data with the helper file?

Yes, there are shortcomings

There are several things that a join will give us that we can’t get with a blend.

You cannot refer to the group in a calculated field

You can’t refer to a group in a calculated field, so something like this won’t be available:

IF [Grouping] =”Things you Measure” then [Value] END

You need to update the group members if you add new questions

Tableau’s generation of the primary group is much like populating the members of a parameter with the members of a field.  Tableau will do it when you click a button, as it were, but it won’t update the list automatically.

If you end up adding new questions to your survey or reorganizing how questions are categorized in your helper file will either need to regenerate the primary data source groups or manually edit them.

You cannot combine text results with numeric results

This is one of the major “gotchas” for me, at least for larger surveys.  With most commercial survey systems you can download the data in a label format or a numeric format.  For example, when downloaded as labels survey responses might look like this:

Strongly disagree
Disagree
Neutral
Agree
Strongly agree

When downloaded as numbers the same responses would look like this:

1
2
3
4
5

I find I like to have both label and numeric responses, so I pivot / reshape both sets of data and then join them together using Question ID and Response ID.  Using Alteryx I can perform the join but I cannot do it with Tableau 9 and pivoted data.

Conclusion

For complex surveys where I need to do a fair amount of data cleanup and need both next and numeric values I’ll continue to use Alteryx.  For shorter surveys where I don’t need to do a lot of prep work and where either labels or numeric values will suffice, Tableau 9.0’s new pivot feature suits me just fine.  It’s a great addition to a great product.

 Posted by on March 30, 2015 2) Visualizing Survey Data, Blog Tagged with: , , ,  29 Responses »