May 2, 2018

Don’t have Tableau Prep? You can download it here.

Want the packaged Tableau Prep Flow explored in this post? Click here to download it.

Overview

As anyone who has read anything on my blog that relates to survey data knows, the number one impediment to success with Tableau is getting your survey data “just so.”

In this post I’ll show you how to take the same data set I use in all my classes and transform it using Tableau Prep so that it works perfectly with Tableau.

Working with Tableau Prep

Using data similar to what I discuss in the “here’s how your data needs to be setup” blog post, let’s see how we’re going to get the data-as-numbers, the data-as-labels, and the helper file to all play nicely in Tableau Prep.

Importing and Reshaping the Label Responses

Here’s the Tableau Prep home page.

Figure 1 — Tableau Prep home screen

Let’s begin by importing the Label responses from Excel.

We’ll start by clicking Connect to Data so we can see all the sources to which we can connect.

Figure 2 — Tableau Prep data sources.

In this example we’ll choose Microsoft Excel and select DataRevelations_SurveyData_V3.xlsx.

Tableau Prep recognizes that there are three tabs in this file. We’ll start by dragging Data Labels onto the canvas.

Figure 3 — Dragging the first data source into the Tableau Prep canvas

Inspecting the Merchandise

Let’s see what we have here by clicking the Plus sign next to our newly imported data source and selecting Add Step.

Figure 4 — Clicking Add Step will display a profile of our data. We can also filter data and create calculated fields within a step.

This will change the view so that we can see both the data profile and the  data grid, as shown below.

Figure 5 — the source data, as a data grid (bottom) and with response profiles (center).

Woah!  Look at the stuff in the middle! You can see a histogram of all the responses to each question, before you even analyze things in Tableau!  For example, we can se that more men than women took this survey (1) and that there were a handful of people that did not specify where they live (2).

Tableau Prep also allows for some interactive exploration. For example, if instead of showing bucketed values for RespID we instead indicate we want to see details…

Figure 6 — Changing from Summary to Detail view.

We can select individual items in the profile view and see the responses for those selections, as shown below.

Figure 7 — Selecting items in the profile view (1) shows associated records in the data grid (2).

Pivoting the Data

We’re now ready to leave the “demographic” columns (Q0_Gender, Q0_Location, etc.) intact and “pivot” the question columns. I place the word “pivot” in quotes because many other tools refer to the action of taking columns and turning them into rows as “unpivoting”.  No matter; we are going to “reshape” the data.

We do this by clicking the Plus sign next to the “Clean 1” step and selecting Add Pivot.

Figure 8 — How we indicate we want to reshape the data.

Now we need to specify which fields we want to reshape and which we want to leave intact. Below we drag all the fields except the demographic fields and RespID into the Pivot Values area.

Figure 9 — Dragging the fields we want to pivot into the Values area.  Note that we could also drag all the fields and them remove the few fields we don’t want to pivot.

Renaming the Pivoted Fields

The only thing left to do is to rename the reshaped fields (Pivot1 Name and Pivot1 Values). We can do that by using the ever-so-Tableau approach of right-clicking the field name and selecting Rename Field.

Figure 10 — Renaming the pivoted fields.

Here we’ll change the first field to Question ID and the second to Text Responses.

Figure 11 — Pivoted and renamed fields.

At this point if we want to inspect the results we can click the Plus sign and selecting Add a Step, the results of which are shown below.

Figure 12 — Inspecting our results.  Note that if you want to rearrange the order of the fields you can do so by dragging and dropping them.  I prefer to see RespID all the way to the left.

Importing and Reshaping the Numeric Responses

We’re now ready to import numeric-encoded survey responses.

We’ll start by dragging the Data Numbers source into the canvas.

Figure 13 — Importing the survey responses encoded as numbers.

It turns out there are some fields we won’t need as already have all the demographics fields we need from the Data Labels source.  Below we deselect the fields. Note that Q0_Weight can only have a numeric value and is encoded as such in the Data Labels data source.

Figure 14 — Deselecting the fields we don’t need to have a second time.

Pivoting the Data

We’re now ready to leave RespID intact and “pivot” the question columns

We do this by clicking the Plus sign in the “Clean 1” step and selecting Add Pivot.

Figure 15 — Reshaping the numeric data

As with the Data Label responses, we now need to specify which fields we want to reshape and which we want to leave intact. Below we drag all the fields except the RespID into the Pivot Values area.

Figure 16 — Dragging the fields we want to pivot into the Values area.

Renaming the Pivoted Fields

As before, the only thing left to do is to rename the reshaped fields (Pivot1 Name and Pivot1 Values). We can do that by right-clicking the field name and selecting Rename Field.

 

Figure 17 — Renaming the pivoted fields.

Here we’ll change the first field to Question ID and the second to Numeric Responses.

Figure 18 — Pivoted and renamed fields.

Joining the Text and Numeric Responses

Now that we have both the text and numeric results pivoted / reshaped we need to merge the data so that all the numeric responses line up with all the text responses.  That is, for every Resp ID we want to make sure the text and numeric responses for each Question ID line up properly.

We’ll do this in Tableau Prep by joining the two reshaped data sources.

Figure 19 — You create a join by dragging the second source up and to the right of the first source and making sure to drop it within the “New Join” box.

Tableau Prep will see that there’s a match on RespID and create a join using just that field.

Figure 20 — Initial Join results. This will create over one million rows of data because it generates every possible combination of text and numeric result for each RespID.

We need to indicate that we also want to join using the Question ID field. We do that by clicking the plus and specifying the additional field, as shown below.

Figure 21 — Indicating that in addition to RespID we also want to join using Question ID.

If we inspect our results by adding a new step we see that everything lines up perfectly, but we also have redundant RespID and Question ID fields.

Figure 22 — Notice how the numeric and text responses for Q28_IMP match. Also notice that we have some redundant fields.

We can take care of the redundant fields by right-clicking and selecting Remove fields.

Renaming a Step and Seeing Changes

You can change the wording below each of the steps in the Tableau Prep flow to convey something a little more descriptive than “Pivot 1” and “Clean 2.” For example, we can right-click Clean 3 and rename it “Remove fields” as shown below.

Figure 23 — Renaming a step

You may also be wondering how you can see those changes should you want to edit the flow.  Notice below that there is a strip along the left of the profile pane that contains the text “Changes (2)”.

Figure 24 — Tableau prep indicates whether there are any changes for the selected step.

Clicking the “>” icon will expand the changes pane so you review and edit changes.

Figure 25 — Expanding the Change pane shows edits made to the flow in that step.

Adding the Question Helper Metadata

The only thing remaining is to merge the Question Helper data into the flow. We do this by dragging the Question Helper onto the canvas and joining this data source with the Remove fields step as shown below.

Figure 26 — Merging the Question Helper file.

Note that even though the field names are not exactly the same (“Question ID” vs “QuestionID”) Tableau Prep infers that these are the fields to join, as shown here.

Figure 27 — Tableau Prep guesses correctly on which field names to join.

Let’s see the results by adding a new step.

Figure 28 — Our survey data, “just so.”

Note that at this point the data has morphed from containing 45 columns and 845 rows into 12 columns and over 34,000 rows (1).

Note also that by selecting Q2_3 in the profile pane we can see the results for just that question in the data grid (2).

Finally, we may want to remove the redundant QuestionID field (3).

Removing the Null Values

Assuming our survey tool encodes check-all-that-apply questions correctly (coding them as 1s and 0s instead of 1s and blanks) we can reduce the size of our data set by removing Null values.  Note that this should not be a cavalier assumption as many survey tools do not do this correctly (e.g., Survey Monkey and Qualtrics). Make sure to check out the companion blog post Using Tableau Prep to fix problems with Check-All-That-Apply questions.

So, assuming it’s safe to do so, here’s how to remove the Nulls.

In the Profile pane, right-click the null bar and select Exclude from the context menu.

Figure 29 — Removing Nulls.  You can expand the Changes pane to see what changes have been applied.

Exporting the Results

Now that everything is “just so” we can export our results to either a CSV, TDE, or Hyper format.

Here we’ll create a Hyper data extract by clicking the Plus sign next to the Clean 3 step and select Add Output.

Figure 30 — Output the results to Hyper.

Below we specify that we want to save the results to a file (rather than publishing to a server).

Figure 31 — Specifying file name, location, and type.

The last step is to click Run Flow which will run our process and output the results to a Hyper file. We’re now ready to use our “just so” data source with Tableau.

Conclusion

Tableau Prep was very easy to learn, and you can’t beat the price (free with your Tableau subscription or part of your current maintenance). The Profile Pane also provides insights to your data that I’ve not seen in any other ETL tool, and Prep’s ability to handle badly coded survey data is elegant and straightforward (see related blog post.)

Tableau Prep also allows you to save your work as a packaged workflow so that you can share the flow and the source data in a single file.

The first release certainly has its downsides; e.g., you cannot connect to SPSS files and you cannot automate when to run an update, and you cannot export to anything except CSV and Tableau extract formats. But even with these shortcomings Tableau Prep is now my tool of choice for getting survey data “just so.”

Don’t have Tableau Prep? You can download it here.

Want the packaged Tableau Prep Flow explored in this post? Click here to download it.