### Overview

I received an e-mail inquiry about weighted data recently and realized that while I cover this in my survey data class I had not yet posted anything about it here. Time to remedy that.

The good news is that it is not at all difficult to work with weighted survey data in Tableau. And just what do I mean by weighted data? We use weighting to adjust the results of a study so that the results better reflect what is known about the population. For example, if the subscribers to your magazine are 60% female but the people that take your survey are only 45% female you should weigh the responses from females more heavily than males.

To do this each survey respondent should have a weighting amount associated with their respondent ID, as shown here.

When pivoting / reshaping the data make sure that [Weight] does not get reshaped. It should remain in its own column like the other demographic data.

Once this is in place we’ll need to modify the formulas for the following questions types:

- Yes / No / Maybe (single punch)
- Check-all-that-apply (multi-punch)
- Sentiment / Likert Scale (simple stacked bar)
- Sentiment / Likert Scale (divergent stacked bar)

### Yes / No / Maybe (single punch)

With this type of question you usually want to determine the percentage of the total.

#### Unweighted calculation

The table calculation to determine the percentage of total using *unweighted* data is

SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

#### Weighted calculation

The table calculation to determine the percentage of total using *weighted* data is

SUM([Weight]) / TOTAL(SUM([Weight]))

### Check-all-that-apply (multi punch)

With this type of question you usually want to determine the percentage of people that selected an item. The total will almost always add up to more than 100% as you are allowing people to select multiple items.

Most surveys will code the items that are checked with a “1” and those that are not checked with a “0”.

#### Unweighted calculation

The calculation to determine the percentage of people selecting an item using *unweighted* data is

SUM([Value]) / SUM([Number of Records])

where [Value] is the name of the measure that contains the survey responses. If the survey responses are coded as labels instead of numbers you can use this formula instead.

SUM(IF [Label]="Yes" then 1 ELSE 0 END) / SUM([Number of Records])

#### Weighted calculation

The calculation to determine the percentage of people selecting an item using *weighted* data is

SUM(IF [Value]=1 then [Weight] ELSE 0 END) / SUM([Weight])

### Sentiment / Likert Scale (simple stacked bar)

This is very similar to the single-punch question but instead we have several questions and compare them using a stacked bar chart. I am not a big fan of this approach but it can be useful when you superimpose some type of score (e.g., average Likert value, percent top 2 boxes, etc.).

#### Unweighted calculation – Stacked Bar

The table calculation to determine the percentage of total using *unweighted* data is

SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

#### Weighted calculation – Stacked Bar

The table calculation to determine the percentage of total using *weighted* data is

SUM([Weight]) / TOTAL(SUM([Weight]))

#### Unweighted calculation – Percent Top 2 Boxes

Assuming a 1 through 5 Likert scale, the calculation to determine the percentage of people selecting either Very high degree or High Degree (top 2 boxes) using *unweighted* data is

SUM(IF [Value]>=4 then 1 ELSE 0) / SUM([Number of Records])

#### Weighted calculation – Percent Top 2 Boxes

Assuming a 1 through 5 Likert scale, The calculation to determine the percentage of people selecting either Very high degree or High Degree (top 2 boxes) using *weighted* data is

SUM(IF [Value]>=4 then [Weight] ELSE 0) / SUM([Weight])

### Sentiment / Likert Scale (divergent stacked bar)

Here is what I believe is a preferable way to show how sentiment skews across different questions.

I’ve covered how to build this type of chart using unweighted values here.

There are six fields we need to fashion the visualization, three of which need to be modified to make the visualization work with weighted data.

**Count Negative**- Gantt Percent
- Gantt Start
**Percentage****Total Count**- Total Count Negative

#### Count Negative – Unweighted

Assuming a 1 – 5 Likert scale, the calculation to determine the number of negative sentiment responses using *unweighted* data is

IF [Score]<3 THEN 1 ELSEIF [Score]=3 THEN .5 ELSE 0 END

#### Count Negative – Weighted

Assuming a 1 – 5 Likert scale, the calculation to determine the number of negative sentiment responses using *weighted* data is

IF [Score]<3 THEN [Weight] ELSEIF [Score]=3 THEN .5 * [Weight] ELSE 0 END

#### Percentage – Unweighted

The calculation that determines both the size of the Gantt bar and the label for the bar using *unweighted* data is

SUM([Number of Records])/[Total Count]

#### Percentage – Weighted

The calculation that determines both the size of the Gantt bar and the label for the bar using *weighted* data is

SUM([Weight])/[Total Count]

#### Total Count – Unweighted

The calculation that determines the total number of responses for a particular question for *unweighted* data is

TOTAL(SUM([Number of Records]))

#### Total Count – Weighted

The calculation that determines the total number of responses for a particular question for *weighted* data is

TOTAL(SUM([Weight]))

### Summary

Here’s a summary of all the unweighted calculations and their weighted equivalents

Unweighted |
Weighted |

SUM([Number of Records]) / TOTAL(SUM([Number of Records])) | SUM([Weight]) / TOTAL(SUM([Weight])) |

SUM([Value]) / SUM([Number of Records]) | SUM(IF [Value]=1 then [Weight] ELSE 0 END) / SUM([Weight]) |

SUM(IF [Value]>=4 then 1 ELSE 0) / SUM([Number of Records]) | SUM(IF [Value]>=4 then [Weight] ELSE 0) / SUM([Weight]) |

IF [Score]<3 THEN 1 ELSEIF [Score]=3 THEN .5 ELSE 0 END | IF [Score]<3 THEN [Weight] ELSEIF [Score]=3 THEN .5 * [Weight] ELSE 0 END |

SUM([Number of Records])/[Total Count] | SUM([Weight])/[Total Count] |

TOTAL(SUM([Number of Records])) | TOTAL(SUM([Weight])) |

Jeff WNovember 25, 2015 at 7:43 amHi Steve – thanks for posting. Weighted survey data adds another layer of complexity, and I’ve had trouble with it before.

But doesn’t using SUM([Weights]) as the denominator in the percent of total calculations mean we are returning percent of responSES, not percent of respondENTS? In other words, you can have zero, one, or many responses per respondent in a “select all” question, meaning the SUM([Weights]) will be different for each “select all” question and will not be equal to the weighted total number of respondents. Don’t we need to use a sort of “sum distinct” function to only count a respondent in the denominator once? I haven’t figured out how to do that though (Tableau has a Count Distinct function, but no SUM Distinct).

swexlerNovember 25, 2015 at 9:27 amJeff,

SUM([Weights]) works for the same reason that SUM([Number of Records]) and SUM(1) works for unweighted data. It turns out that you do NOT have find the equivalent of COUNTD([ID]). Because we are only seeing responses for each question and not for the overall survey we don’t need to determine the distinct number of respondents.

You can placate your concern by placing the questions on rows and COUNT(ID), COUNTD(ID), and SUM([Number of Records]) on columns. You’ll get the same results for all three pills.

When you get a chance go to https://www.datarevelations.com/visualizing-survey-data and download the sample packaged workbook. It walks through all the different examples.

The only wrinkle in the whole thing is getting weighted responses for the demographic questions. Here we need to filter the results by whatever question received a response from all survey takers.

Steve

Wuffy BeansNovember 25, 2015 at 3:25 pmit would be great if you append dashboards

swexlerNovember 25, 2015 at 3:59 pmI do have an updated packaged workbook that has both unweighted and weighted examples. You can find it at https://www.datarevelations.com/visualizing-survey-data.

Steve

MattJuly 12, 2016 at 5:28 amHi Steve,

Thanks a lot for the info! Could you share how to create the weight variable in Tableau please? Can it be calculated within the system automatically?

All the best,

Matt

swexlerJuly 12, 2016 at 9:23 amMatt,

Great question and I do not yet have an answer for you.

My client typically use SPSS to weigh the data and then we either export that data to .CSV files or import it into Alteryx. I’ve no doubt there must be a way to avoid the SPSS step and come up with a way, either inside Tableau or certainly inside Alteryx, to do this.

Tackling this is on my to-do list; just haven’t gotten to it yet.

Steve

Mike JewerMarch 6, 2017 at 9:44 amHi Steve,

I really enjoyed your working with survey data class and have gotten great help from your blog posts/website. One thing I have yet to find or figure out on my own is how to handle mean scores with weighted data.

I used LikertValue: Float(Left([Label],2)) for unweighted mean scores but can’t figure out how to manipulate that for weighted data.

Any tips would be greatly appreciated.

Thanks,

Mike

swexlerMarch 6, 2017 at 10:11 amMike,

I gather your FLOAT(LEFT([Label,2)) is giving you a number (probably between 1 and 5) and then you are taking the average of that, as in

AVG(FLOAT(LEFT([Label,2)))

Assuming that each respondent has a weight value, (let’s call it [WEIGHT]) I think this would work:

AVG(

FLOAT(LEFT([Label,2)) * [WEIGHT]

)

Steve

Mike JewerMarch 6, 2017 at 11:23 amThanks Steve for the quick reply.

I tried your suggestion but using AVG( FLOAT(LEFT([Label,2)) * [WEIGHT]) didn’t work.

I created a new measure “LikertValue (weighted)” with the following code:

Float(Left([Label],2))*[Weight Alt]

and in the shelf chose this: AVG([LikertValue (weighted)])

Some of the mean scores are outside the range of values so something odd is happening.

swexlerMarch 6, 2017 at 11:57 amTry this:

Weight at the respondent level: [Weight Alt]

Likert Response: [Value]

SUM([Value]*[Weight Alt]) / SUM([Weight Alt])

Yes, I know you have to do the string to float conversion as you don’t have straight-ahead values in your data set. You *should* be able to swap out [Value] with Float(Left([Label],2)) and get what you need.

SUM(Float(Left([Label],2))*[Weight Alt]) / SUM([Weight Alt])

Steve

Mike JewerMarch 6, 2017 at 2:55 pmThis worked! Thank you so very much.

SurajApril 7, 2017 at 9:58 amHi Swexler,

I am trying to calculate weighted score for a survey data for each question and further aggregate them an different demographics like question group, location, gender etc.

There are 2 type of questions, positively framed and negatively framed. Each question has 5 responses as value.

Neutral

Somewhat agree

Somewhat disagree

Strongly agree

Strongly disagree

I tried to calculate the % count of responses as

COUNT([Value]) / TOTAL(COUNT([Value]))

and again, I am trying below calculation for weighted average as per the calculation below

IF ATTR([Qframe])=”Positively Framed Question” and ATTR([Value])=”Neutral” then ([%Count]*50)

ELSEIF

ATTR([Qframe])=”Positively Framed Question” and ATTR([Value])=”Somewhat agree” then ([%Count]*75)

ELSEIF

ATTR([Qframe])=”Positively Framed Question” and ATTR([Value])=”Somewhat disagree” then ([%Count]*25)

ELSEIF

ATTR([Qframe])=”Positively Framed Question” and ATTR([Value])=”Strongly disagree” then ([%Count]*0)

ELSEIF

ATTR([Qframe])=”Positively Framed Question” and ATTR([Value])=”Strongly agree” then ([%Count]*100)

ELSEIF

ATTR([Qframe])=”Negatively Framed Question” and ATTR([Value])=”Neutral” then ([%Count]*50)

ELSEIF

ATTR([Qframe])=”Negatively Framed Question” and ATTR([Value])=”Somewhat agree” then ([%Count]*25)

ELSEIF

ATTR([Qframe])=”Negatively Framed Question” and ATTR([Value])=”Somewhat disagree” then ([%Count]*75)

ELSEIF

ATTR([Qframe])=”Negatively Framed Question” and ATTR([Value])=”Strongly disagree” then ([%Count]*100)

ELSEIF

ATTR([Qframe])=”Negatively Framed Question” and ATTR([Value])=”Strongly agree” then ([%Count]*0)

ELSE 0

END

Above calculation works fine on Question level but I am getting wrong values for sub totals at question group or if any filter applies.

Also, I tried to assign values for the positively framed questions as

Neutral 3

Somewhat agree 2

Somewhat disagree 4

Strongly agree 1

Strongly disagree 5

and for negatively framed questions as

Neutral 3

Somewhat agree 4

Somewhat disagree 2

Strongly agree 5

Strongly disagree 1

so that there wont be any need for identifier for sentiments.

still no luck.

I can create another column in back end data if require.

Please help I’m really stuck.

swexlerApril 7, 2017 at 10:13 amSuraj,

I am not sure why the same workbook that I make available does not address what you need. You are trying to analyze Likert-scale questions, correct. Do you have a separate weighting for each ID or are you coming up with a weight variable on the fly?

It shouldn’t matter. The example I have has a separate column for weighting. In the packaged workbook (you can download it from https://www.datarevelations.com/DataRevelations_TC2014.zip) you will find a tab called “Likert Scale Questions using Divergent Stacked Bar (Weighted)”. You can add various dimensions to rows or to filters and you’ll see everything works.

If you want to do a “Percent of people that chose the top 2 boxes” type of visualization the calculation would look like this:

SUM(IF [Score]>=4 then [Weight] else 0 end) / SUM([Weight))

This assume you are using a 1 – 5 Likert scale.

OK… what if what I’m providing here does not work?

I am slammed until the middle of next week. So, if you need an answer now GO TO THE TABLEAU WEBSITE and post your question to the ENTIRE community. There’s an entire group dedicated to visualizing survey data.

If that doesn’t work, we can set up a screen-sharing session for next week.

Steve

Yamil MedinaApril 11, 2017 at 3:49 pmHi Steve, thanks for all you do. The survey data information in your blog is gold for many of us.

I am struggling to come up with a solution to calculate z-test for weighted data. Actually is the proportional z-test for weighed data since what I need is to compare last month % vs same month last year %. (my % are promoters or detractors)

I know how to do this without a problem with unweighted survey data but in this case, we need to do the significant testing with weighted data.

Also, the coloring is not an issue either since I know how to do it. My issue is to come up with the calculation in Tableau. After so much research I found that SPSS can calculate that very easy but I need this in Tableau since I use Qualtrics for the surveys and then Alteryx to Tableau so I am not sure if I should do the calculation outside Tableau. Any ideas, suggestions, references you can provide? I was unable to find a post in the tableau community so I posted in the forums.

swexlerApril 11, 2017 at 5:45 pmYamil,

Hmm… I’m sure I’ve dealt with this a bunch of times, but not quickly finding example workbooks.

I assume you’ve looked at the following post, yes?

https://www.datarevelations.com/showing-now-versus-then.html

My hunch is that for all the formulas, if you replace wherever you see [number of records] or 1 with [Weight] it may just work.

I’ll keep looking but so far this is what I can come up with.

Steve

Paul MastrangeloOctober 17, 2017 at 1:29 pmHi Steve – Thanks for the site. I wanted to note that if the survey questions have different n sizes (e.g., some people skip some of the questions), then each item needs a separate weight. Alternatively, one could use mean replacement for missing values of the original scores, which would allow one weight column again. Also, note that if one does not have access to the “raw” data file, the weighting can be done if you know sample sizes for each segment:

Weighted Mean for each segment = raw mean * segment sample size / segment population size

or

Weighted Percent Favorable for each segment = raw % * segment sample size / segment population size

Where estimated population score = SUM of weighted scores for each segment

But some online survey reporting tools only report the maximum n size for a group (not an n size for each item), which would reduce the accuracy of the population estimate if respondents can skip questions.

swexlerOctober 17, 2017 at 3:33 pmPaul,

Thanks for the comments. Indeed, my calculations are on a weighted value for each respondent becoming part of their demographic profile rather than calculating the weight based on the particular question that is in play. While I’ve not done it (I don’t have the statistics chops) it should be possible to generate a weighting, on the fly, in Tableau.

Dan W.December 4, 2017 at 4:31 pmHi Steve, thanks for a great article. I recently completed a study of Students. There were large differences in the composition of Graduate vs Undergraduate Students in the survey responses vs the Student population as a whole, so I weighted the data on this. Now I’ve been asked to segment and compare Domestic vs International Students in the analysis. Do I need to look into weighting each of these segments or do I just breakout the already-weighted all responses data into Domestic and International? Why or why not? Thank you for any help you are able to provide.

swexlerDecember 5, 2017 at 4:27 pmDan,

Truth be told I’m not sure as this is not within my expertise.

I’ll do some research and suggest you do the same… lots of people out there that should be able to help.

I’ll post if I find anything useful and hope you will do the same.

Steve

swexlerDecember 7, 2017 at 4:33 pmDan,

Here is a response from a friend and colleague who works at Marist Poll:

There are a couple of questions I have that need to be considered in order to provide an answer to the question.

1) Are there population parameters available for which to weight? By that I mean are there known data about the composition of the population that can be used to apply weights? It appears that this is a survey of a college/university, so are there known counts of the proportion of students that are domestic undergraduate students, international undergraduate students, domestic undergraduate students, and international graduate students?

2) If there are known population parameters, what was it about the data collection that didn’t allow for a representative sample?

3) If there are known population parameters and a sense that data collection was representative and things are still not balanced, I would suggest a redo of the weights to a variable that includes the following categories:

Domestic undergraduate students

International undergraduate students

Domestic graduate students

International graduate students

There certainly should not be another weight on top of the weight that has already been done.

If there is not an issue with domestic vs. international being representative, it may be best served to have that as a cross-tab to highlight differences or similarities among those groups.

Dan W.December 12, 2017 at 5:00 pmHi Steve, thank you sooo much for your response (and please thank your associate at Marist Poll as well). I apologize for my delay in response. Here are my responses to the questions:

1) Yes, there are known data about the population that can be used to apply weights. We are a membership association surveying our Student members, so we have information from when they join us. When the survey results came back, I noticed this discrepancy when compared to our total population of Student members, so I weighted the data accordingly on this parameter alone. Other demographics (e.g. % domestic or international) were comparable b/tw our Student population as a whole and the survey respondents. Theoretically, we could compare the groups you suggest (domestic undergrad, international undergrad, etc.)

2) I sent the survey to all Student members to get a good sample size, as we are segmenting the data so were looking for as many respondents as possible to ensure sample sizes high enough to report on. I did not know Graduate Students would be more likely to respond and that this discrepancy would exist.

3) Do I need to incorporate Domestic vs International into the weighting scheme since there were not differences (in terms of total population) b/tw all our Student members and the survey respondents?

So, again, I am presenting the results as 1) overall (weighted) results (all questions by all survey respondents…one group), 2) all questions by graduate versus undergraduate students to look at differences b/tw these groups and 3) all questions by domestic versus international students to look at differences b/tw these groups.

My question is for 2) and 3) do I use the weighted or unweighted results (or do I need to do more weighting w/in 2) and 3) …this seems excessive).

Thanks again!

Dan

Brian RJanuary 14, 2019 at 7:02 pmThis is awesome. However, I cant seem to figure out how to create the “Value” measure in Tableau. Is this something that the survey data must provide?

Brian ReyesJanuary 14, 2019 at 7:04 pmThis is great! However, I have not been able to figure out how to create the “Value” measurement in Tableua. Is this something the survey data must provide?

Thanks,

swexlerJanuary 14, 2019 at 7:09 pmBrian,

This is all part of getting your data setup correctly. Please see https://www.datarevelations.com/surveyjustso.html to see how to generate the field [Value] from your survey data.

Steve

MelNovember 13, 2019 at 5:05 pmHi there,

Is it good practice to use weights when calculating average dollar amounts by demographic? I’m analyzing a survey about the average expenses of Americans around the holidays. Conceptually it feels a bit strange and I guess I’m just looking for a methodology here to explain why you should or shouldn’t do it.

Any help would be appreciated!

swexlerNovember 17, 2019 at 2:21 pmMel,

Indeed, it seems wrong to multiply a salary by some number based on weighting.

This is where I defer to others who have a strong stats and survey methodology background. I can ask around, but suggest you do the same (you can try the Tableau forum, but I suspect you’ll get better answers from a survey creator group).

Steve

MelNovember 18, 2019 at 10:53 amHey Anna and Steve,

Thanks for getting back to me! So the question I’m asking myself with this survey is, how does online shopping around the holidays differ by age group in terms of spending? I’ve been asked to weigh my survey respondents by age to make my data more nationally representative. Our survey respondents usually range on the younger end. But when I apply those age weights to the entries to “amount typically spent online shopping”, I got hung up on whether or not I should be applying weights to find an “average spending of X age group”. In this case, what would be the proper or useful way to use weights to find these differences by age group?

Thanks for your help!

swexlerNovember 18, 2019 at 11:10 amSo… I’m looking forward to seeing what comes of this, but I suspect that this will be a good start for OVERALL spending.

SUM([Amount Spent by respondent] * [Weight for respondent)) / SUM([Weight for respondent])

But if you are breaking things down by age group and that is the source of how you are weighting, then you would not take weightings into account when computing the averages.

But this is not my strong suit and I should do some research on this.

Steve

Anna FoardNovember 17, 2019 at 4:12 pmHi Mel,

For your analysis, is there a reason you’d want different demographics represented in the first place? If not, no need to weigh them. But if there’s a reason (as in, they contribute differently), weights make sense if the number in each group differs. If you give a little more information I may be able to help you.

swexlerNovember 17, 2019 at 6:32 pmMel and Anna,

I asked Anna to chime in as she’s one of my “go to” people when it comes to this type of thing.

So… let’s say you weigh various respondents and ID1001 is weighted 1.4 and ID1002 is weighted .87.

ID1001 earns $100K a year, but would you represent this his / her salary as $140K. I think not.

But… when computing a weighted average you would want your system to think that 1.4 people make $100K, rather than one person making $100K.

But… I sure hate averages when dealing with something like salary, so I would probably have the actual salaries in a jitterplot but have a toggle to show a weighted box-and-whisker plot vs. unweighted.

Steve