May 23, 2014
 

Overview

I’ve had a spate of requests from clients to show how survey responses rank across different categories and I’ve come up with a way that makes it very easy to see where the big stories are.

Note that this approach works for any measure that can be ranked, not just survey responses.

Let’s see what I mean…

Consider the bar chart below, which shows the results of a survey question: “Indicate which of the following you measure; check all that apply.”

Figure 1 -- Percentage of respondents that measure selected items, ranked from highest to lowest.

Traditional approach to showing rank within a category

Now, suppose you wanted to see the percentages and rankings broken down by different demographic components (e.g., location, gender, age). There are myriad Tableau knowledge base articles and blog posts on how to do this, and they lead to results that look like the one shown in Figure 2.

Note: Pretty much all of those articles and blog posts are now obsolete, as they make clever use of the INDEX() function.  With Tableau 8.1 you can use RANK(), or one of its variations, and not have to jump through as many hoops.

Figure 2 -- Traditional approach to showing ranking within a category.

I find this a tough read.  Even if I add a highlight action it’s still hard for me to see where a particular item ranks across the four categories.

Figure 3 -- Ranking within a category with highlighting.

Don’t try to show everything at once

My solution is to place Generation on the Columns shelf and not show everything at once, but instead to allow the user to explore each of the possible responses and see how they rank across the different categories.

Consider the dashboard shown below where the top worksheet shows the responses across all categories.

Figure 4 -- Dashboard with no item selected.

Now see what happens when we select one of the items in the list.

Figure 5 -- Dashboard with an item selected, showing that item’s rank and percentage across different generations.

Okay, not much to report here – Adrenaline Production is ranked first in three categories and second among Traditionalists, although Traditionalists measure it quite a bit less than the other three groups do.  Still, we’re not seeing any wide swings.

But look what happens when we select Breathing…

Figure 6 -- Breathing: our first big story.

Now that’s a big story!  And it pops out so clearly.

Reporting vs. interacting

This is all well and good if you publish this as an interactive dashboard and you expect people to, well, interact; but what happens if you want to publish this as a static graphic in a magazine?

The solution is to find where the big stories are and show those in the magazine; that is, do the work for your reader and show him / her where the big differences are.  In fact, that is exactly what I’ve done in Figure 6.

How the dashboard works

Here’s how the top part of the dashboard is set up.

Figure 7 -- Configuration of the top worksheet.

Rank is defined as

   // Rank each response (Wording) by its check-all percentage; highest = 1
   RANK_UNIQUE([CheckAll_Percent])

Note that we’re addressing the table calculation using Wording.

Notice also that Wording is on the Rows shelf.

The bottom part of the dashboard is set up like this.

Figure 8 -- Configuration of the bottom worksheet.

Goodness, we can’t tell what any of the bars mean because Generation is on the Columns shelf and Wording is on the Level of Detail shelf rather than on Rows.  If you put Wording on Rows you get something that looks like this.

Figure 9 -- Placing Wording on the Rows shelf tells a different and harder-to-understand story.

The key takeaway is that we cannot make a single visualization that tells the story; we need the first and second visualizations working together.

A Filter and a Highlight Action

We use both a Highlight and a Filter action to make the two visualizations work well.  The Filter action makes the second worksheet disappear once you clear the selection in the first worksheet; the Highlight action highlights where the selected item appears in the second worksheet.

Here are the two actions:

Figure 10 -- Two actions tied to the same mouse click.

The Filter action is defined as follows.

Figure 11 -- Definition of the Filter action.

This tells Tableau that when a user selects something from the first worksheet (Percent that Measure-Overall) it should filter the second worksheet (Percent that measure-by Generation) by the field Temp.  Temp is just a string constant that I’ve placed on the color shelf; its only use is that we have to filter by something in order for the Exclude all values setting to work (and that is critical for the behavior of the dashboard).
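For reference, there is nothing special about Temp; a minimal sketch of such a calculated field (the name and the literal are arbitrary placeholders) is just a string constant:

   // Temp -- a throwaway constant; it exists only so the Filter action
   // has a field to filter on when "Exclude all values" takes effect
   "Temp"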

Here’s how the Highlight action is defined.

Figure 12 -- Definition of the Highlight action.

This tells Tableau that when a user selects something from the worksheet on top, Tableau should highlight items in the second worksheet using Wording as the selected field (where Wording is the dimension we placed on the Level of Detail rather than on the Rows shelf).

Conclusion

I’ve found this approach to showing rank across categories very useful, and it’s been a very big hit with my clients.  By placing the categories across columns and using highlight actions we make it very easy to see where the big differences are among different respondent groups.

Feb 6, 2012
 

Overview

I really am not “out to get” Texas.  Yes, I have previously published a visual study tracking STDs, HIV, and AIDS in Texas, which in turn led to some blog posts on the subject, but one of the main reasons for focusing on Texas was that the data I needed were readily available and that was not the case with many other states.

So, in my most recent visual study on Foster Care placement I looked at data from all 50 states.  I really wasn’t even going to look at Texas, but something unexpected just bubbled to the top (or to be accurate, plummeted to the bottom) when I looked at one very important finding.

Let me explain.

The main focus of the study was to compare states that placed children in kinship foster care (i.e., placement with a member of the child’s extended family) with states that placed children in congregate care (i.e., placement in a group home or institution).  The National Coalition for Child Protection Reform maintains that kinship care is the least harmful form of foster care while congregate care is the most harmful.

For the record, Texas ranks 13th for kinship care and 24th for congregate care (where a high rank for kinship care is good but a high rank for congregate care is bad).

So, what’s the problem with Texas?

Perhaps more important than the issue of kinship vs. congregate care is whether or not states are needlessly placing children in any form of foster care.

For the last two years for which data is available, the number of children placed in foster care decreased in 35 of 50 states.  And if we exclude Texas from our analysis, the overall number of cases for the other 49 states decreased by 3.9%.

And Texas?  The overall number of cases increased by 9.6%, giving Texas the bottom spot on the list.

So, whether it’s STD cases or foster care placement, the old adage seems to apply…

… everything is bigger in Texas.

Note: Click here to interact with the data.

Sep 21, 2011
 

What’s Going to Happen when the Clinics Close?

I’ve had a lot of conversations this past week regarding my data visualization tracking STDs, HIV, and AIDS in Texas from 2006 to 2010 (see http://bit.ly/r57qeR).

One of the bigger questions that may have been overlooked in my accompanying blog post is this: what is going to happen over the next two years as state budget cuts take effect? Consider the chart below, which shows the increase in reported STD cases over the past five years.

Granted, combining Chlamydia, Gonorrhea, Syphilis, and HIV into one lump and not taking overall population growth into account may be misleading, so consider the chart below that shows the number of STD and HIV cases, incidence rates, and percentage changes from 2006 to 2010.

Even without drilling very deep it looks like Texas is having a lot of trouble just containing the spread of STDs and HIV.  To be fair, one of the reasons the number of Chlamydia cases has increased so much is that the Texas Department of State Health Services has adopted new technology and screening techniques, which in turn has led to much better detection and reporting.

That said, we pretty much see all the trend lines moving upward.

So why should we be very worried?

The numbers shown above reflect a state healthcare system that had previously budgeted over $50 million per year for family planning initiatives, including education, screening, and treatment for STDs and HIV / AIDS.  These funds have been severely cut (see http://www.texastribune.org/texas-legislature/82nd-legislative-session/day-15/).

So, what’s going to happen when clinics close and those clinics that do remain open have to cut staff and shorten office hours?

(To explore the interactive visualizations, click here.)

Sep 16, 2011
 

Overview

The Texas State Legislature’s recent and sweeping funding cuts to all family planning organizations – including the complete defunding of Planned Parenthood – have led Data Revelations to examine historical data on sexually transmitted diseases (STDs) and HIV / AIDS and suggest what the case counts and incidence rates will look like in the near future.

Note: The Centers for Disease Control and Prevention (CDC) divides the diseases we studied into two groupings: STDs (which include chlamydia, gonorrhea, and syphilis) and HIV / AIDS.  While much of the analysis that follows looks at overall case counts and incidence rates, the interactive dashboards allow exploration by both disease category and individual diseases.

Data source: Texas Department of State Health Services

The interactive dashboards may be found at the end of this blog post.

Special thanks to Joe Mako for building the county polygons and providing invaluable advice.

Key Findings

  • Roughly ten percent of Texas’ 254 counties account for 80% of all cases.
  • Within these counties, the incidence rate for STDs is up 28% from 2006.
  • Incidence rate for HIV is up 5%, but for AIDS it is down 31%.
  • The two counties that can boast the largest rate decrease for all diseases tracked in the study are Hays (-13.9%) and Travis (-7.4%).
  • The two counties with the greatest incidence rate increase for all diseases tracked in the study are Jefferson (+122%) and El Paso (+65.5%).
  • There appears to be a strong correlation between the existence of Planned Parenthood locations and decreased incidence rates (though not in all locations).
  • We believe that the recent cuts in family planning funding will lead to a large increase in cases in 2012.

Understanding the Landscape

The image below shows the incidence rate (number of cases per 100,000 persons) for all diseases broken down by county.
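If you want to reproduce this Rate measure yourself, here is a minimal sketch in Tableau, assuming the underlying data contains [Cases] and [Population] fields for each county (those field names are assumptions on my part):

   // Rate -- incidence rate, expressed as cases per 100,000 persons
   SUM([Cases]) / SUM([Population]) * 100000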

Note that if you hover over a county you can see information about that county.

If you click a county in the top view the table at the bottom of the screen will show results for just that county:

Where to start?

Let’s change the view and focus first on the counties that have the largest number of cases.  We can do this easily by coloring the map by Cases instead of Rate, as shown below.

Now we can see that a small number of counties are responsible for a large number of cases.  Which counties should we focus on first?

Vilfredo Pareto and the 80-20 Rule

The Pareto Principle, or 80-20 rule, is named after the Italian economist Vilfredo Pareto, who early in the 20th century observed that 80% of the land in Italy was owned by 20% of the population.

In Texas it’s the 80-10 Rule

The visualization below shows that when it comes to the number of cases, just over 10% of the counties are responsible for 80% of the cases.

Note that if we highlight just the first ten percent of the bars in the top visualization we will limit the number of counties displayed in the bar graph to the 28 that account for 80% of the cases.

Specifically, selecting ten percent of the bars (1) reduces the number of counties from 254 to 28 (2) and reduces the overall case count from 697,456 to 588,416 (3).
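If you’d like to build a similar Pareto view yourself, the running share of cases is a straightforward table calculation; here’s a minimal sketch, assuming a [Cases] measure and a calculation addressed by county, with counties sorted from most to fewest cases:

   // Cumulative % of Cases -- running share of the statewide total,
   // computed across counties sorted in descending order of cases
   RUNNING_SUM(SUM([Cases])) / TOTAL(SUM([Cases]))

Highlighting bars until this running share first reaches roughly 80% would reproduce the 28-county selection described above.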

So, now that we know what counties to focus on, what can we learn about them?

Cases, Rate, and Percent Rate Change

The next dashboard offers several ways to see how counties have performed from 2006 through 2010; we can look at Cases, Rates, Percent Change, individual diseases, and so on.

We found “% Rate of Change” the most enlightening view so we’ll focus on that.
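For readers who want to reproduce the “% Rate of Change” measure, here’s a minimal sketch of one way to define it as a table calculation computed along Year, assuming a [Rate] measure; it is essentially Tableau’s built-in “Percent Difference From” calculation relative to the first year (2006):

   // % Rate Change -- change in rate relative to the first year in the view
   (ZN(SUM([Rate])) - LOOKUP(ZN(SUM([Rate])), FIRST()))
   / ABS(LOOKUP(ZN(SUM([Rate])), FIRST()))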

Let’s see what happens if we just focus on STDs and exclude HIV / AIDS from the mix.

So, what’s up with Chlamydia and Syphilis?

If you look at the individual counties using the visualization in the top portion of the dashboard you will see that the percent change in Chlamydia rates from 2006 is up in all 28 counties without exception, while Syphilis is down in many counties (and way up in others).

When asked about these numbers, an epidemiologist at the Texas Department of State Health Services stated that the increase in Chlamydia rates should be attributed to better testing technologies, expansion of electronic lab reporting, and increased screening.  Chlamydia is often asymptomatic, and five years ago many cases went undetected or unreported.  As both detection and attention to reporting have improved, one should see a larger number of cases.  That said, some counties are much worse in this respect than others.

As for Syphilis and Jefferson County’s “off the chart” numbers for 2009 … we’ll look at this in a moment.

Looking at the Trend for All STDs

Another way to view the data is to combine a disease group into one line by selecting Show overall from the drop down list box.

In the screen below we track overall % rate change for STDs from 2006 through 2010. It’s very easy to see where the outliers are.

Now that we have some tools to determine where counties are succeeding and failing, let’s see if we can determine why this is happening.

Location, Location, and…

The dashboard below compares the percent rate change among the 28 counties.  The size of each circle indicates the number of cases and the color indicates whether the incidence rate has increased (orange) or decreased (blue) since 2006.

We’re particularly interested in the dark orange and dark blue dots (the outliers), so let’s see what happens when we click the dark orange dot that borders Louisiana.

So, what happened in Jefferson between 2006 and 2007 to cause the initial spike and the eventual peak in 2009?  We believe it has to do with clinic locations as well as a once-in-a-generation environmental event, which we will explore in a moment.  (We still don’t know why, but at least we know when the problem started.)

If we clear the selection and look at the larger, albeit not as dark, dot all the way to the west we see the following:

What happened in El Paso between 2009 and 2010 that caused such a large increase?  Again, we’ll explore this in a moment but we believe clinic location and administration has a lot to do with this.

Enough of the orange dots; let’s look at the other end of the spectrum and explore Travis County where we see an impressive decrease for all STDs.

So, we have a better sense of when problems (or improvements) occur, but we still don’t know why they occur.

One additional set of data that might help us figure out why some counties are succeeding while others are failing is the location of family planning clinics, i.e., clinics that provide screening, counseling, education, and treatment for STDs and HIV/AIDS.

All clinic locations

The diamond shapes indicate the locations of family planning centers as of August 2011.  The size of each diamond indicates the number of centers within a county.  You can hover over a diamond to see more information about a county and the number of clinics.

At this point there doesn’t appear to be an obvious correlation between location and number of current clinics and the percentage rate change.  Let’s see what happens if we only look at clinics run by Planned Parenthood.

Planned Parenthood locations

Four questions come to mind upon seeing this visualization:

  • Why are there no Planned Parenthood locations in El Paso (1)?  Is this related to the spike in cases from 2009 to 2010 that we saw earlier?
  • Why are there no Planned Parenthood locations in Jefferson (2)? And as we saw earlier, why the large spike in cases, particularly Syphilis, between 2006 and 2007?
  • Why is Potter County (3) succeeding where almost all the other counties are struggling?
  • Why is it that some clinics appear to be succeeding while others are failing?  Are there other issues besides location?

El Paso

Dr. Fran Hagerty, CEO of the Women’s Health and Family Planning Association of Texas, states that there was in fact a Planned Parenthood office in El Paso, but it was forced to close at the end of 2008 and the new entity that took over for it went through a fair amount of turmoil.  The epidemiology office at the Texas Department of State Health Services agrees that things were indeed in flux at that time.  We see that as of 2010 there are still problems.

Jefferson

While I have not yet found out why there is no Planned Parenthood office in Jefferson (or if there ever was one), there’s one thing that might explain the spike in cases (particularly syphilis):

Katrina

At the end of 2005 and through much of 2006 a wave of what can best be described as refugees from Hurricane Katrina settled in Jefferson County.  Many were poor and without jobs or adequate housing.  Incidence rates peaked in 2009, but the significant decrease in case count in 2010 suggests that the Texas Department of State Health Services has now controlled the epidemic.

Potter

There’s no Planned Parenthood location here, but Potter County can boast only a very small increase in STDs from 2006:

What is this county doing differently from the others?

A call to Dr. Ron Barwick, CEO of Haven Health Clinics in Amarillo, indicates that the clinic he runs had been affiliated with Planned Parenthood but broke off in 2006, as the then-named Texas Panhandle Family Planning & Health Center was no longer going to offer abortion services.

We asked Barwick why he believed his clinic was doing so much better than most others, and he stated that the main reason was that, in addition to an emphasis on education, Haven had opened a male health center, and this was having a significant impact on reducing STDs.  Specifically, men who would be uncomfortable sitting in a waiting room with women were comfortable going to the male-only clinic.

Clearly, Dr. Barwick and his colleagues are doing something right.  Unfortunately, the recent budget cuts will make it difficult for them to continue, let alone share their best practices with others.

Is it All About Location?

We started to perform an analysis looking at the number of cases vs. the number of clinic locations and realized that we would be missing a critical data point: how well funded and well staffed is each clinic?

We would want to explore the relationship between clinic headcount and funding before being able to state whether a particular location is succeeding or failing.  For example, Harris County has the greatest number of cases and certainly has many clinic locations.  Are these clinics staffed by one person or dozens?

The same question would apply to Hidalgo, where there appear to be a lot of clinics given the number of cases.  How many people staff these clinics?  What is the funding for each one?

Clearly, just mapping clinic location to the number of cases is not enough.

What Now?

In addition to exploring the relationship between cases, funding, and staffing, one obvious next step would be to look at just what it is that high-performing clinics do differently from the lower performers, and then have the lower performers adopt the practices of the high performers.

Unfortunately, the unprecedented state budget cuts will prevent this from happening as many clinics – including those that are getting good results – will either have to close or severely curtail their operations.

To get an idea of the impact of the budget cuts we can look back to 2006, when the Texas legislature enacted far less sweeping cuts.  Indeed, had we used 2005 as our baseline year rather than 2006 (the year those cuts were enacted), we would see that the rate of STDs is up 39% from the baseline year rather than 28%.  But those budget cuts pale in comparison to Texas’ complete defunding of Planned Parenthood and severe defunding of other family planning entities.

We believe that a view of these dashboards in 2012 will show more, and larger, orange dots, indicating a much larger number of cases, which will lead to an overtaxed health system, lost productivity, and increased human suffering.

Jul 25, 2011
 

Overview

Last year, UN Global Pulse launched a large-scale mobile phone-based survey that asked people from India, Iran, Mexico, Uganda, and Ukraine how they were dealing with the effects of the global economic crisis.

The survey (conducted from May through August 2010) asked two multiple-choice and three open-ended questions focusing on economic perceptions.

Note: The fully working dashboards may be found at the end of this blog post.

Key Findings

Responses from Uganda – a country that ranks in the bottom 15th percentile of the UN’s Human Development Index – were consistently more optimistic than responses from other countries.

What could account for this? Is it that Ugandans are, as a group, more hopeful and optimistic than people in the other countries surveyed?

Or could it be that survey responses were somehow skewed?

Let’s explore the data to find out.

Voices of Vulnerable Populations during Times of Crisis

Clicking the second tab displays the following view.

Economic Change Index

So, why in the first graphic does Uganda warrant a positive blue bar and Mexico a negative orange bar?  By moving your mouse pointer over a bar you can see just what it is that drives the Economic Change Index.

Here are the results for Uganda…

… and here are the results for Mexico:

The index itself (1.2 for Uganda and -1.6 for Mexico) is computed by applying Likert-scale values to each of the possible question responses.  We’ll discuss the advantages of using this approach in a moment.

Fixed Responses vs. Using One’s Own Words

The first two questions in the survey gave respondents four fixed choices.  The remaining three questions allowed people to respond in their own words.

You can explore these responses yourself by picking a question and a country from the drop down list boxes.

So, does the sentiment shown in the fixed-response questions apply to the open-text responses as well?

Promising vs. Uncertain

Here is how people from Uganda responded to the question “In one word, how do you feel about your future?”…

… and here is a visualization of the responses from Mexico.

This, combined with responses to other questions, left me scratching my head. What are we not seeing that would lead to responses from Uganda — a country that is arguably in worse condition than the others — being so upbeat?

If you can’t wait for the answer, click here.

A Word about Word Clouds

I’ve analyzed a lot of survey data and I hate analyzing survey results where people get to provide free-form text responses because aggregating responses based on a common sentiment can be very difficult.

In many cases Word Cloud generators can convey the overall sentiment from multiple text responses.  They are also interesting to look at and I do believe the ones shown above are a good reflection of respondent sentiment.

A problem occurs, though, when respondents use different terms that describe the same or similar sentiment.  Consider the Word Cloud shown below.

One might think that most respondents were happy, but look what happens if we “linguistically normalize” the terms that are synonyms of “sad”:

It turns out that more people are in fact sad.

Note: There are products that are capable of parsing full sentences and are able to “disambiguate” and then normalize terms under umbrella concepts. The text responses to this particular survey, however, do not warrant this type of heavy artillery.
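In Tableau terms, this kind of normalization can be as simple as a small grouping calculation applied before the words are counted; here’s a minimal sketch, where the [Term] field and the particular synonyms are hypothetical:

   // Normalized Term -- collapse synonyms of "sad" into a single term
   IF [Term] = "unhappy" OR [Term] = "miserable" OR [Term] = "gloomy" THEN "sad"
   ELSE [Term]
   END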

How We Calculate the Indices

The next tab in the workbook shows some alternative ways of visualizing the fixed-response survey results.

For these questions respondents were given four choices:

Easier / Better

Same

Worse / More Difficult

Much Worse / Very Difficult

Notice that we display the calculated index atop the Likert-scale stacked bar charts.  There are three advantages to calculating an index for Likert-scale responses:

  1. It makes it easy to weigh sentiment across many responses.
  2. It makes it possible to track sentiment changes over time.
  3. It makes it possible to compare results against various objective economic indices (e.g., GDP, UN HDI, etc.).

Note: I have no problem using even-numbered Likert scales, but I do think in this case sentiments will be skewed towards the low end as there are two levels of pessimism (e.g., “worse” and “much worse”) and only one of optimism (e.g., “better”).

I attempted to combat this by applying the following values to the responses:

Easier / Better = 3

Same = 0

Worse / More Difficult = -2

Much Worse / Very Difficult = -4

While I think these values make sense, users of this dashboard are welcome to use the sliders and apply different values to each of the answers.  The indices will be recalculated automatically.
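As a sketch of how that recalculation might work, assume a [Response] dimension and slider-driven parameters ([Better Value], [Worse Value], and [Much Worse Value] are hypothetical names); a per-response score could then be defined like this, with the displayed index simply being the average of that score:

   // Response Score -- maps each answer to its slider-driven weight
   CASE [Response]
      WHEN "Easier / Better"             THEN [Better Value]      // default 3
      WHEN "Same"                        THEN 0
      WHEN "Worse / More Difficult"      THEN [Worse Value]       // default -2
      WHEN "Much Worse / Very Difficult" THEN [Much Worse Value]  // default -4
   END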

A Composite Index

In an earlier version of this dashboard I created a “composite index” that combined results from the two fixed-response questions:

I think this is a valuable metric and one that I would include should UN Global Pulse make this study longitudinal (see below).

Mobile Pulse Survey Results vs. Objective Economic and Human Development Indicators

In the next tab we see survey responses (first column) vs. the United Nations Human Development Index Ranking (second column).

What could account for Ugandan survey respondents being the most optimistic despite the fact that Uganda ranks 143rd out of 169 countries in the UN’s HDI ranking?

I believe that the survey’s SMS Text-based approach is skewing the results.

Consider the third column, where we see the number of mobile subscribers within a country as a percentage of that country’s population.  In Uganda, at most 29% of the population has a mobile phone, suggesting that those completing the survey may be better off financially than others within their country. Survey responses may not, therefore, be a reflection of the country as a whole. (See The CIA World Factbook for mobile phone subscription information.)

This would not be the first time premature reliance on phone polls has derailed a survey (or in this case, just part of a survey).  See A Couple of Interesting Examples of Bias and Statistical Sampling.

Make the Survey Longitudinal

Despite the shortcomings, I think there is a lot of value in conducting these types of agile, real-time surveys.

One ongoing challenge will be comparing subjective data among different countries, as there are so many cultural differences and proclivities that are difficult to normalize.

One way to do this would be to conduct a longitudinal study and see how sentiment changes over time.  That is, instead of comparing Uganda with Mexico or India with Ukraine for a given year, track the changes over time, using an index.  This would allow you to see the percent change in sentiment between time periods without having to worry about normalizing cultural differences.

I hope that UN Global Pulse will update this survey on a regular basis as there’s much we would be able to learn from such a study.
