swexler

Jan 16, 2014
 

Overview

One of the new features in Tableau 8.1 that Tableau Software is trumpeting quite a bit is one-click Box and Whisker Plot generation.  While I appreciate the new functionality, this chart type doesn’t “sing” to me as much as jittering does.  Indeed, this “jittering” capability was the BIG discovery for me in 2013.

Let’s see how a box and whisker plot compares with jittering using a simple example.

Note: Interactive dashboards that illustrate jittering techniques may be found at the end of this blog post.  Feel free to download and explore.

Salary and Age Bins – Default

Consider the following pre-Tableau 8.1 salary chart that shows how salaries are distributed across age bins.

Figure 1 — Default Salary Distribution by Age Bins

 

While we can see that the top salaries are enjoyed by people in their 50s, nothing in the chart gives us concrete percentiles or shows us where the outliers are.  We also can’t tell that there are in fact thousands of dots in the visualization, as so many marks are sitting on top of each other.

Salary and Age Bins – Box and Whisker Plot

To see percentiles and outliers we can use Tableau’s Show Me feature and click the Box-and-Whisker Plot button.

Figure 2 — Salary Distribution by Age Bins with Box and Whisker Overlay

 

This is definitely an improvement, but I really don’t “feel” the data as I can’t see how the dots are distributed; they are all stacked on top of each other.

Salary and Age Bins – Jitters

Here’s the original chart, but with the marks “jittered” using a modified version of Tableau’s built-in INDEX() function.

Figure 3 — Salary Distribution by Age Bins with the marks “jittered”

This gives me a much better feel for the data, as I can see how the thousands of marks cluster.  Of course, I can still superimpose the box plot, as shown here.

Figure 4 — Salary Distribution by Age Bins with the marks “jittered” and box plot overlay

Getting Jitters Using INDEX()

To “jitter” the marks I create a calculated field called “Index” that uses Tableau’s INDEX() function.  I put this on the Columns shelf and compute using ID, as shown here.

Figure 5 – First attempt using Tableau’s INDEX() function

It turns out that for this particular example INDEX() by itself works because there is an equal distribution of IDs across each of the age bins.  Consider the example below where we show a distribution of Superstore Sales across different customer segments.

Figure 6 – Shortcomings of using INDEX() by itself.

Notice that the strip of dots within “Corporate” is much wider than the other segments because there are more orders within “Corporate” than in the other segments.
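To see why the Corporate strip gets wider, here is a minimal Python sketch, assuming INDEX() restarts in each segment and is computed using the order ID. The tuples and the `index_jitter` / `strip_widths` helpers are hypothetical illustrations, not Tableau functions: each mark’s x position is simply its running count within its segment, so a segment with more orders uses more x positions.

```python
from collections import defaultdict


def index_jitter(rows):
    """Emulate INDEX() restarting in each segment and addressed by order ID.

    rows: iterable of (segment, order_id, value) tuples (hypothetical data).
    Returns (segment, x, value) tuples, where x is the 1-based position of
    the order within its segment.
    """
    position = defaultdict(int)
    jittered = []
    # Sorting by (segment, order_id) mimics "compute using ID" within a segment.
    for segment, order_id, value in sorted(rows, key=lambda r: (r[0], r[1])):
        position[segment] += 1
        jittered.append((segment, position[segment], value))
    return jittered


def strip_widths(jittered):
    """The width of each segment's strip is the largest x position it uses."""
    widths = defaultdict(int)
    for segment, x, _value in jittered:
        widths[segment] = max(widths[segment], x)
    return dict(widths)


# Hypothetical data: six Corporate orders, three Consumer orders.
orders = [("Corporate", i, 100.0) for i in range(1, 7)]
orders += [("Consumer", i, 50.0) for i in range(7, 10)]
print(strip_widths(index_jitter(orders)))
```

With this sample data the Corporate strip is six positions wide and the Consumer strip only three, which is the same lopsidedness visible in the Superstore chart.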

The easiest way to fix this is to edit the axis and select “Independent ranges for each row or column” from the Edit Axis dialog box.  While this works fine, we’ll look at a different technique that will allow us to control the degree of jittering.

Using Modulus to Control Jittering

When I first blogged about this technique last year, Alex Kerin of Data Driven suggested a simple and elegant solution to different-sized partitions using Tableau’s modulo operator (%).  For those of you who have forgotten your high school mathematics, modular arithmetic deals with the remainder you get when you divide one number by another.  Here’s an example:

14 ≡ 30 (mod 8)

Translation: 14 is congruent to 30 modulo 8 because you get the same remainder when you divide 14 by 8 as when you divide 30 by 8 (both remainders are equal to 6).
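The remainder arithmetic is easy to check in a couple of lines of Python, whose `%` operator returns the remainder for non-negative integers in the same way as the `%` used in the Tableau calculation that follows (`congruent` is a hypothetical helper name):

```python
def congruent(a, b, n):
    """True when a and b leave the same remainder after division by n."""
    return a % n == b % n


print(14 % 8, 30 % 8)        # both remainders are 6
print(congruent(14, 30, 8))  # True: 14 ≡ 30 (mod 8)
```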

So, how do we use this capability in our visualization?  We want the same number of dots in each segment, so instead of INDEX() we will use INDEX()%25.

This will create 25 “rows” of dots within each segment.

Specifically, when

INDEX()=1, INDEX()%25 will be mapped to 1
INDEX()=2, INDEX()%25 will be mapped to 2
…
INDEX()=25, INDEX()%25 will be mapped to 0
INDEX()=26, INDEX()%25 will be mapped to 1
INDEX()=27, INDEX()%25 will be mapped to 2
etc.

Note that 25 is not a magic number.  For this example anything above 15 will do the trick (and in the demo workbook I have a parameter slider that controls the MOD setting).
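A minimal Python sketch of the mapping above, assuming INDEX() starts at 1 in each segment as in the list; `mod_jitter` and `strip_width` are hypothetical helper names, and the default of 25 mirrors the example rather than any required value:

```python
def mod_jitter(index, modulus=25):
    """Jitter column for a 1-based INDEX() value, as INDEX() % modulus computes it.

    Positions cycle 1, 2, ..., modulus - 1, 0, 1, 2, ...
    """
    return index % modulus


def strip_width(partition_size, modulus=25):
    """Number of distinct x positions a partition of this size occupies."""
    return len({mod_jitter(i, modulus) for i in range(1, partition_size + 1)})


print([mod_jitter(i) for i in (1, 2, 25, 26, 27)])          # [1, 2, 0, 1, 2]
print(strip_width(10), strip_width(25), strip_width(5000))  # 10 25 25
```

Any segment with at least as many marks as the modulus ends up exactly that many positions wide, which is why the strips line up; a segment with fewer marks than the modulus would still be narrower, so the modulus needs to stay below the size of the smallest segment.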

Conclusion

Jittering is a very simple technique and it helps overcome the problem of marks being stacked atop each other when plotting a distribution within a dimension.  It only takes up a little more screen real estate and it packs a terrific visual wallop.

 

Oct 31, 2013
 

If I see a visualization that is poorly designed or worse, misleading, I’m going to say something about it. I hope you will do the same.

In March of 2013 Stephen Few published a scathing review of Tableau 8. Few’s thesis was that Tableau had caved to marketing pressure and its new product would encourage users to craft “analytically impoverished” visualizations.

At the time I thought that Few’s screed was unfair (see my blog post), but a recent post from Emily Kund about a company’s internal “Iron Viz” competition made me wonder if perhaps Few was right.

Before I get into what deeply troubles me about the aftermath from the contest I do want to applaud Kund and her colleagues for fostering interest in Tableau and data visualization best practices.  Clearly, I have a fondness for these types of contests and like the excitement they generate about visualization.  I also believe strongly in making interactive visualizations that are fun and inviting.

My problem is that while everybody is rightfully patting Kund on the back for having the contest, nobody in the Tableau data visualization community (and it is an amazing community) has pointed out what is wrong with the dashboard — and there is a lot that is wrong with the dashboard.

Too Much Sugar

Let’s have a look at the winning entry from the Halloween data visualization competition.

Winning entry

This winning viz epitomizes the type of creation Stephen Few feared people would construct (as described in his now-infamous review): the dashboard sacrifices clarity and accuracy for whimsy. Why have the stacked bubble chart, and why have the pumpkins representing annual spending? Humans are absolutely horrible at comparing areas of circles, so why use them here? I also don’t buy the size of the pumpkins at all: the $4.7 billion pumpkin for 2009 is considerably smaller than the $5.0 billion pumpkin for 2006, far more than the roughly 6% difference between the two values warrants.  It looks to me like the author exaggerated the size of the pumpkins.
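A quick way to sanity-check a sized shape is to compute how large it should be. The sketch below assumes the pumpkins are meant to encode spending by area (the usual convention for sized circles); `fair_diameter_ratio` is a hypothetical helper, and the two values come from the dashboard.

```python
import math


def fair_diameter_ratio(value_a, value_b):
    """Diameter ratio two circles should have if their AREAS encode the values.

    Area grows with the square of the diameter, so d_a / d_b = sqrt(value_a / value_b).
    """
    return math.sqrt(value_a / value_b)


ratio = fair_diameter_ratio(4.7, 5.0)  # 2009 vs. 2006 spending, in $ billions
print(f"{ratio:.3f}")                  # 0.970
```

Under that assumption the 2009 pumpkin should be only about 3% narrower than the 2006 one; even if diameter rather than area encoded the value, it should be only 6% narrower.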

More importantly, by fighting Tableau’s own default settings the author has hidden the biggest story the data is trying to tell us.

Why Didn’t You Let Tableau Make a Line Chart?

Let’s focus on the pumpkin chart along the left side of the dashboard:

Unreliably-sized pumpkin chart

Here we see annual sales by year.  Using the same data, if we simply select the two fields and click the Show Me button, Tableau will automatically generate the following visualization.

The default chart Tableau creates

Now, tell me you didn’t just think “whoa… what happened in 2009?”

THAT’S the big story.

Have Your Candy and Eat It, Too…

I “get” that nobody is going to get very excited about the viz Tableau creates by default.  Without something to capture the viewer’s interest, he/she may not bother with the viz (see Ben Jones’ excellent posts on this subject).

So, if we must add some “viz candy” why not start with the line chart and dress it up, like the one below?

A “fun” chart. 10 seconds to build the default line chart and five minutes to apply some graphic design.

Are Stacked Bubbles Inherently Bad?

I don’t think the stacked bubbles work in the dashboard.  I have to work too hard to see that “Candy” at $22.37 is slightly larger than “Decorations” at $20.99.  With a bar chart I could see the differences immediately.

That said, there are some good examples where bubbles elicit an emotional response and just fit with the design flow (see this example from Kelly Martin).

I also like having this chart type in my quiver.  I welcome any chart type that helps me better understand the data, even if I never use it on a published dashboard.

Getting People to Use The Tools Correctly

I still don’t agree with Few — I don’t think Tableau should remove features for fear that people will use them incorrectly.

But I am very concerned that visualizations that are poorly rendered are being presented as examples to emulate.  As a community we need to do our best to prevent this from happening, so if you see something that is poorly designed — or worse, misleading — point out the problem and show the person a better way to get the desired result.

I have tried to do that here.


Sep 30, 2013
 

I recently attended the Tableau Customer Conference.  It was a great conference and if you are into Tableau you should definitely go to next year’s event (see http://tcc14.tableauconference.com/seattle/).

In any case, during the myriad networking opportunities I was very pleased by the number of people who told me how much they had gotten out of a blog post I had written over two years ago.  I decided I should revisit that post and see what, if anything, I would write differently.

So, before you read this post, make sure you read that post.

Did you read it?  The comments, too?

I think it holds up quite well but there are some things I would change.

Let’s go through the major points one by one.

Size Matters

I think I’m ready to retract this recommendation for two reasons:

  • The world has gotten a lot smarter about embedding Tableau Public visualizations.  Either folks make sure to get the size right or they provide a link to where people can view the full-sized visualization.  Indeed, it’s been a while since I’ve seen anything that was really mangled.
  • I would hate to handcuff people from doing work that warrants a larger canvas.  For example, have you looked at any of the things Kelly Martin is doing over at her VizCandy blog?  Go ahead, click here (but do come back when you’re finished).

Some great stuff is going on there. In fact, my reaction when I first saw what she’s producing was “damn, I’m really going to need to ‘up my game’,” but I’ll blog about this at a later date.  In any case, while the constraints of a 650-pixel-wide canvas can be very useful, as they force you to pare down the fluff, by all means use a wider canvas if your viz warrants it.

Never Use Red and Green as Contrasting Colors

I got some grief about this one, but my rationale is still spot on (more about this in a moment).  That said, I will modify the rule so that it reads as follows:

Never use red and green as contrasting colors without an affordance

And just what do I mean by an affordance?

Consider your typical traffic light.  How do people with red-green color blindness deal with traffic lights?  The answer is that they look at the position of the light: red is on top, yellow in the middle, and green on the bottom. If the light is mounted sideways, red is on the left, yellow in the middle, and green on the right.  If there were only a single light that changed color there would be many more accidents, and not just because of confusion among the color-blind; people who are not color-blind find the positioning very helpful as well.

So, if you like red and green or feel you must use them, the following two visualizations are acceptable as they both contain an affordance.  Specifically, the first contains arrows that point up or down and the second has bars that ascend or descend.


The following view is not acceptable as there is nothing besides the red and green to telegraph high values vs. low values.

As for having to worry about this at all, I attended Maureen Stone’s session on Best Practices for Using Color: How and Why and came away with two key findings.

1) This woman is brilliant; and,

2) Yes, you need to worry about this. So do yourself and your interactors / viewers a favor and do as I suggest.

Usability

There’s nothing I would remove from this section, but there is so much more that I should add.

I’m not going to do that right now (sorry).

But…

… as I re-read my post there’s one idea I really want to underscore and that is to ask at least one other person to “fly” your dashboard before you publish it.  That person will both break the thing immediately and find the major stumbling points.  Really, it’s amazing how quickly a set of fresh eyes can find all the flaws.

Hover Help

I still completely endorse this practice but will provide one caveat about creating the ersatz calculated field called “Help”: adding this custom field can, in some cases, really slow down performance if you have a large database and are executing a big query against that database.

The very simple workaround is to create a very sparse additional data source (you can do it in Excel) and create the custom calculation in that additional data source.  Indeed, I’ve gotten in the habit of placing my help and navigation control scaffolding in a secondary (and very small) data source.

Navigation – Dealing with Multiple Tabs

My major takeaway in reviewing this is that I can’t wait for Tableau 8.2 to be available as it will have a new feature called Story Points that will be WAY better than the click forward / click back navigation buttons I recommend when publishing multi-tabbed workbooks.  I only saw a very brief demo of Story Points but what I saw was absolutely beautiful.

In the meantime I still recommend you hand-chisel these forward and back buttons and, should you run into performance problems, put any calculated fields you use in a secondary data source as described in the “Hover Help” section above.

Engagement

I agree with everything I wrote but think my examples are dated, as the state of the art has improved enormously in the past two years.  In particular, Tableau’s support for floating elements has made it easy for the design-savvy among us to do some great things.

By the way, now would be a good time to revisit Kelly Martin’s VizCandy blog.  And when you’re done check out this post at Anya A’Hearn’s DataBlick web site.

They draw you in, don’t they?

As fun and inviting as all this stuff is, I have found the best way to get people engaged is to somehow make the visualization about the person who is viewing it.  One of my favorite examples is the one below, a much-scaled-down version of an interactive salary dashboard I built a while back.  The reason this works is that people want to know (and know immediately) how they compare with their peers.  So, if you really want people to use your stuff, your “stuff” should be able to answer questions like these:

  • How does my department compare with other departments?
  • How does our company compare with others?
  • How does my performance compare with my peers?

If you do this people will interact with your dashboards, I guarantee it.

Aug 20, 2013
 

In this installment we’ll look at Utah State University’s publication of student engagement results.  Utah State is one of many colleges and universities that have participated in the National Survey of Student Engagement, or NSSE (see http://nsse.iub.edu/ and http://nsse.iub.edu/html/about.cfm).

Special thanks to Allan Walker for making the underlying data available to me.

Note: I’ve published four sets of questions from the survey as interactive dashboards that you can find at the end of this blog post.

The Good

Utah State University should be lauded for making its survey results available in an interactive format.  This is a great way to foster engagement from students, faculty, administration, and other interested parties.

The Bad and The Ugly

It’s almost impossible to glean anything useful from the published results.

The “Before” Picture

Here’s a screenshot of the analysis of the first set of questions in the survey (see http://usu.edu/aaa/nsse_paged.cfm?pg=1)

Five of the ten questions in the group — this requires lots of scrolling and makes it impossible to compare results across questions

Note that there are a total of ten Likert scale questions in this set and they are presented in the same order that they appeared in the survey.

Here are the things I would like to know, but cannot at all glean from the visualizations:

  • Which activities were done most often and which were done least often?
  • Are there any significant differences when you compare results by gender?
  • Are there any significant differences when you compare results by ethnicity?

The “After” Picture

I’ve written extensively on the best ways to visualize Likert Scale data (see http://www.datarevelations.com/likert-scales-the-final-word.html and http://www.datarevelations.com/mostly-monthly-makeover-masies-mobile-pulse-survey.html).

Here’s what happens if we apply this approach to the Utah State University NSSE data.

Divergent stacked bars showing all responses

And if we apply a parameter setting to only show extremes (e.g., “very often/often” vs. “sometimes/never”) the results are even easier to sort and grok.

Divergent stacked bars combining responses

This approach also allows us to break the data down by gender and see if there are any questions where there are major differences (and there are major differences).

Comparing results by gender

We can likewise see major differences between Caucasian and non-Caucasian respondents when we look at the results from Question 14.

Comparing results by ethnicity

Seven-Point Likert Scale Examples

Here’s another set of results for questions where the students could provide seven possible responses.

Impossible-to-compare seven-point Likert scale questions

I can’t make any sense of the data when it’s presented as a bunch of bars, but when I use divergent stacked bars it becomes very easy to compare and sort the results.

Combined values for seven-point Likert scale questions
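The divergent stacked bars used throughout this makeover come down to simple arithmetic: negative responses are stacked leftward from zero and positive ones rightward. Here is a minimal Python sketch, assuming the response levels are ordered from most negative to most positive and that the negative levels come first; the percentages and the `divergent_segments` helper are hypothetical.

```python
def divergent_segments(shares, negative_levels):
    """Left/right extents for one divergent stacked bar.

    shares: list of (level, percent) ordered from most negative to most positive.
    negative_levels: levels drawn to the left of zero.
    The bar starts at minus the total negative share, so the boundary between
    negative and positive responses always falls exactly on zero.
    """
    negative_total = sum(p for level, p in shares if level in negative_levels)
    extents = {}
    x = -negative_total
    for level, pct in shares:
        extents[level] = (x, x + pct)
        x += pct
    return extents


# Hypothetical percentages for one four-point Likert question.
question = [("Never", 10), ("Sometimes", 30), ("Often", 40), ("Very often", 20)]
bars = divergent_segments(question, {"Never", "Sometimes"})
print(bars)
# Combining levels ("very often/often" vs. "sometimes/never") is just a sum.
print(bars["Very often"][1], -bars["Never"][0])  # 60 40
```

Because every bar is anchored at zero, sorting questions by the right-hand (positive) total is what makes the results easy to rank, for four-point and seven-point scales alike.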

Recommendations to Utah State University

  1. Continue to make these results public, but make the results usable.
  2. Reshape the data to make it much easier to manage in Tableau (see http://www.datarevelations.com/using-tableau-to-visualize-survey-data-part-1.html).
  3. Use divergent stacked bar charts to display Likert scale data.

Click HERE to see interactive dashboard.

Jul 11, 2013
 

I love Tableau and I articulate that love through consulting, training, evangelizing, and blogging.

But there are some things about the product that just drive me nuts.

Here’s a flaw that’s been in the software since at least version 2.0 that I know has tripped up EVERYONE that uses Tableau.

And just what is this problem?

This incredibly intelligent product with so much built-in smarts becomes positively brain-dead when you click File | Save As.  Specifically, when you attempt to save a workbook under a new name, Tableau saves the file into whatever folder you most recently opened a file from, not the folder where the previous version of the file was saved.  Let me illustrate.

Let’s say I have two customers, Coke and Pepsi, and I’m currently working with a workbook called SodaWorkbook_Coke.twbx that is saved in the Coke folder.

A file saved in the Coke folder

Now let’s say I want to look at something that is in another workbook and that workbook is in a different folder, so I open that workbook from the other folder, in this case the Pepsi folder.

Open a file from a different folder, in this case the Pepsi folder

Now let’s go back to the first file and perform some low-tech version control; specifically, saving the file under the name SodaWorkbook_Coke_B.twbx.

Doh! The file gets saved into the wrong folder!

Unless I override Tableau, Tableau will save the Coke file to the Pepsi folder.

Uggh…

As you may have gathered, I’ve been using Tableau for a long time and have gotten used to this “ill-behaved-Windows-application” anomaly.  There have been occasions, however, when I’ve forgotten about this “gotcha,” and I now have random files littered across myriad folders because, when it comes to saving files, I had expected Tableau to behave like EVERY OTHER WINDOWS APPLICATION ON THE PLANET!

(Oops, caps lock problem… my bad.)

This issue seems so fundamental.  Why the delay in fixing it?

Want to see this fixed?  Chime in at http://community.tableausoftware.com/ideas/2036


Posted on July 11, 2013 in 1) General Discussions, Blog
Jun 17, 2013
 

Overview

Today I’m going to write about four viz types that have solved a lot of problems for me but that I don’t see out in the wild very often.  I really like each of these approaches as they are analytically sharp and pack an emotional wallop.

Note: a collection of dashboards illustrating each of these types may be found at the end of this post.

 

1) Highlight Table

This one always resonates well with students in my Tableau classes.  Consider this Superstore sales cross-tab that shows profit broken down by sub-category and region.  Can you easily discern which combinations of sub-category and region are performing well and which are performing poorly?

A visually impoverished spreadsheet

Now let’s see what happens if we take the same data but present it in a highlight table.

With a highlight table the high and low performers really stand out

With this you can tell “from the back of the room” that Office Machines in the South are doing great and that Tables in the East are doing poorly.  Indeed, Tables are doing poorly across all four regions.

In addition to its clarity, I like this approach because it’s so easy to create in Tableau and it serves as a type of “gateway” visualization for people who are shackled to spreadsheets.  That is, people can still have their raw numbers, but they can really see what’s working and what isn’t.

 

Why Not Use a Heat Map?

I’m the first to recognize that the highlight table just shows one measure (in this case, profit) across multiple dimensions.  A heat map would allow us use size for sales magnitude and color for profitability.

This heat map shows magnitude of sales and degree of profit

Which chart type you use depends on what you want to relate but I think the heat map dilutes the message that is so viscerally clear with the highlight table.

 

2) Jitterplot (aka, “Strip Plot”)

I only recently discovered and blogged about the joys of this viz type, and I’m still over the moon with just how many problems this approach solves for me (and, I suspect, for others).

Consider this profit distribution plot where we place Order ID on the level of detail.

With this distribution plot it’s difficult to see the concentration of overlapping marks

One can surmise from the standard deviation bars that most of the marks cluster in the center, but this visualization really doesn’t “sing”.  Contrast this with the visualization below where we leverage the x-axis and “jitter” the marks left and right of center.

By “jittering” the marks we can really see where the bulk of profits are concentrated

 

3) Bar-in-Bar

I first came across this viz type six years ago when I was using Tableau 2.0.  I was trying to compare two measures against each other and was miffed that I could not control the spacing between the major category groupings; i.e., the spacing between the 2012 and 2011 bars within one category was the same as the spacing between bars across categories, as shown below.

Tableau’s lack of clustering control makes it hard to compare bar pairs

This makes for a tough read.  I complained to Tableau’s “Yoda”, Marc Reuter, and he suggested I create a bar-in-bar chart, like the one shown here.

The Bar-in-Bar chart makes it easy to compare value pairs

The chart is clear, easy-to-read, and it takes up a little less screen real estate.    We do, however, need to display a legend.

 

4) Divergent (or Staggered) Stacked Bar Chart

I’ve written about this viz type quite a bit (see this post and that post) and while this was initially conceived as something that would elegantly handle Likert scale survey questions it is proving to be a great way to see how sentiment / performance skew positive or negative across multiple dimensions.

Going back to our Superstore Sales data, suppose we wanted to see the percentage of orders that took a certain number of days to ship (0 days, 1 day, 2 days … up to 5 or more days).  A traditional stacked bar chart is easy to create, but very difficult to grok.

A collection of 100-percent stacked bar charts is very difficult to read

Compare the chart above with a divergent stacked bar chart where 0 to 2 days is considered good and 3 or more days is considered poor, and where the goal is to have 80% of orders shipped in two or fewer days.

A divergent stacked-bar chart showing all values

I’ve also included an option to just show good / bad in a more binary form.

A divergent stacked bar chart showing just good vs. bad values

I’ve long suspected that survey folks who ask to just see the percentage of respondents that chose the “top two boxes” request this because nobody had presented them with a divergent stacked bar approach.  With this type of chart you can see both the big picture and the details.
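The good-versus-bad split behind the shipping chart reduces to one ratio compared against the 80% goal. A minimal Python sketch, assuming days-to-ship has already been counted per bucket (5 standing for “5 or more”); the counts and the `share_within` helper are hypothetical:

```python
def share_within(days_counts, max_good_days=2):
    """Fraction of orders that shipped in max_good_days or fewer.

    days_counts maps days-to-ship to an order count; 5 stands for "5 or more".
    """
    total = sum(days_counts.values())
    good = sum(n for days, n in days_counts.items() if days <= max_good_days)
    return good / total


counts = {0: 50, 1: 200, 2: 300, 3: 250, 4: 150, 5: 50}  # hypothetical counts
share = share_within(counts)
print(f"{share:.0%} shipped in two days or fewer; goal met: {share >= 0.80}")
```

The same share is what the binary good/bad view plots to the right of zero, with its complement to the left.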

Jun 12, 2013
 

Note: This post is dated but contains useful information. I’ve discovered some better ways to handle this but haven’t blogged about it yet. Feel free to nag me and I’ll either write the post or conduct a screen-sharing session.

Overview

Note: If you have not already done so, please review Part 1, Part 2, and Part 2 ½ of this series.  You can also download the source data from here.  The completed interactive dashboards can be found at the bottom of this post.

I continue to get questions about how to handle survey-related issues in Tableau.  The latest inquiry comes from Tony in the UK where Tony asks how to address respondents that are members of multiple organizations / categories.  Here’s an example of the type of “demographic” question that Tony has in mind:

In what countries does your organization maintain operations (check all that apply):

[ ] USA

[ ] Canada

[ ] Japan

[ ] Liechtenstein

etc.

Our goal is to have a dashboard that either looks like this:

Dashboard using action filters

or like this:

Dashboard using quick filters

Here’s how the country data might be encoded by the survey tool where 1 indicates yes and 0 indicates no.

First Approach — One Table

Depending on your willingness to use visual filters (actions) you can treat these Country fields like all other “Question” fields and reshape the data so that it looks like this:
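The reshaping step turns each country column into its own row. A minimal Python sketch of that pivot, assuming one dict per respondent with 1 = yes and 0 = no; the output field names `Question` and `Response` follow the filter description in this post but are assumptions, and this is not the reshaping tool described in Part 1 of the series:

```python
def reshape_wide_to_long(rows, question_fields, id_field="ID"):
    """Turn one row per respondent into one row per (ID, Question, Response).

    rows: list of dicts such as {"ID": 1, "USA": 1, "Canada": 0}, where
    1 = yes and 0 = no.
    """
    long_rows = []
    for row in rows:
        for question in question_fields:
            long_rows.append(
                {"ID": row[id_field], "Question": question, "Response": row[question]}
            )
    return long_rows


# Hypothetical survey export.
survey = [
    {"ID": 1, "USA": 1, "Canada": 0, "Japan": 1},
    {"ID": 2, "USA": 1, "Canada": 1, "Japan": 0},
]
for r in reshape_wide_to_long(survey, ["USA", "Canada", "Japan"]):
    print(r)
```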

You can now create a visualization that looks like this one:

There are a number of funky things going on here to make this work properly.  Let’s first look at what is on the filter shelf where we are filtering by Question and Value.

The Question filter is set to show just the so-called “Country” questions, as shown here.

Note that UK and USA are at the bottom of the list.  Also note that for the main visualization in the dashboard (the one that shows “Will you vote in the next election”) the Question filter is set differently.

We also need to set the Response values to 1 so that we get a proper tally for each country as the survey was set up so that 1 = yes and 0 = no.

Level of Detail

You’ll notice that we have ID on the level of detail.  We need this level of detail for our action filter so that we can pass the IDs of folks that are in a particular country to the other visualizations on our dashboard, including the main visualization (the one where we ask the question “Will You Vote In The Next Election”), where the Question filter is set differently.  Note that normally when you add this level of detail you would see a separate bar for every ID.  We get around this by turning Stack Marks off from the Analysis menu.

We also need to use a table calculation (we’ll call it CheckAll_Count) to sum up all the responses across ID:

TOTAL(SUM(Response))
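Outside Tableau, the number that calculation shows per country bar is just a grouped sum. A minimal Python sketch, assuming reshaped rows with `Question` and `Response` fields and the Response filter already set to 1; `check_all_count` is a hypothetical helper that approximates, rather than reproduces, TOTAL()’s partitioning behavior:

```python
from collections import defaultdict


def check_all_count(long_rows):
    """Sum Response across every ID for each Question (country).

    With Response filtered to 1, this is the number of respondents that
    checked each country.
    """
    totals = defaultdict(int)
    for row in long_rows:
        totals[row["Question"]] += row["Response"]
    return dict(totals)


rows = [  # hypothetical reshaped rows, already filtered to Response = 1
    {"ID": 1, "Question": "USA", "Response": 1},
    {"ID": 1, "Question": "Japan", "Response": 1},
    {"ID": 2, "Question": "USA", "Response": 1},
    {"ID": 2, "Question": "Canada", "Response": 1},
]
print(check_all_count(rows))
```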

Does This Work?

If you select the country name (so that all marks are selected) the visual filtering will work fine.

If, however, you select the bar itself you run into problems, as you end up passing only one ID to the other visualizations.

So, is it okay to go into production with this?  That is the reader’s call, but I can pretty much guarantee that if you show this to your CEO he / she will click the bar and not the label, so…

 

Second Approach — Two Tables

The advantage of the next approach, where we join two tables, is that you won’t run into the possible user-experience issues cited above and you will be able to use quick filters.  The only downside is that we end up with many more rows of data (Tableau can handle it like a champ, but Tableau Public balked at allowing me to upload a workbook with more than 100K rows).

In this example I’ve created a second table that just contains the “Country” questions, reshaped.

We then craft the following join in Tableau.

Note that we could also have joined the main table (Reshaped) to itself to achieve similar results.
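The row growth comes from the join itself: every question row for a respondent is repeated once per country that respondent checked. A minimal Python sketch, assuming the country table holds only the checked countries (rows with a response of 1); the sample data and the `join_on_id` helper are hypothetical:

```python
def join_on_id(question_rows, country_rows):
    """Inner join on ID: each question row repeats once per country the
    respondent checked, which is why the joined data has many more rows."""
    countries_by_id = {}
    for row in country_rows:
        countries_by_id.setdefault(row["ID"], []).append(row["Country"])
    joined = []
    for row in question_rows:
        for country in countries_by_id.get(row["ID"], []):
            joined.append({**row, "Country": country})
    return joined


questions = [
    {"ID": 1, "Question": "Will you vote?", "Response": "Yes"},
    {"ID": 1, "Question": "Age group", "Response": "30-39"},
    {"ID": 2, "Question": "Will you vote?", "Response": "No"},
]
countries = [
    {"ID": 1, "Country": "USA"},
    {"ID": 1, "Country": "Japan"},
    {"ID": 1, "Country": "France"},
    {"ID": 2, "Country": "Canada"},
]
joined = join_on_id(questions, countries)
print(len(joined))  # 7 rows from 3 question rows
```

Once Country sits in its own column like this, filtering on it is an ordinary dimension filter, which is what makes quick filters possible.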

We now have Country data reshaped and separate from Question data, and we can filter by Country.

We can also craft a visualization using action filters without fear that somebody will select a bar rather than the entire category.

Summary and What’s Next

We now have a way to handle “OR” logic pretty easily and can handle queries like “show responses from people with operations in China, or Japan, or France”.

But what about “AND” logic where we only want to see responses from people with operations in China AND Japan AND France?

If there’s enough interest I’ll write another post.
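The difference between the two kinds of query is plain set logic: “OR” is a union of the respondents in each selected country, while “AND” is their intersection. The Python sketch below only illustrates that distinction on hypothetical data; it is not the Tableau technique the follow-up post would cover, and `ids_any` / `ids_all` are hypothetical helpers.

```python
def ids_any(country_rows, selected):
    """'OR' logic: respondents with operations in at least one selected country."""
    return {row["ID"] for row in country_rows if row["Country"] in selected}


def ids_all(country_rows, selected):
    """'AND' logic: respondents with operations in every selected country."""
    per_country = [
        {row["ID"] for row in country_rows if row["Country"] == country}
        for country in selected
    ]
    return set.intersection(*per_country) if per_country else set()


countries = [
    {"ID": 1, "Country": "China"}, {"ID": 1, "Country": "Japan"},
    {"ID": 1, "Country": "France"}, {"ID": 2, "Country": "Japan"},
    {"ID": 3, "Country": "France"},
]
selected = ["China", "Japan", "France"]
print(sorted(ids_any(countries, selected)))  # [1, 2, 3]
print(sorted(ids_all(countries, selected)))  # [1]
```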

Posted on June 12, 2013 in 2) Visualizing Survey Data, Blog
Apr 30, 2013
 

Overview

I was stuck earlier this month trying to cajole Tableau into doing something I needed it to do so I contacted my friend Joe Mako.  When it comes to Tableau, Joe is the “guru’s guru.” (Joe was the person that showed me how to create filled maps in Tableau before Tableau had native support for this. See http://www.datarevelations.com/tracking-stds-hiv-and-aids-in-texas.)

Joe did in fact have a very slick solution to my problem and I will probably write about it in a future post, but I would rather focus on a broader issue that came up when Joe commented on a visualization I had on my screen.

Vertical Scatterplot

Consider the image below, which readers may recall from a post I did a while back on getting people to care about your viz.

The size of the circle corresponds to number of respondents reporting a salary close to the amount shown.

The red circle shows “your” salary and the other circles show the salaries of everyone else that responded to the survey. The size of the circle indicates the number of respondents that reported earning a particular salary.

There are a number of problems that I had with this approach, the biggest being that I had to group / bin salary amounts so that similar amounts would yield bigger circles.  That is, I would run into trouble with salary amounts like these…

$50,150

$50,200

$49,750

… as they would yield three separate small circles instead of one larger circle.
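The grouping workaround amounts to rounding each salary to a bin before counting. A minimal Python sketch, assuming whole-dollar salaries and a $500 bin (the post does not say which bin size the original dashboard used); `bin_salary` is a hypothetical helper:

```python
from collections import Counter


def bin_salary(amount, bin_size=500):
    """Round a whole-dollar salary to the nearest multiple of bin_size (halves round up)."""
    return (amount + bin_size // 2) // bin_size * bin_size


salaries = [50150, 50200, 49750]
circle_sizes = Counter(bin_salary(s) for s in salaries)
print(circle_sizes)  # Counter({50000: 3})
```

The trade-off is that every choice of bin size hides detail inside the bins, which is part of what made this approach unsatisfying.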

Here’s what the visualization looks like if you plot a circle for each response ID and don’t size the circles based on the number of occurrences of a particular salary value.

A vertical scatterplot with too many dots.

Our problem is that we cannot see how the responses cluster, as so many marks are stacked in a single column.

Cue the Violins

Joe, who in addition to Tableau expertise is a font of generalized visualization knowledge, asked if I had ever heard of a violin plot (I had not). He then pointed me to this blog post.

In addition to the violin plot, the post discussed “jittering” marks so that the dots spread out horizontally as well as vertically, like this:

“Jittering” the scatterplot

Joe pointed out that producing this jitter effect was very simple in Tableau.  You just need to create an x-y chart where the y-axis contains the salary for each respondent and the x-axis displays the index value (the row number) for the particular response.  Interestingly, it is because there is no relationship between the response ID and the salary value that the INDEX() function essentially randomizes the responses and scatters marks across the x-axis.  If you were to sort the IDs by salary you would get an interesting chart, but one that makes the clustering harder to see.

You say potato and I say “Pareto”
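The INDEX() trick can be approximated outside Tableau. The Python below is a minimal sketch, assuming one row per response and made-up, randomly generated salaries (hypothetical data). The x value is each response's 1-based position when rows are ordered by ID, which is roughly what INDEX() computed along ID returns. Ordering by salary instead yields the sorted, Pareto-like layout.

```python
import random

# Hypothetical data: (response_id, salary) pairs. IDs are assigned in the
# order surveys arrived, so they carry no information about salary.
random.seed(0)
responses = [(rid, random.randint(30_000, 150_000)) for rid in range(1, 1001)]

def jitter_points(rows):
    """Approximate INDEX() along ID: x is the 1-based row position after
    ordering by response ID; y is the respondent's salary."""
    ordered = sorted(rows, key=lambda r: r[0])
    return [(i, salary) for i, (_, salary) in enumerate(ordered, start=1)]

def sorted_points(rows):
    """The "Pareto" variant: order rows by salary first, so x rises with y."""
    ordered = sorted(rows, key=lambda r: r[1])
    return [(i, salary) for i, (_, salary) in enumerate(ordered, start=1)]

jittered = jitter_points(responses)  # x unrelated to y: marks scatter sideways
pareto = sorted_points(responses)    # x tracks y: a rising curve
```

Both layouts give every respondent a distinct x position; the difference is only whether that position is independent of salary. Plotted in a very narrow chart, `jittered` spreads the column of dots out so clusters become visible.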

Creating the Visualization

The screenshot below shows the main components that go into the visualization.  We have placed the INDEX() function on the Columns shelf and AVG(Salary) on the Rows shelf (note that SUM works fine too, as does no aggregation at all, because each mark represents a single response).

Notice that we are coloring by Gender and that ID is on the Level of Detail.

Note that ID is on the level of detail.  This is what produces a separate circle for each respondent.  We also set the INDEX() table calculation to Compute Using ID.

Compute using ID

The only thing left to do is resize the visualization so that it is very narrow.

Here’s the Tableau dashboard showing both the “jittered” scatterplot and your salary as a separate dot.

(I will leave it as an exercise for the reader to download and see how to display the dot.)

Apr 7, 2013

Bob Dylan – folk hero to thousands if not millions – caused a furor when he appeared at the 1965 Newport Folk Festival with … an electric guitar!

If you read about the incident you’ll discover that there was a mammoth sense of betrayal within the folk-centric fold.  How could their hero embrace rock music?

I thought about this musical misstep / milestone when I first read Stephen Few’s rant over Tableau “veering from the path” for allowing two unworthy visualization types and one unworthy visualization implementation to sully Tableau’s latest release (see http://www.perceptualedge.com/blog/?p=1532).

Ironically, I was saddened and disappointed — betrayed is too strong a word — by Few himself over his recent dashboard design competition (see http://www.datarevelations.com/stephen-fews-dashboard-design-competition.html).  But let’s not dwell on that just now.

With respect to Tableau 8, while Few acknowledges that “this version of the software includes many worthwhile and well-designed features,” he maintains that Tableau’s introduction of visualization types that are “analytically impoverished” is an indication that the company’s “vision has become blurred.”

This is a grossly unfair assessment: while there may be some aspects of the release that leave me shaking my head, the vast majority of features show crystal-clear vision and laser-guided direction.

Indeed, as someone who uses Tableau every day of every week, I think version 8 is a godsend as the productivity improvements for me, my clients, and my students will be huge. Yes, there are some things in the product that are half-baked – and goodness knows we’re not used to seeing anything half-baked from Tableau.  But for Few to write 6,000-plus words condemning the release while barely acknowledging the incredible advancements seems grossly unjustified.

So, let’s plug in the 1965 Fender Stratocaster and have a listen, shall we?

Acknowledging that which is Half Baked

I’ll start off by acknowledging some of the things that I think are half-baked:

  • Bubble charts have an algorithm flaw, and size and placement cannot be controlled
  • Forecasting is under-documented and does not inspire confidence
  • Treemaps are flawed
  • Multiple Value (Dropdown) filters need an “apply” button

Bubble Charts

I have no problem with Tableau including this chart type, even though I don’t know if I will ever use this viz type in a production environment.

I might, however, use this to help me get a visceral feel for the data.  That is, I rather like the “gestalt” appreciation I get from looking at a bubble chart.

My two problems with Tableau’s implementation are that it’s too much of a “black box” (i.e., I cannot control size and placement) and that you run into bubble sizing problems if you attempt to display both very large and very small values.

Consider the visualization below that I created for my recent “Infographics Behaving Badly” post.

A visually-compelling, but analytically-flawed bubble chart.

As Joe Mako noted, the bubble for The Diary of Anne Frank is the same size as the bubble for The Lord of The Rings even though sales of the latter are almost four times those of the former.

Apparently, the smallest bubble Tableau will draw is 1/25th the size of the largest bubble.  Rumor has it that this shortcoming will be addressed in a forthcoming bug fix release.
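The effect of a size floor is easy to demonstrate. The Python below is a minimal sketch of the behavior described, not Tableau's actual sizing code. It assumes the 1/25 floor reported above, treats "size" as a single proportional quantity (the post doesn't say whether that means area or diameter), and uses hypothetical sales figures rather than the book data.

```python
# Hypothetical sales figures in arbitrary units (not the book data from the post).
sales = {"A": 1000, "B": 30, "C": 8}

# Assumed rule from the text: no bubble may be smaller than 1/25 of the largest.
MIN_DIVISOR = 25

def bubble_sizes(values, min_divisor=MIN_DIVISOR):
    """Return each bubble's size relative to the largest (1.0 = largest).

    Sizes are proportional to value, but any value below largest / min_divisor
    is raised to that floor, so small values become indistinguishable.
    """
    largest = max(values.values())
    floor = largest / min_divisor
    return {name: max(v, floor) / largest for name, v in values.items()}

sizes = bubble_sizes(sales)
print(sizes)  # {'A': 1.0, 'B': 0.04, 'C': 0.04}
```

Here B is 3.75 times C, yet both are drawn at the floor size, which is the same kind of distortion as the Anne Frank / Lord of the Rings bubbles.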

Forecasting

I’ve spent several hours experimenting with this feature and I’ve come to the conclusion that I’m better off creating the forecast using an algorithm that I can control.

Don’t get me wrong, I would love to be wrong about this and find out that this feature is deep and rich, but based on my experience it smacks of first-iteration, “good-enough-to-get-a-check-mark-on-a-comparison-chart” quality.

Default forecasting results do not inspire a lot of confidence.

Results when you take into consideration trend and season.

Treemaps

Few’s comments on the shortcomings of treemaps are spot on; I’m just not terribly upset about them, as we never had treemaps before.  While the implementation is flawed, it’s still useful.

But yes, I hope Tableau makes this better down the road, as per Few’s recommendations.

Multiple Value (Dropdown) Quick Filter

I’ve wanted this feature for several years as the standard multi-select filters take up A LOT of screen real estate, as shown here.

A check-all-that-apply quick filter.

So, I’m delighted that this functionality can now be neatly packed in a compact dropdown list box.

A compact check-all-that-apply quick filter.

One problem still persists with check-all-that-apply filters and that is Tableau’s insistence on redrawing the visualization after every click.  For some projects it can take several seconds for Tableau to re-render the viz.  Users lose patience with this type of behavior.

I believe that Tableau did have an Apply button in the works that would have addressed this problem but they ran into some stability issues and elected to postpone implementing this feature.

I hope to see it soon.

What About Word Clouds?

I don’t mind that Tableau gives people a way to create these things even though I think they are an analytically-flawed way to present information (although they can pack more of an emotional wallop than a bar chart).

A major problem with word clouds occurs when your data contains different terms that describe the same or similar sentiment.  Consider the word cloud below, which shows survey responses to the question “what is your mood right now?”

Are the majority of people happy?

One might think that most respondents were happy, but look what happens if we “linguistically normalize” the terms that are synonyms of “sad”:

… or are more people sad than happy?

It turns out that more people are in fact sad.
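The normalization step amounts to mapping synonyms onto an umbrella term before counting. In the minimal Python sketch below, both the responses and the synonym map are hypothetical (they are not the survey data behind the word clouds). It shows how a term that leads on raw counts can lose once synonyms are folded together.

```python
from collections import Counter

# Hypothetical mood responses (made-up counts, not the survey in the post).
responses = (["happy"] * 30 + ["sad"] * 12 + ["down"] * 10
             + ["blue"] * 9 + ["gloomy"] * 8)

# Hypothetical synonym map: fold several terms into one umbrella concept.
UMBRELLA = {"down": "sad", "blue": "sad", "gloomy": "sad"}

raw = Counter(responses)
# Replace each word with its umbrella term (or keep it if it has none), then count.
normalized = Counter(UMBRELLA.get(word, word) for word in responses)

print(raw.most_common(1))         # [('happy', 30)]
print(normalized.most_common(1))  # [('sad', 39)]
```

A fixed lookup table like this cannot tell, for example, "blue" the mood from "blue" the color; that is the disambiguation problem the note below refers to.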

Note: There are products that are capable of parsing full sentences and are able to “disambiguate” and then normalize terms under umbrella concepts (although I have yet to see this functionality in any word cloud generator).

Acknowledging that which is Fully Baked

I could probably write 6,000-plus words on all the new features that wow me in version 8, but I’ll just focus on five that will allow me to produce better work faster:

  • Applying filters to selected sheets (this is just brilliant)
  • Enhanced set functionality
  • Floating / free-form dashboard elements
  • Enhanced marks card (and in particular multiple text entries)
  • Improved data blending
  • Bonus item – Tableau’s incredible responsiveness during the beta

Applying Filters to Selected Sheets

I’ve been pining for this since version 4 and while it has taken Tableau more years than I would have liked to see this realized, the implementation is beautifully rendered.

Tableau exceeded my expectations here.  In making my case for this feature, I just wanted to see the following three filter options available to me:

  • All worksheets in the workbook
  • Just this worksheet
  • All worksheets in this dashboard

But as with so many other beautifully-crafted features in the product (including the “add reference lines” dialog box, which one needs to implement Few’s own bullet charts), Tableau developed a more generalized and elegant approach for controlling filter scope, as seen in the following menu sequence and dialog box.

Start by indicating you want to control the scope…

… then apply the filter to selected sheets.

Do you hear that?  That’s a choir of imaginary angels singing “ahhhhhh”.

Enhanced Set functionality

The new IN / OUT set functionality is a huge addition and the ability to combine sets is beautifully rendered as shown in this dialog box.

Holy Venn diagram, Batman!

Work like this is hardly an indication of blurred vision.
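The set combinations in that dialog map directly onto ordinary set operations. The Python below is a minimal sketch with hypothetical customer sets. The dictionary keys paraphrase the kinds of combinations described; they are illustrative labels, not quoted from Tableau's dialog. It also shows IN / OUT as a simple membership test.

```python
# Hypothetical sets of customer IDs, e.g. "bought in 2012" and "bought in 2013".
bought_2012 = {"C1", "C2", "C3", "C4"}
bought_2013 = {"C3", "C4", "C5"}

# Ways to combine two sets, expressed as Python set operations.
combined = {
    "all members in both sets": bought_2012 | bought_2013,  # union
    "shared members": bought_2012 & bought_2013,            # intersection
    "left except shared": bought_2012 - bought_2013,        # difference
    "right except shared": bought_2013 - bought_2012,       # difference
}

# IN / OUT: every customer either belongs to a given set or does not.
all_customers = {"C1", "C2", "C3", "C4", "C5", "C6"}
in_out = {c: ("IN" if c in combined["shared members"] else "OUT")
          for c in sorted(all_customers)}

print(in_out)
```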

Floating / free-form dashboard elements

With past versions of Tableau I’ve spent a lot of time fighting with Tableau’s dashboard layout constraints.  Indeed, I would spend hours sparring with Tableau to place visualizations, quick filters, legends, and so on, into a too-cramped-for-all-the-elements-I-want space.

With the latest release there will be a lot less fighting as any and all objects can now be floating elements, so I can easily place objects on top of other objects.

While this may not seem like a big deal, the ability to place legends and filters atop a visualization (vs. locking these items into a designated corner) makes for more efficient use of space and a much slicker looking dashboard.

In this example, floating elements buy me 80 to 100 pixels.

Here’s how I would have presented this in the previous release:

Having to put filters and legends in a designated area means less room for the visualization itself.

Enhanced marks card (and in particular multiple text entries)

I never had a problem with Tableau’s “shelf” concept for controlling text, color, size, and so on, but the new “button” concept and attendant marks card implementation are well-designed and will make my life easier, both as a developer and as someone who trains others.

Tableau’s new marks card.

But there’s more to this than just a slicker user interface.  By moving away from the one-item-at-a-time-on-a-shelf approach you can now have multiple items controlling facets of the visualization.  Having multiple text items in play is particularly useful, as shown here.

It’s now a snap to display both count and percentage by placing multiple text elements on the text marks card. I’m also using the floating elements feature to put the title and explanatory text within the viz itself.
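The arithmetic behind the two text elements is simple: each count, plus that count divided by the total. The minimal Python sketch below uses made-up answer counts (hypothetical) and builds the combined "count (percent)" label for each answer.

```python
# Hypothetical responses to a single survey question (made-up counts).
counts = {"Yes": 120, "No": 60, "Not sure": 20}

total = sum(counts.values())

# For each answer, build the two pieces of text the label shows:
# the raw count and its share of all responses, rounded to a whole percent.
labels = {answer: f"{n} ({n / total:.0%})" for answer, n in counts.items()}

for answer, label in labels.items():
    print(f"{answer}: {label}")
# Yes: 120 (60%)
# No: 60 (30%)
# Not sure: 20 (10%)
```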

Improved data blending

There are a handful of technologies that never cease to amaze me.

WiFi certainly falls into this category.  Even though I’ve used it thousands of times, I’m always enthralled that I can be sitting in a coffee shop, airport lounge, or family room and I can connect to the Internet.

I have the same reaction to trade show pop-up display booths.  I’ve set these things up dozens of times and I’m blown away every time I see the little compact frame expand to ten times its packed size.

I have the same reaction to Tableau’s data blending capability.  That I can easily – and I mean really easily – get data from one source (e.g., SQL Server) to play nicely with data from another source (e.g., Microsoft Excel) without having to think very hard never fails to amaze me.

Previous releases had one major shortcoming: the field that linked the two sources had to “be in play”; i.e., the field either had to be visible or had to be placed on the level of detail shelf.

This is no longer the case with Tableau 8 so this capability that was so amazing in previous versions is now even more amazing.

Bonus item – Tableau’s incredible responsiveness during the beta

I’ll confess that I thought the various beta builds for version 8 were quite buggy – significantly buggier than with previous versions of Tableau. To be fair, betas from previous releases were insanely stable and beta builds in V8 were no buggier than betas from companies like Microsoft.

Still, having worked with betas going back to 2006, I wasn’t used to stuff not working right.

But I never had much time to worry, as Tableau’s responsiveness to my bug reports allayed all my fears.  Indeed, their rapid response and genuine interest in my concerns showed great customer focus.

Particular praise should go to Francois Ajenstat whose attentiveness was second-to-none. Our community is lucky to have him as such a stalwart user advocate.

Parting thoughts

While I disagree with the one-sidedness of Few’s critique, I’m profoundly grateful that he expressed his dismay, as given his reputation – well deserved, I might add – I suspect we’ll see Tableau attend to the itemized shortcomings sooner rather than later.

Also, it is posts like Few’s – and the attendant replies and follow-up posts, including this one – that produce better products and services.  Indeed, it is through this open discussion that we spread our collective knowledge and expertise, and improve the state of the art.

Let’s keep the passion going.

Posted on April 7, 2013 in 1) General Discussions, Blog
Mar 5, 2013

My problem with most infographics is that they sacrifice accuracy and clarity for whimsy and cuteness. While I understand the desire to “draw the reader in,” I believe it’s critical that the information and the story not be misleading.

So, imagine my delight when I thought I had found an infographic that was spot-on accurate and fun and engaging.

Last month a friend posted a link to a Huffington Post article about the Ten Most Read Books in the World.  This article contained Jared Fanning’s clever infographic.

Wow, I thought, this is fun, clever, and clear.

But then I saw that the zero value for the y-axis was in the middle of the chart and realized that the graphic was very misleading.  If you don’t look carefully you might think that readership of The Holy Bible is a little more than twice that of The Diary of Anne Frank.  If, however, you hide the clever part of the graphic and have the y-axis start at zero, you see a much more accurate interpretation of the data.
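The distortion is easy to quantify. Assume each bar is drawn from the bottom of the chart, a fixed distance k below the zero line. Every visible bar is then longer than its value by k, so the eye compares (a + k) / (b + k) rather than a / b. The minimal Python sketch below uses hypothetical numbers, not the book figures, to show a tenfold difference shrinking to a ratio of about two.

```python
def apparent_ratio(a: float, b: float, offset: float) -> float:
    """Ratio of drawn bar lengths when every bar starts `offset` units
    below zero (i.e., the zero line sits partway up the chart)."""
    return (a + offset) / (b + offset)

def true_ratio(a: float, b: float) -> float:
    """Ratio of the underlying values (bars drawn from zero)."""
    return a / b

# Hypothetical values: one book read ten times as often as another.
big, small = 100.0, 10.0
hidden_offset = 70.0  # assumed length of each bar drawn below the zero line

print(true_ratio(big, small))                     # 10.0
print(apparent_ratio(big, small, hidden_offset))  # 2.125
```

The larger the hidden portion below zero relative to the values, the closer every apparent ratio gets to 1, which is why starting the axis at zero matters.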

So, how would I display the data?

If I did not feel pressure to be mirthful I would go with something like this (rendered using Tableau 8 in about five minutes):

If I felt compelled to add some eye candy I might try something like this:

Then I would spend around three hours trying to make the book icons easier to read.

By the way, I’m the first to admit that this approach is not nearly as much “fun” as the first infographic.

But this graphic is accurate and clear, and that has to come first.

Note: One of the problems with the data itself is that The Holy Bible so dwarfs most of the other books.  I did experiment with a bubble chart (see below) but didn’t want to spend valuable time getting all the book icons to be “just so.”