By Steve Wexler and Jeffrey Shaffer

January 9, 2017

Please also see the follow-up post.

Overview

Makeover Monday, started by Andy Kriebel in 2009 and turned into a weekly social data project by Kriebel and Andy Cotgreave in 2016, is now one of the biggest community endeavors in data visualization. By the end of 2016 there were over 3,000 submissions and 2017 began with record-breaking numbers, with over 100 makeovers in the first week. We are big fans of this project and it’s because of the project’s tremendous success and our love and respect for the two Andys (and now Eva Murray) that we feel compelled to write this post.

Unfortunately, 2017 started off with a truly grand fiasco as over 100 people published findings that cannot be substantiated. In just a few days the MM community has done a lot of damage (and if it doesn’t act quickly it will do even more).

What happened

Woah!  That’s quite an indictment. What happened, exactly?

Here’s the article that inspired the Makeover Monday assignment.

So, what’s the problem?

The claims in the article are wrong.  Really, really wrong.

And now, thanks to over 100 well-meaning people, instead of one website that got it really, really wrong there are over 100 tweets, blog posts, and web pages that got it really, really wrong.

It appears that Makeover Monday participants assumed the following about the data and the headline:

  • The data is cited by Makeover Monday so it must be good data.
  • The data comes from the Australian Government so it must be good data that is appropriate for the analysis in question.
  • The headline comes from what appears to be a reputable source, so it must be true.

Some Caveats

Before continuing we want to acknowledge that there is a wage gap in Australia; it just isn’t nearly as pronounced as this article and the makeovers suggest.

The data also looks highly reputable; it’s just not appropriate data for making a useful comparison on wages.

Also, we did not look at all 100+ makeovers. But all 40 of those we did review parroted the findings of the source article.

Some makeover examples

Here are some examples from the 100+ people who created dashboards.

Figure 2 — A beautiful viz that almost certainly makes bogus claims. Source: https://public.tableau.com/profile/publish/Australias50highestpayingjobsarepayingmensignificantlymore


Figure 3 — Another beautiful viz that almost certainly makes bogus claims. Source: https://public.tableau.com/profile/publish/MM12017/Dashboard1#!/publish-confirm


Figure 4 — A third beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/profile/publish/AustraliaPayGap_0/Dashboard1#!/publish-confirm


Figure 5 — Yet another beautiful viz that almost certainly makes bogus claims.  Source: https://public.tableau.com/views/GenderDisparityinAustralia/GenderInequality?:embed=y&:display_count=yes&:showVizHome=no#1

Goodness! These dashboards (and the dozens of others that we’ve reviewed) are highlighting a horrible injustice!

[we’re being sarcastic]

Let’s hold off before joining a protest march.

Why these makeovers are wrong

Step back and think for a minute. Over 100 people created a visualization on the gender wage gap, and of the dashboards we reviewed, all of them visualized, in some form, the difference between male Ophthalmologists earning $552,947 and females earning only $217,242 (this is the largest gap in the data set).

Did any of these people ask “Can this be right?”

This should be setting off alarm bells!

There are two BIG factors that make the data we have unusable.

One — The data is based on averages, and without knowing the distributions there’s no way to determine if the data provides an accurate representation.

Here’s a tongue-in-cheek graphic that underscores why averages may not be suited for our comparison.


Figure 6 — The danger of using averages.  From Why Not to Trust Statistics.

Here’s another real-world graphic from Ben Jones that compares the salaries of Seattle Seahawks football players.


Figure 7 — Seattle Seahawks salary distributions. Source: Ben Jones.

Ben points out

The “average” Seahawks salary this year is $2.8M. If you asked the players on the team whether it’s typical for one of them to make around $3M, they’d say “Hell No!”
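Ben’s point is easy to reproduce with a few lines of Python. The salaries below are invented purely for illustration: a handful of very large values pulls the mean far above the median, which is why an average alone can’t tell you what a “typical” member of a group earns.

```python
# Hypothetical salaries: most members earn ~$600K, a few earn far more.
salaries = [600_000, 600_000, 600_000, 600_000,
            900_000, 2_000_000, 8_000_000, 12_000_000]

mean = sum(salaries) / len(salaries)

ordered = sorted(salaries)
mid = len(ordered) // 2
# Even-length list: the median is the midpoint of the two central values.
median = (ordered[mid - 1] + ordered[mid]) / 2

print(f"mean:   ${mean:,.0f}")    # pulled up by the two largest salaries
print(f"median: ${median:,.0f}")  # closer to what a "typical" member earns
```

With these made-up numbers the mean lands several times above the median; any chart built on the mean alone would badly misrepresent the group.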

Two — The data doesn’t consider part time vs. full time work. The data is from tax returns and doesn’t take into account the number of hours worked.

Let’s see how these two factors work with a “for instance” from the source data.

Figure 8 — A snippet of the source data in question.

So, there are 143 women Ophthalmologists making an average of $217K and 423 males making an average of $552K.

Are the women in fact being paid way less?  On average, yes, but suppose the following were the case:

Of the 143 women, 51 work only 25 hours per week

And of those 423 men, 14 of them are making crazy high wages (e.g., one of them is on retainer with the Sultan of Brunei).

Could the 51 part-time workers and the 14 insanely-paid workers exaggerate the gap?

Absolutely.

Is this scenario likely?

About the Sultan of Brunei?  Who knows, but about hours worked?

Very likely.

We did some digging and discovered that as of 2010, 17% of the male workforce in Australia was working part time while 46% of the female workforce was working part time.

This single factor could explain the gap in its entirety.
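To make the hours-worked point concrete, here is a back-of-the-envelope sketch in Python. The two head counts and average incomes come from the source data; the 38-hour full-time week and the 51 part-timers at 25 hours are the hypothetical “suppose” above, so the adjusted figure is an illustration of the mechanism, not a finding.

```python
# From the source data
women_n, women_avg = 143, 217_242
men_n,   men_avg   = 423, 552_947

# Assumptions (the article's hypothetical scenario)
full_time_hours = 38                        # assumed standard week
part_time_women, part_time_hours = 51, 25   # "suppose 51 work 25 hours"

# Average weekly hours across all women under that assumption
women_hours = (part_time_women * part_time_hours
               + (women_n - part_time_women) * full_time_hours) / women_n

# Scale women's average income to a full-time-equivalent figure
women_fte_avg = women_avg * full_time_hours / women_hours

raw_gap = men_avg / women_avg - 1
adj_gap = men_avg / women_fte_avg - 1
print(f"raw gap: {raw_gap:.0%}, hours-adjusted gap: {adj_gap:.0%}")
```

Even this single, modest adjustment knocks roughly 30 percentage points off the apparent gap, before accounting for outlier salaries, tenure, or anything else.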

Note: Not knowing the number of hours worked is only one problem. The data also doesn’t address years of experience, tenure, location, or education, all of which may contribute to the gap.

Findings from other surveys

We did some more digging…

Data from the Workplace Gender Equality Agency (an Australian Government statutory agency) shows that in the Health Care field, 85% of the part-time workers in 2016 were female. This same report shows a 15% pay gap for full-time Health Care employees and only a 1% gap for part-time employees.

Finally, a comprehensive study, Differences in practice and personal profiles between male and female ophthalmologists, was published in 2005. Key findings from this survey of 254 respondents show:

  • 41% of females worked 40 hours per week compared with 70% for males.
  • 57.5% of females worked part-time compared with 13.6% for males.
  • The average income for females was AUS$ 38,000 less than males, not $335,000 less.
    (Yes, that’s still a big gap, but it’s almost 10 times less than what the article claims).

Why this causes so much damage

It would keep me up at night to think that something I did would lead to somebody saying this:

“Wait!  You think the wage gap here is bad; you should see what it’s like in Australia.  Just the other day I was looking at this really cool infographic…”

So, here we are spreading misinformation. And it appears we did it over 100 times! The visualizations have now been favorited over 500 times, retweeted, and one was featured as the first Tableau Viz of the Day for 2017.

We’re supposed to be the good guys, people that cry foul when we see things like this:

Figure 9 — Notorious Fox News Misleading Graphic.

Publishing bogus findings undermines our credibility. It suggests we value style over substance, that we don’t know enough to relentlessly question our data sources, and that we don’t understand when averages work and when they don’t.

It may also make people question everything we publish from now on.

And it desensitizes us to the actual numbers.

Let us explain. There is clearly a gender wage gap in Australia. The Australian government reports the gender wage gap based on total compensation to be around 26% for all industries, 23% for full-time and 15% for full-time health care (base pay is a smaller gap). While we can’t calculate the exact difference for full-time or part-time ophthalmologists (because we only have survey data from 2005), it appears to be less than 15%.

Whatever the number is, it’s far less than the 150% wage gap shown on all the makeovers we reviewed.

And because we’ve reported crazy large amounts, when we see the actual amount — say 15% — instead of protesting a legitimate injustice, people will just shrug because 15% now seems so small.

How to fix this

This is not the first time in MM’s history that questionable data and a lack of proper interrogation have produced erroneous results (see here and here). The difference is that this time we have more than 100 people publishing what is in fact really, really wrong.

So, how do we, the community, fix this?

  • If you published a dashboard, you should seriously consider publishing a retraction. Many of you have lots of followers, and that’s great. Now tell these followers about this so they don’t spread the misinformation. We suggest adding a prominent disclaimer on your visualization.
  • The good folks at MM recommend that participants should spend no more than one hour working on makeovers. While this is a practical recommendation, you must realize that good work, accurate work, work you can trust, can take much more than one hour. One hour is rarely enough time to vet the data, let alone craft an accurate analysis.
  • Don’t assume that just because Andy and Eva published the data (and shared a headline that too many people mimicked without thinking) that everything about the data and headline is fine and dandy. Specifically:
  • Never trust the data! Question it ruthlessly:
    • What is the source?
    • Do you trust the source? The source probably isn’t trying to deceive you, but the data presented may not be right for the analysis you wish to conduct.
    • What does the data look like? Is it raw data or aggregations? Is it normalized?
    • If it’s survey data, or a data sample, is it representative of the population? Is the sample size large enough?
    • Does the data pass a reasonableness test?
    • Do not trust somebody else’s conclusions without analyzing their argument.

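One way to operationalize the “reasonableness test” bullet is a quick script that flags any occupation where the published averages are so far apart that distribution or hours-worked effects are the likelier explanation. The Ophthalmologist figures below come from the source data; the second row, the 1.5× threshold, and the helper function are our own hypothetical choices, not anything from the MM data set.

```python
rows = [
    # (occupation, female average income, male average income)
    ("Ophthalmologist", 217_242, 552_947),   # from the source data
    ("Example job",     100_000, 112_000),   # invented, plausible gap
]

def flag_implausible(rows, max_ratio=1.5):
    """Return occupations whose male/female average ratio exceeds max_ratio."""
    return [(occ, m / f) for occ, f, m in rows if m / f > max_ratio]

for occ, ratio in flag_implausible(rows):
    print(f"{occ}: male average is {ratio:.2f}x the female average -- "
          "check distributions and hours worked before publishing")
```

A check like this takes minutes, and anything it flags deserves a hard look at the underlying distributions before a single chart gets built.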
Remember, responsibility for data integrity does not rest solely with the creator or provider of the data. The person performing the analysis needs to take great care with whatever he or she presents.

Alberto Cairo may have expressed it best:

Unfortunately, it is very easy just to get the data and visualize it. I have fallen victim of that drive myself, many times. What is the solution? Avoid designing the graphic. Think about the data first. That’s it.

We realize that the primary purpose of the Makeover Monday project is for the community to learn, and we acknowledge that this can be done without verified data. As an example, people are learning Tableau every day using the Superstore data, data that serves no real-world purpose. However, the community must realize that the MM data sets are real-world data sets, not fake data. If you build stories using incorrect data and faulty assumptions then you contribute to the spread of misinformation.

Don’t spread misinformation.

Jeffrey A. Shaffer
Follow on Twitter @HighVizAbility

Steve Wexler
Follow on Twitter @VizBizWiz

Additional reading

Why not trust statistics. Read this to see why the wrong statistic applied the wrong way makes you just plain wrong (thank you, Troy Magennis).

Simpson’s Paradox and UC Berkeley Gender Bias

The Truthful Art by Alberto Cairo.  If everyone would just read this we wouldn’t have to issue mass retractions (you are going to publish a retraction, aren’t you?)

Avoiding Data Pitfalls by Ben Jones. Not yet available, but this looks like a “must read” when it comes out.

Sources:

1. Trend in Hours worked from Australian Labour Market Statistics, Oct 2010.

http://www.abs.gov.au/ausstats/abs@.nsf/featurearticlesbytitle/67AB5016DD143FA6CA2578680014A9D9?OpenDocument

2. Workplace Gender Equality Agency Data Explorer

http://data.wgea.gov.au/industries/1

3. Differences in practice and personal profiles between male and female ophthalmologists, Danesh-Meyer HV1, Deva NC, Ku JY, Carroll SC, Tan YW, Gamble G, 2007.

https://www.ncbi.nlm.nih.gov/pubmed/17539782?dopt=Citation

4. Gender Equity Insights 2016: Inside Australia’s Gender Pay Gap, WGEA Gender Equity Series, 2016.

http://business.curtin.edu.au/wp-content/uploads/sites/5/2016/03/bcec-wgea-gender-pay-equity-insights-report.pdf

5. Will the real gender pay gap please stand up, Rebecca Cassells, 2016.

http://theconversation.com/will-the-real-gender-pay-gap-please-stand-up-64588


  26 Responses to “What to do when so many people get it wrong”

  1. Hi Steve/Jeff

    Guilty as charged. I’m a Makeover Monday regular, in fact you can’t get more regular than me, having completed all 54 out of 54 exercises so far, including (obviously!) the Australia wages makeover above (though I’m not one of the featured visualisations in your article). To me, rightly or wrongly, that’s just how I’ve treated each week of Makeover Monday: as 54 Tableau exercises, nothing more. A year ago I was very much a beginner, and now, as a result of many of the Tableau visualisations I’ve worked on and published, my skill, enthusiasm and profile as a Tableau user have snowballed. I have a lot to thank the project for and the support and feedback of the community that has grown around it, so I’ll try not to let my fondness for the project, and the gratitude I owe it to my own development, cloud my judgement.

    Thank you for bringing the above data issues to our attention. I did at least vaguely think that some of the anomalies must be down to the use of average figures. I know of the “average wage” parody cartoon above. I know enough about working patterns, hours, and cultures to know that all may not quite be as it seems, but the overall headline was good, was backed up by the (albeit dubious) data, and had been published by the Australian government. They’ve done the thinking for me, published the data and published their findings in their name, haven’t they? But I, for one, should in theory know better. Prior to my Tableau obsession, I have over 20 years’ experience in data analysis and before that, completed a statistics-heavy Maths degree. My thanks above are genuine – the blog post is thorough, accurate, well-meaning, non-inflammatory and constructive.

    So I hope I’ve got enough data background and integrity to admit that my heart sank on reading this article, not least because I understand and agree with it 100%. I think there are two alternative ways to tackle it, both of which are a bit of a shame.

    1. Put a big disclaimer on the MM website, and a clear link to this disclaimer on all of our visualisations. The disclaimer would be something along the lines of “MM is purely an exercise in data visualisation. We are not responsible for the accuracy or validity of the source data”. I admit I usually take all of 30 seconds to put a quick “Source: [top level website]” and #MakeoverMonday on each visualisation. Perhaps we should make a clearer link to a full disclaimer. I don’t particularly like that; I like to be cleaner and more minimalist without having to find room for another line of text on every makeover visualisation, but it’s sensible, and right.

    OR

    2. We do the same amount of due diligence on the data as has been done above. I think this is what the best data journalists should do, and indeed they do. It’s what takes time, and it’s what means that a well researched and analysed piece of work should take days/weeks, not one hour.

    But I don’t think (2) is what Makeover Monday is about. (2) is what we should do if we work for the Australian government, or any organisation which publishes visualisations based on their own sourced and analysed data. It would be a job to undertake with colleagues involved in all stages of the data collection, cleaning, processing, analysis and reporting process. We, the participants, don’t all have the skill, or time, to go to such lengths. Most of us, myself included, have neither. The ethos and implication of Makeover Monday is almost the opposite – a freely available, often socially interesting, supposedly clean dataset presented on a plate to us, to do something visually appealing with in the software of our choice, with the deliberate encouragement *not* to take more than an hour. (Admittedly, I usually take longer, but that’s purely down to my speed as a learner and indecision over how to present the data). If we are required to do (2), even in part, then Makeover Monday in its current, weekly, social, online form is dead in the water, and would need to be considered a more time-intensive major (monthly/quarterly?) project instead. It’s not the wrong thing to do, it just doesn’t fit in with the MM process.

    The question is, is (1) enough? I hope so. I think too much hard work and goodwill is tied up in MM for it to be derailed or debunked, and more visible clarification of its purely reproductive nature should ensure that it continues to be as successful this year as last. It’s generally known that Makeover Monday just recreates existing charts/visualisations in alternative form, no more than that, but being generally known doesn’t detract from the need to make this clearer. I’ll certainly publish a retraction as it’s important to me to make conclusions based on rigorous data, even if my conscience is clear that it was never my intention as someone practising a new visualisation technique to query the data provided in the first place. And I’ll always welcome debates on this scale, we all learn from them whatever conclusions we draw.

    • Neil,

      I appreciate you taking the time to articulate your thoughts. I’ve seen the work you’ve been doing the past year and I think it’s terrific and you are helping raise the bar in terms of design and presentation.

      I plan to write a follow up post based on feedback from you and others and now have more ideas on how this can work without putting an unrealistic burden on Andy, Eva, and all of the participants.

      This particular MM got my attention because of the confluence of three things:

      1) The mistakes were really large (probably close to an order of magnitude)
      2) So many people amplified this mistake
      3) The subject matter is very important.

      More to come.

      Steve

  2. All very valid points. And those of us who know this should do better. We should set an example for the newer members of the data viz community. And if we don’t have the time or skill-set to properly vet the data, then maybe the default “lazy” option is to just put a disclaimer on every viz just like we add a link to the source data. Caveat emptor.

    But I wonder if perhaps there is an opportunity to incorporate these skill-sets into those that MM aims to teach. I know it’s a fairly “passive” teaching classroom (i.e. we learn primarily by digging in and observing others’ work), but it has clearly resonated with the community as a way to learn or improve data viz skills. Similar to what Andy and Eva do each week by explaining why they chose the viz and what they did to improve it, maybe there should be an opportunity for “guest lecturers” (rotating so as not to place too much time burden on a handful of people) to take apart the data set each week in a similar way. Then, over time, the holistic skillset of the community will improve as they learn via repeated practice and exposure to not only design better vizzes but better assess the data underlying those vizzes.

    Would love to know what you think.

    • Michael,

      Over the years you have consistently produced great work and have contributed some of the best comments to the Data Revelations site. And you continue to do so.

      I think you have some wonderful ideas. Look at the design work we see from MM. Some of this stuff is amazing and MM has definitely helped raise the bar.

      The problem is that this appears to come at the expense of sound analysis.

      If we could get both — and I think we can, as this is an exceptional community — then we will have made a terrific contribution.

      One thing, though, that may be a non-starter for some and that is the amount of time it takes to produce good work. Sure, there have been occasions where I’ve managed to knock out some terrific stuff in a short amount of time, but more often than not it takes hours to produce good work.

      That said, I do have an idea that I will share on a subsequent post.

      Steve

      • Thanks so much, Steve. It’s a great compliment to know that you pay attention to what I’ve produced.

        I agree that the approach to teaching data assessment / integrity skills won’t necessarily look the same as the approach used in MakeoverMonday to teach data visualization. Speed won’t be a focus, at least not initially, although I do think that as people get more practice in this area, being able to assess and possibly correct a data set won’t take as long as it might take at first. I know that I’ve spent many hours on some MakeoverMonday submissions, simply because I wanted to try something new and that by its nature was going to take more time. But once I did it once, the second time I used that technique didn’t take as long, and it continued to get faster. I, for one, would love to see a similar maturation in my abilities to gauge data integrity. A community-driven platform to learn those skills and see how others apply them would be invaluable.

        • Michael,

          Hmm. Sounds like you just volunteered to run “Find the Flaw Friday”.

          That’s NOT a bad idea, and the results could be used to feed MM.

          Thoughts?

          Steve

          • :-) I suspect I might not be the best person out of the gate, as I see myself definitely on the student side of this one. But I am open to assist in any way I can, as I do agree with you and Jeff on its importance as an attendant skill set to data viz.

            I could envision something akin to the following happening:

            1. Andy and Eva continue to publish data sources as they have.

            2. We add one more recommendation for all participants, namely that they add a caveat about the data next to the source link in their viz. This way they don’t have to address the integrity (and can focus on creating their makeovers) but both they and their audience are made aware of the fact that the data could be skewed, incomplete, etc.

            3. A subset of participants contribute a blog post each week that addresses the data independent of the viz. Questions about completeness, skew, etc can be brought up at that time. If the author is able to, they can produce a “cleaner” data set that would supplement the original. For example, Alan Walker has done this on a few occasions.

            4. Participants can simply read the data integrity blog posts (as they do the makeover blog posts that Andy and Eva publish) and learn simply by being regularly exposed to the kinds of questions that are asked of the data. But they can also choose to re-do their viz using the cleaned data (assuming one can be created) – it would be very interesting to see if the stories change and whether/how those changes impact someone’s design choices.

            I think the above is quite achievable and wouldn’t constrain the current focus of MM – it would simply augment it with a parallel track.

  3. This was my first real MakeoverMonday. I decided to participate in it to start learning about Tableau, how to use the tool, how to design and visualise data with it, how it works, how to get it to do what I want. My intent is to learn the tool but at the same time checking the data and this one has cemented my resolve on that – to treat it as a real exercise and learn the tool at the same time.

    The data provided for this makeover made me feel very uncomfortable from the start and reading the content of some of the submissions made me feel positively nauseous. They looked lovely but I was picturing completely erroneous headlines after the media picked them up – scary indeed.

    I did comment on the validity of the data and some of us had a conversation about it at the time, so, yes, some of us did consider the meaning of data and were careful about drawing conclusions from it – I made reference to this in my vis and in various comments in twitter at the time. Next time I will make sure my comments are more specific and plastered all over the vis :-)

    This has been a great conversation starter and hopefully makes us all think very carefully about the impact our published visuals can have if they are picked up by somebody who doesn’t understand the background – I know it’s been a good exercise for me.

    Oh also, in addition to the problems with the data that you’ve identified, there is another glaring one: in the Australian taxation system, ‘taxable income’ is not the same thing as ‘pay’, and to treat them as if they are the same thing is completely wrong. Anybody who doesn’t understand the difference between the two needs to do some research on the ATO website to find out what is deemed to be income, what related expenses are, how taxable income is calculated, how investments, trusts etc. impact taxable income, how high income earners tend to invest, and on it goes… We shouldn’t trust that column headings mean what we think they might mean; we should make sure we understand the meaning of the content, the specifics of it, not some hand-wavey generalisation.

    • Peter,

      Thanks for sharing the comments and for participating in MM.

      Jeff and I could not review all 100+ makeovers so I’m glad you made some comments.

      Also, I have made more mistakes in both analysis and design than I care to admit, but I’m doing what I can to make sure others do not make the same mistakes. I can’t speak for Jeff but the reason this MM got my attention was that there were so many people making such large mistakes.

      On a subject that is important.

      Steve

  4. Thank you Steve and Jeffrey for the great coverage on two sensitive topics: responsibility for misinformation and gender pay gaps.

  5. Makeovermonday is a great platform for improving design and visualization skills and I have definitely improved my skills over the weeks.

    Design and visualization skills are crucial but getting the truthful data should be our first priority. And I’m feeling guilty about never thinking about the latter.

    No data is 100% accurate but if we start questioning the authenticity and accuracy of the data, we can do better analysis and draw better conclusions.

    I think it is the sole responsibility of the author to check the authenticity and accuracy of the data source.
    It’s very difficult to control and manage a project of this magnitude for just two people. I think it will help to put some strict guidelines in place.

    What we can do before jumping to the visualization part:

    -be skeptical about the data and the source
    -if you find any oddities in the data, bring to the attention of the community
    -be sure of the final conclusions you draw from the data.

    Makeovermonday has been very successful in terms of design and visualization. If we can tackle some of the data truthfulness issues, Makeovermonday can evolve into a more successful and meaningful project.

    Thanks
    Shivaraj

  6. Great post. I’ve tried the idea of a caveat in 2017 makeovers so far, so am keen to hear how you develop your ideas in a way that Andy and Eva could practically incorporate. I like the concept of some participants focussing on the viz and some giving attention to the data. For me a write up on the data should still count as a contribution, even if it doesn’t come with a viz (I do appreciate not everyone would see that as the purpose of the project, but your post adds a heap of value for someone trying to improve like me). The trick might be having a placeholder location on the MM website (that participants can link to in their source / caveat statement perhaps) where such a data write up would end up?

    Minor point: early in your post and in a comment above you make it clear that you didn’t review all 100+ makeovers, but then draw the conclusion that we’ve magnified the problem in the initial article 100+ times. Bad conclusion based on your data set? 😉

    • Steve,

      With respect to a mechanism for giving some attention to the data versus the makeover, a lot of people have been making suggestions as to how to make this happen. I’m not sure it would be easy to add this to MM per se, but maybe a different project that yields “safe” data sets for MM? I’m working on another blog post and have some ideas.

      As for only looking at 40 makeovers and coming to conclusions about all the others, we were pretty careful not to title the post “When *everybody* gets it wrong.” Indeed, I don’t know if *all* the makeovers parroted the same bogus conclusion but with so many submissions I’m confident that over 100 were wrong.

      And I concede that the word “wrong” might be well, wrong. A better term might be “baseless” as there is in fact, a tiny possibility that the conclusions from taking averages that do not take hours worked into account may, by some miracle, yield correct results.

      I’d be happy to place a bet and even give odds.

      Steve

    • I can’t attest for Steve, but I’ve looked at everything I could find on Twitter and what’s posted on the Pinterest board. It might not be 100, but it’s definitely more than 40 at this point. I only found one author who originally posted a note about being skeptical about the data, pointing out that it’s not a “pay comparison”. Peter Robinson (https://twitter.com/pjrtweets) created his MM in Excel. https://twitter.com/pjrtweets/status/816918983186100225

      • Jeff, Peter made some good comments that you will find above.
        So, the one person we know of who didn’t use Tableau is the one person who expressed skepticism… 😉

        • Haha, Jeff, no, after I raised the ‘taxable income is not really pay’ issue, a few of us chatted separately about the issue of part-time work hours, average/skewed wages etc, some of who are Tableau regular users and me who’s just learning it (having used Excel for decades). You made me laugh :-)

      • You’re probably right so I’ll decline that bet Steve, although I’m sure you were going to offer tempting odds! I do know that there was more than one concern expressed as I did so myself. I vaguely recall there being some other submissions that stuck to average taxable income not pay, but it wouldn’t be many. Anyway I look forward to hearing more in future blog posts. Thanks again.

  7. One point that I was wondering when exploring the dataset was how long were the respondents in their jobs. Folks who have been medical specialists for longer would tend to get paid more, I would think. And the length of time in job may also have a gender skew.

    Usually with new data, I tend to spend more than an hour just doing basic exploration and understanding, let alone producing a finished product. #MakeoverMonday challenges us to try and create interesting viz’s in a shorter time period, but it is a great point you make about trying to do important work in a short amount of time.

    • Brad,

      We make the point that tenure (time on the job) is also a big consideration. Based on other research I’ve seen, it’s probably not as big a factor as the number of hours worked, but one would certainly expect to pay somebody with 20 years’ experience more than somebody with five years’ experience.

      As for your other point: who’s going to want to participate in MM if you have to spend hours just checking out the data?

      I have some ideas about this and will post later this week or next.

      Steve

      • Steve,

        I don’t think that many folks would want to participate if the amount of data vetting needs to greatly increase. I’m looking forward to the next blog post to see your ideas!

  8. Steve, Jeff – thanks for the article and your highlighting of this very important issue (not just for Makeover Monday but as a general issue in data visualisation / analysis).

    I didn’t participate in the Makeover in question, mainly due to work and family commitments, though I have participated in a few other weeks of the project and personally experienced having the dataset and “my” analysis questioned. The data visualisation in question was posted to Reddit and got well over 500k views and many thousands of comments, the vast majority being very negative to the “findings”.

    The thread can be viewed here: https://www.reddit.com/r/dataisbeautiful/comments/5ggb81/almost_2_million_people_migrated_from_mexico_to/?st=ixtbr6vf&sh=195b3393

    The furore was perhaps caused by a potentially inflammatory headline (not from me) and the fact some dubious data was included, e.g. 28 Canadians migrated from Canada to USA in 2005-2010

    This is relevant here because my initial reaction was one that many people who participated in the Australian Paygap Analysis will have had: Horror.

    I checked my data, checked the source Andy had shared and revisited the source, spending many hours poring over the methodology. What had I missed? You see, like every other Makeover Monday, I’d gone into “viz mode” when creating this viz – automatically stepping over the normal analytical process and simply visualising what was in front of me. I have a feeling many others will do the same thing, hence the fact 50-100 people “got it wrong” last week.

    The bottom line here is that I hadn’t missed anything: my data matched what everyone else used for the project, and it also matched the analysis on the raw data from the authors. The source http://www.global-migration.info/ was well researched academically and published in Science. The methodology used perhaps led to interesting numbers, and we can discuss the validity of the results all day, but the result stands: the data was “correct”.

    And here’s the problem, because no dataset is perfect. The Australian Pay Data is certainly less perfect than some others but treated in the right way it could still tell a story.

    I know that you aren’t asking for perfection, but I do feel there’s a balance that needs to be struck when spending time finding datasets for analysis. I feel for Andy and Eva in sourcing data as they very much have a thankless task when it comes to selecting data and charts for makeover, particularly with regard to any knee jerk response to the issues you’ve highlighted.

    Personally I feel the response shouldn’t come from Andy and Eva here; instead the community, and in particular the participants of Makeover Monday, should find some kind of response, perhaps individually, that they feel happy with. Any other response runs the risk of increasing the work, and risk, for Andy and Eva in choosing data (risking the continuation of the project). Furthermore, trying to run the project behind closed doors, or with checks and balances, would again limit participation – which has been the stand-out success of MakeoverMonday.

    So perhaps individuals may choose to “badge” their submissions as has been suggested, perhaps they’ll audit the data and ensure they’re happy, some people may take it on themselves to “critique” the data in advance on behalf of the community, still others may not have seen, or will ignore, your post.

    Some certainties though: (1) your intelligent and considered post has highlighted an issue that needed spotting and discussing; (2) the community has learnt a great deal from it and will be better informed in future; (3) it won’t be the last time Makeover Monday has a “flawed” dataset or visualisation.

    Personally I feel without these mistakes we’d all be worse off, so I welcome the opportunity to learn from them.

    Thanks again for the discussion, Chris.

    • Chris,

      In addition to the truly stellar work you do, I greatly appreciate your very thoughtful comments.

      Jeff and I have received a lot of great feedback on how the community can help Andy and Eva address this, as I really want to see MM thrive because it has helped so many people get so much better.

      I will do my best to synthesize the feedback into a blog post and some concrete things we can do.

      Steve

    • Chris,

      Thanks for taking the time to respond here. You nailed a few points here.

      1.) There’s no way Andy and Eva (or previous Andy) could or should be responsible. There may be some easy things they can do to help, but I hope we made it clear that we don’t think it’s on them to fix this.
      2.) These issues will come up with any data set, so hopefully what we’ve outlined will encourage people not to just download the data and viz away, but to think critically about it.
      3.) As you pointed out in your previous example, I have no reason to believe this data is wrong. We verified the source and things look good (other than a few mistakes on the WomenAgenda list on their website). The data was government data, things matched up, and all indications are that the data is correct. The problem in this case is that this data, while correct, can’t be used to make the comparisons and claims made by Business Insider, the Women Agenda website, and the subsequent MM submissions.

      So that’s a great thing to point out here. It’s not just verifying that the source is good and the data is correct. It’s the critical thinking part of how that data fits into the analysis.

      A Makeover Monday logo is an interesting idea and maybe that links to a disclaimer page that Andy and Eva can set up.

      Whatever the solution, we definitely need to make sure that the MM community can continue to grow and thrive.

      • Agreed, it’s the critical thinking part that’s key. Andy and Eva (and formerly Andy too) aren’t responsible for that. The data is what it is. What we make of it is up to us, and that takes some thought. At first I thought it would be good to ensure the data is clean and guaranteed accurate, but how often does that happen in the real world? It’s an issue that we all have to deal with all the time. Thinking about it as part of MM is a good thing imo.
