Hacker News
Graphics That Seem Clear Can Easily Be Misread (scientificamerican.com)
85 points by adunk on Oct 1, 2019 | 31 comments


This effect has a name, The Simpsons paradox [0]

The graph in [1] explains it much better than any words could.

[0] https://en.wikipedia.org/wiki/Simpson%27s_paradox
[1] https://www.analyticsindiamag.com/understanding-simpsons-par...


> This effect has a name, The Simpsons paradox [0]

No relation to The Simpsons; it is Simpson's paradox, after Edward Simpson (e.g., https://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf).


https://imgur.com/a/lg3gwPJ

Not sure about the ad placement in that analyticsindiamag.com article.


Not quite; Simpson's paradox is when splitting a group into subgroups reverses the direction of correlation in every subgroup of the split, which is not the case in the example presented here.

Aside: it's not always true that the direction of correlation in the sliced groups is more correct than that of the larger group. You need causal analysis to know when it is incorrect or correct to slice the data to eliminate confounders.
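A minimal sketch of the reversal described above, using hypothetical success counts for two treatments split into two subgroups (the names `data`, `subgroup_rates` and `pooled_rate` are illustrative, not from the thread). Treatment A has the higher success rate within each subgroup, yet B has the higher pooled rate, because most of A's trials fall in the subgroup with lower success rates overall.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts per treatment and subgroup.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(treatment: str) -> dict:
    """Exact success rate within each subgroup."""
    return {g: Fraction(s, n) for g, (s, n) in data[treatment].items()}


def pooled_rate(treatment: str) -> Fraction:
    """Success rate after merging all subgroups into one group."""
    successes = sum(s for s, _ in data[treatment].values())
    trials = sum(n for _, n in data[treatment].values())
    return Fraction(successes, trials)


if __name__ == "__main__":
    for group in ("small", "large"):
        a, b = subgroup_rates("A")[group], subgroup_rates("B")[group]
        print(f"{group}: A={float(a):.3f} B={float(b):.3f}")
    print(f"pooled: A={float(pooled_rate('A')):.3f} B={float(pooled_rate('B')):.3f}")
```

Which comparison is the "right" one cannot be read off these numbers; as the comment says, that takes a causal argument about why the subgroups differ.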


That's part of it, but drawing a linear fit through the data shown in the article is outrageously misleading in itself, and should be rejected outright.


There's nothing inherently wrong with a linear fit, and a linear model is often more accurate than a more ostensibly correct-looking polynomial or complex function that overfits the observed data.

Perhaps you mean that drawing any kind of fit through this data is misleading when it lacks confidence statistics and an R² score. Even with all of those you could still do poorly on held-out data, or you could do well on held-out data and still fall victim to confounders or misunderstand the direction of causation...

I see a line through data as nothing more than a hypothesis, which needs to be backed up with rigor. If it's backed up, there's nothing outrageous about it.


The residuals of this model look really bad, though, and that should tell you that there's insufficient predictive value. At a minimum I'd use a piecewise fit, or simply not fit a model without further predictors.


Seems to be the statistics equivalent of gerrymandering.


One interesting aspect of Simpson's paradox is that it's not straightforward which perspective is the more valid one, by groups or combined.

One example in which the combined perspective is the valid one is election results (it is possible to win a higher percentage of votes in multiple areas, yet lose the overall vote).
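A minimal sketch of the election case, with hypothetical vote counts: candidate X takes the higher share in two of three areas but loses the combined vote, because the one area X loses is much larger. The area names and counts are invented for illustration.

```python
# Hypothetical vote counts per area: (votes for X, votes for Y).
areas = {
    "north": (5_200, 4_800),
    "south": (5_100, 4_900),
    "city": (20_000, 30_000),
}


def share_x(x_votes: int, y_votes: int) -> float:
    """Fraction of the two-candidate vote that went to X."""
    return x_votes / (x_votes + y_votes)


areas_won_by_x = [name for name, (x, y) in areas.items() if x > y]
total_x = sum(x for x, _ in areas.values())
total_y = sum(y for _, y in areas.values())

for name, (x, y) in areas.items():
    print(f"{name}: X share={share_x(x, y):.2f}")
print(f"areas won by X: {areas_won_by_x}; overall X={total_x}, Y={total_y}")
```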


Gerrymandering is an intentional act to achieve a specific outcome.

Simpson's Paradox can strike even the most committed truth-seeker trying to understand and interpret the available data.

Edit to add: Simpson's Paradox could conceivably thwart a gerrymanderer if they have an erroneous model relating demographics to voting patterns.


This is all true. However, what I meant is that the way gerrymandering can produce a winner who is not the majority choice is quasi-analogous to how Simpson's paradox exemplifies a weighted average that disagrees with the subtotals of the data.


Oh, I see—you mean from the perspective of the voter/pundit/analyst/politician trying to understand what an election means and their mandate for action. Sort of like the debate over electoral vs popular vote.

I hadn’t thought about it that way or seen the word gerrymandering used in that way, but it makes sense.


Talk about easily misreading a graph... I thought the article in the second link was going to explain the decline of the Simpsons based on frequency of episodes focusing on specific characters, but it was just trying to be clever with that image.


That was a fascinating read, thanks!


The title alone made me think of a different phenomenon that I've experienced.

It's to do with websites. Let's say there's a website where one downloads some free software, for example. If there's a really big, bright, clear download link, I've been taught not to click on it, as it's probably going to an ad or some scam site. I've learned to always ignore it and search for the tiny little text hyperlink that actually leads to the real download file.

I've noticed this trained behavior of mine now applies to all websites, even when the big link is actually the real one. I will often miss it and waste time searching for the 'real' one, as my brain now automatically filters the big clear ones out.


The name for this effect is Banner Blindness:

https://en.wikipedia.org/wiki/Banner_blindness


If you take a statistics class, one of the very first things you are taught is to be careful not to infer a wrong cause-effect relation when presented with a "correlation".

A typical example is "given any city, there is a strong correlation between the number of churches and the number of crimes committed." This holds pretty much everywhere in the world, but that does not imply one is causing the other. The correlation can simply be the natural outcome of more populous cities having more of both than less populated ones.


I have found it is pretty common for people to use simplified justifications, such as cause and effect, to support their conclusions on a subject. If the relationship between the cause and the effect is not clearly stated, it is very common for the two to get swapped when reasoning backwards from an end point to a start point. In that case the behavior at play is a form of cognitive conservatism demonstrated through a selective bias.

While that form of thinking may sound incredibly stupid (how could a person confuse a cause with its effect?), it is exceedingly common. I have seen incredibly smart people make this mistake. The mistake comes from non-cognitive behavior that unduly influences what is otherwise a very logical and straightforward conclusion. Objectivity is a personality trait that can be practiced, and it is not aligned with logic or math skills.

https://en.wikipedia.org/wiki/Cart_before_the_horse


You know, I've never found a rigorous definition of what "causality" actually is.

Like, I know what "correlation" is: slap a regression on A and B and see what comes out (after considering heteroscedasticity and friends). But, for causation, how does one find it?

Is causation even a well-defined concept? In most disaster analysis situations you see that the failure wasn't caused by a single factor but came as a result of a combination of different factors which, on their own, are benign.

If someone is crossing the road while checking their phone (so not paying attention) and a drunk driver hits them with their car, what "caused" the accident?

Do phones "cause" accidents? Does drunk driving "cause" accidents?
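One common way to make "cause" precise is the counterfactual or "but-for" test: a factor is a cause if, holding everything else fixed, removing it would have prevented the outcome. The sketch below applies that test to two hypothetical toy models of the accident; both models and all names are assumptions for illustration. When the accident needs both factors, each passes the test; when either factor alone would suffice, neither does, which is one reason formal accounts such as Pearl's structural causal models go beyond the simple but-for test.

```python
def accident(distracted_pedestrian: bool, drunk_driver: bool) -> bool:
    """Hypothetical toy model: the accident happens only if both factors are present."""
    return distracted_pedestrian and drunk_driver


def accident_either(distracted_pedestrian: bool, drunk_driver: bool) -> bool:
    """Hypothetical alternative model: either factor alone is enough."""
    return distracted_pedestrian or drunk_driver


def but_for_causes(outcome, actual: dict) -> list:
    """Factors whose removal alone (others held fixed) would have prevented the outcome."""
    if not outcome(**actual):
        return []
    causes = []
    for name, present in actual.items():
        if present and not outcome(**{**actual, name: False}):
            causes.append(name)
    return causes


if __name__ == "__main__":
    both = {"distracted_pedestrian": True, "drunk_driver": True}
    print(but_for_causes(accident, both))
    print(but_for_causes(accident_either, both))
```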


The article is very narrow and focuses on a single example.

What's more, the real takeaway is that you can put two graphs side by side without there being any causal relationship between them. The example only seems convincing because both graphs are health-related. If they were Pac-Man high scores vs milk production, it would show how hollow the article is.


> The article is very narrow and focuses on a single example.

Alberto Cairo is quite the polarizing figure in the data visualization world.


A closely related classic on this is the short-and-sweet

"How to Lie With Statistics"

https://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/03...


the page appears blank to me


At first I thought you were making a joke, but it's blank for me, too, even with adblockers/https redirect disabled.


Sorry, I wasn't joking (:

The page now loads correctly


Alberto Cairo, who produced the charts in the paper, is about to publish the book "How Charts Lie: Getting Smarter about Visual Information". His other books are excellent; I bet this one will be too.


In other words, some people don't know how to read graphs.


No, that's not at all what this is about. A better one-sentence summary would be "Accurate data can be very misleading if, for example, it's viewed at the wrong level of granularity."


Well, that seems a particularly tricky chart to interpret.

In this case, I haven't read the article the chart was taken from, so I don't know what argument the chart is supposed to support. Stripped of that context, it's a pretty confusing chart. It seems the author of the SciAm article (Cairo) is using the chart to make a point about lying with charts. I don't think Cairo is publishing his own research on obesity and birth weight; that seems to be Kitahara et al. If that's what's going on, then it's hardly surprising that the chart is hard to read: Cairo chose it to make exactly that point.

And I think he's being unfair to Kitahara et al., implying that they've deliberately contrived that chart to mislead.


The article is more about how you can read the graph as it is intended (showing a positive association, for example), but reading it uncritically means you will ignore the possible context that generated the statistics.


XKCD on "Curve-Fitting Methods and the Messages They Send": https://www.xkcd.com/2048/



