Simpson's Paradox
August 18, 2021 3:04 PM

If you look at Covid data from Israel across all ages, vaccine efficacy against severe disease is 67.5%. But if you break it down by age it turns out to be significantly higher: for those under 50 it's 91.8%, and for those over 50 it's 85.2%. What's going on? "Simpson’s paradox arises when there are 'lurking variables' that split data into multiple separate distributions."

In this case: "... since older people are more likely to get sick with Covid but also more likely to be vaccinated, looking at all ages instead of breaking out specific age groups can greatly understate how effective the vaccines really are."
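
A minimal sketch of how that can happen, using made-up counts rather than the Israeli figures. The populations and case numbers below are assumptions chosen only so that the older group is both more vaccinated and more at risk; efficacy is computed the standard way, as 1 minus the ratio of the vaccinated case rate to the unvaccinated case rate.

```python
# Hypothetical counts, invented for illustration only (not the Israeli data).
# Each entry: (population, severe cases).
data = {
    "under 50": {"unvaccinated": (1_000_000, 40), "vaccinated": (3_000_000, 12)},
    "over 50": {"unvaccinated": (200_000, 200), "vaccinated": (2_000_000, 300)},
}

def efficacy(unvax, vax):
    """Efficacy = 1 - (case rate among vaccinated / case rate among unvaccinated)."""
    unvax_rate = unvax[1] / unvax[0]
    vax_rate = vax[1] / vax[0]
    return 1 - vax_rate / unvax_rate

def pooled(status):
    """Sum population and cases for one vaccination status across all age groups."""
    return (
        sum(group[status][0] for group in data.values()),
        sum(group[status][1] for group in data.values()),
    )

by_age = {age: efficacy(g["unvaccinated"], g["vaccinated"]) for age, g in data.items()}
overall = efficacy(pooled("unvaccinated"), pooled("vaccinated"))

for age, value in by_age.items():
    print(f"{age}: {value:.1%}")
print(f"all ages: {overall:.1%}")
```

With these invented numbers the per-age efficacies are 90% and 85%, while the pooled figure drops to roughly 69%, because the vaccinated pool is dominated by the higher-risk older group.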
posted by russilwvong (28 comments total) 50 users marked this as a favorite
 
Simpson's Paradox is one of those things I used to cover in Statistics, but no matter how many examples I did, students never got the significance of it. Paradoxes are hard to teach.
posted by wittgenstein at 3:51 PM on August 18, 2021 [9 favorites]


This is cool! Somehow hadn't heard of this before. Thanks!
posted by It's Raining Florence Henderson at 3:54 PM on August 18, 2021


In undergrad I took a Statistics for Engineers course, which is basically 'here's two years of stats concepts compressed into six months', from which I recall a few distribution names and little else. A decade ago I bought a five dollar used textbook that was highly recommended on the Green, but it wasn't until last year that I had the motivation and, frankly, the lack of alternative forms of entertainment to actually run through it.

I now feel like I use the phrase Simpson's Paradox monthly, if not weekly, when looking at other people's analysis of data:

Them: pwnguin! This data shows a problem in production!
Me: Well, how about when you break it down by cluster
Them: Weird, it went away, nevermind!
Me: No, wait... you mustn't... don't... ... you ... left my cube.
posted by pwnguin at 4:14 PM on August 18, 2021 [21 favorites]


pwnguin - do you have the title and author of that textbook handy?
posted by kristi at 4:46 PM on August 18, 2021 [5 favorites]


"Lurking Variable" is a username just waiting to be taken
posted by chavenet at 4:48 PM on August 18, 2021 [17 favorites]


I was so proud of "sir jective" but "Lurking Variable" is if possible an even better nerd joke

(Also: I love Simpson's paradox! Always teach it in my intro stats lectures)
posted by sir jective at 4:53 PM on August 18, 2021 [3 favorites]


This is a critically important consideration in looking at course instruction and student outcomes in education.

We generally do course-level assessments on how successful a faculty member has been in enabling their students to achieve the learning outcomes, persist in the class, and succeed with a C or better in the course. Some are tempted to take these statistics at face value as representations of the effectiveness or skill of the instructor in comparison to their colleagues teaching the same courses.

But there are many “lurking variables”, beyond just how the instructor is teaching, that are outside of the control of the instructor. Some of them are known to affect student outcomes: days on which the course is taught, time of day the classes meet, full-time vs part-time role (and the office space and other support that’s related to that), etc.

Other factors are known but hard to gauge, such as possible attitudes of sexism, racism or ageism directed at an instructor. (I cannot tell you how many times I have heard my women colleagues on the receiving end of some sexist attitude from a student.)

This is one of the reasons that faculty are often leery about these statistics being used to incentivize pay raises, bonuses or continued employment. Aside from the bad pedagogical incentives it introduces, most folks outside of the teaching profession, who haven’t grappled with these variables, don’t have these insights into how a relatively good instructor can nevertheless have below-average student outcomes.
posted by darkstar at 5:01 PM on August 18, 2021 [16 favorites]


Never trust a percentage. GLM model parameters are safer but only if you know they included all reasonable nuisance regressors.
posted by biogeo at 5:09 PM on August 18, 2021


I found this article linked from the tweet to be a richer explanation of what's going on.

As the pandemic progressed, I remember thinking that the heedful would have a chance to learn about all kinds of interesting concepts from biology, the health sciences, statistics, and so on. Well terrific, now Simpson's Paradox has entered the vax-or-no-vax chat. Not exactly the stuff you're going to get in week 1 of the stats course. I can't wait for the next pandemic to require science communicators to try desperately to ease us into a casual familiarity with tensor calculus.
posted by Chef Flamboyardee at 5:12 PM on August 18, 2021 [13 favorites]


Seconding the request for the stats book name, pwnguin!
posted by clew at 5:18 PM on August 18, 2021 [2 favorites]


I can't wait for the next pandemic to require science communicators to try desperately to ease us into a casual familiarity with tensor calculus.

Okay, well, you see, there are some states where people refuse to take basic public health precautions, like wearing masks and getting vaccinated, and we call these states "vector spaces"...
posted by biogeo at 6:34 PM on August 18, 2021 [27 favorites]


And if you find yourself needing to travel from one such state to another, you may need to fly, which naturally is a pretty stressful experience. Therefore we call the means of traveling between vector spaces a "tensor"...
posted by biogeo at 6:38 PM on August 18, 2021 [15 favorites]


Metafilter: included all reasonable nuisance regressors.
posted by Umami Dearest at 8:35 PM on August 18, 2021 [2 favorites]


By popular demand, I present the AskMeFi question from 2008 where I was introduced to Statistics by Freedman, Pisani and Purves. Please shower it with favorites instead of me.
posted by pwnguin at 9:10 PM on August 18, 2021 [10 favorites]


What I don't understand is this:
"Our intuition tells us that the flavour that is preferred both when a person is male or female should also be preferred when their sex is unknown, and it is pretty strange to find out that this is not true — this is the heart of the paradox"
Is this only a paradox if you think gender is relevant to which juice you like? Wouldn't this just tell you it doesn't make sense maybe because it's a meaningless distinction?

And another thing I don't understand is,
"What he found was surprising: there was a statistically significant gender bias in favour of women for 4 out of the 6 departments, and no significant gender bias in the remaining 2. Bickel’s team discovered that women tended to apply to departments that admitted a smaller percentage of applicants overall, and that this hidden variable affected the marginal values for the percentage of accepted applicants in such a way as to reverse the trend that existed in the data as a whole."
My feeling is that it's still possible there's gender bias happening if fewer women are getting admitted enough to be noticeable. Why do the departments that admit a lot of students tend to admit a lot of men? Why do departments that receive lots of women applicants tend to admit fewer students? I just don't understand how you can say "the trend ceases to exist when you look at this way!" Yes and you can make your fingers look like legs when you hold them a certain way but that doesn't make them legs? You're just doing something random & weird to make it do that, no?
posted by bleep at 9:11 PM on August 18, 2021


Yes and you can make your fingers look like legs when you hold them a certain way but that doesn't make them legs? You're just doing something random & weird to make it do that, no?

The reason you break things down by subcomponents is to better understand the shape of the problem. If you chose random and weird dimensions, it wouldn't be very convincing or enlightening. Like, if you tried examining applicants' surname initials for bias, there's no reason to expect any effect you found would be real, and you'd still have 26 chances for a false positive. But breaking things down by department makes sense, as departments are generally making graduate admissions decisions with little oversight or cross checks. So if there were a boys club being propagated at Berkeley, that seems like it would show up in the data. But when you look at the breakdown, there isn't any one department driving the average down.
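
To put a rough number on the "26 chances" point, here is a back-of-the-envelope sketch. It assumes (none of this is from the Berkeley study) one independent test per surname initial, a 5% significance level per test, and no real effect anywhere.

```python
# Assumptions (not from the thread): 26 independent tests, one per surname
# initial, each at a 5% significance level, with no real effect anywhere.
alpha = 0.05
num_tests = 26

# Probability that at least one test comes up "significant" purely by chance.
p_any_false_positive = 1 - (1 - alpha) ** num_tests

# Bonferroni correction: a stricter per-test threshold that keeps the
# overall false-positive rate at or below alpha.
bonferroni_alpha = alpha / num_tests

print(f"chance of at least one false positive: {p_any_false_positive:.1%}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.4f}")
```

Under those assumptions the chance of at least one spurious "finding" is roughly 74%, which is why slicing by an arbitrary dimension is unconvincing unless the threshold is tightened. Bonferroni is only one common way of doing that.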

There can still be systemic reasons why departments are underfunded, but you're no longer just inspecting the admissions committee as a decision maker, but the whole system: funding sources that influence how faculty headcounts are granted, and factors that could influence men and women to apply to different areas of research. The more complicated analysis reveals a more complicated dynamic! And justifies more research! For example, I recall some research from decades ago in the news about how posters of video games in the conference room affected teenage girls' surveyed attitudes about computer engineering.* And we can theorize others, I'm sure.

Fundamentally, the point is that two analyses can come to different conclusions, and we need active, curious minds to poke at things. The black and white version of this chart tells a wildly different story than the color version; the COVID story provides a less cut and dried real life example: if you ignore the coloring (age), it seems like vaccines aren't very effective any more. If you do add that dimension in, the efficacy rate goes way up. You can argue that the "under / over 50" binning is arbitrary and the author would agree -- towards the bottom of the full article (instead of a tweet) they list out a finer grained distribution and show that it's doing about as well as expected, contrary to some inflammatory claims from publications that should know better.

*it's probably BS that doesn't replicate but that just means we need better science and larger trials.
posted by pwnguin at 10:18 PM on August 18, 2021 [3 favorites]


I can definitely come up with a reason for applicant surname bias.
Many application systems spit out results alphabetically. Admissions committees may split up who reads what alphabetically, or maybe readers are simply given an alphabetic list and they read in order. I did this the first year I read applications, except ours were alphabetized by first name. And then I never did that again, because I got so grouchy by the end of reading and that was pretty unfair to the Zarathustras of the world.

This is one of the things about stats I don’t understand— yes, of course you should include all reasonable confounding variables. But what’s reasonable? Sometimes it’s obvious (age, vaccination rate) and sometimes it isn’t (applicant name letter). It also isn’t really ok to just include everything and keep trying stuff to see if the right data slicing magically gives the result you want— that feels too p-hacky.
posted by nat at 1:39 AM on August 19, 2021 [1 favorite]


IF two [or twenty] text-rich explanations of Simpson's Paradox fail to unfurrow your brow THEN this different-medium, 4min video may help?
tl;dw: more money turns you into a cat.
posted by BobTheScientist at 3:43 AM on August 19, 2021 [5 favorites]


See, normally if you go one-on-one with another statistic, you got a 50/50 chance of winning. But I'm a lurking variable and I'm not normal! So you got a 25%, AT BEST, to beat me. Then you add Kurt Angle to the mix, your chances of winning drastic go down...
posted by delfin at 4:13 AM on August 19, 2021 [1 favorite]


Statistically speaking, if you abstract over all of time and space and all the known properties of matter and energy, the odds of any one of us being alive at this particular point in time are essentially zero. Yet here we are! The relevance of the being-alive feature of our existence becomes apparent only when the domain is appropriately limited and the data suitably filtered and grouped.
posted by dmh at 4:19 AM on August 19, 2021 [2 favorites]


Statistically speaking, if you abstract over all of time and space and all the known properties of matter and energy, the odds of any one of us being alive at this particular point in time are essentially zero.

I was half expecting this to wander into a Douglas Adams quote.
posted by fight or flight at 4:26 AM on August 19, 2021 [3 favorites]


"Our intuition tells us that the flavour that is preferred both when a person is male or female should also be preferred when their sex is unknown, and it is pretty strange to find out that this is not true — this is the heart of the paradox"
Is this only a paradox if you think gender is relevant to which juice you like?
Perhaps this is a bad example because it gets people thinking about gender and juice and whether a correlation between them makes sense. In fact, if gender is irrelevant to juice preference, then the juice that is preferred by both genders is definitely preferred (given enough data, etc.) when gender is unknown (because breaking the data down by gender doesn't do anything). Simpson's paradox only happens when breaking the data down has an effect.
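
One way to see this, as a sketch with invented numbers: a pooled rate is just a weighted average of the group rates, weighted by how much of the sample each group makes up. The rates and group shares below are hypothetical, and `pooled_rate` is an illustrative helper, not anything from the article.

```python
def pooled_rate(rate_men, rate_women, share_men):
    """Pooled rate as a weighted average of the two group rates."""
    return share_men * rate_men + (1 - share_men) * rate_women

# Hypothetical preference rates: juice A beats juice B within each group.
a_men, a_women = 0.80, 0.40
b_men, b_women = 0.70, 0.30

# Same mix of tasters for both juices: the ordering survives pooling.
same_mix_a = pooled_rate(a_men, a_women, share_men=0.5)
same_mix_b = pooled_rate(b_men, b_women, share_men=0.5)

# Different mixes: A tasted mostly by women, B mostly by men. The pooled
# ordering flips even though A still wins within each group.
skewed_a = pooled_rate(a_men, a_women, share_men=0.1)
skewed_b = pooled_rate(b_men, b_women, share_men=0.9)

# Gender irrelevant: equal group rates give the same pooled rate for any mix.
irrelevant_10 = pooled_rate(0.6, 0.6, share_men=0.1)
irrelevant_90 = pooled_rate(0.6, 0.6, share_men=0.9)

print(f"same mix:      A={same_mix_a:.2f}  B={same_mix_b:.2f}")
print(f"different mix: A={skewed_a:.2f}  B={skewed_b:.2f}")
print(f"gender irrelevant: {irrelevant_10:.2f} vs {irrelevant_90:.2f}")
```

The reversal needs two things at once: the groups must differ in their rates, and the two options must be pooled over different group mixes. Take away either one and the pooled comparison agrees with the per-group comparison.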
posted by dfan at 5:07 AM on August 19, 2021 [1 favorite]


Seconding the video that Bobthescientist posted as a really clear explanation of the paradox (plus cats). There are always reasons why there may be hidden correlations that are actually being proven - but what the paradox points out (correct me if I’m understanding it wrong, I learned this today during this thread) is that you can plot out the overall trend, and then figure out why the disparate groups are disparate in other studies.
posted by Mchelly at 7:10 AM on August 19, 2021 [2 favorites]


Seconding the video that Bobthescientist posted as a really clear explanation of the paradox (plus cats)

Definitely! If I had known more money can make you a cat I would have tried much harder to be wealthy!

(The video is better because it doesn't use "preference" as part of its demonstration of the paradox, which avoids any associated imagining over how preferences might work in an abstract consideration, an often troublesome issue in use of statistics.)
posted by gusottertrout at 7:26 AM on August 19, 2021 [1 favorite]


Simpson's paradox arises when there are hidden variables that split data into multiple separate distributions.

I wish they had gone into more difficult and abstract examples, because every one they use (in the video too) they seem to be lazily consolidating disparate groups.

They give me the impression (I'm sure it's wrong) that Simpson's Paradox is actually when multiple separate distributions are lazily consolidated for reasons. Shouldn't verifying that a sample of extraneous variables don't skew results be part of the basic original analysis before publishing your data? Assuming data points are homogeneous or irrelevant should be proven, not assumed.
posted by The_Vegetables at 8:47 AM on August 19, 2021


They give me the impression (I'm sure it's wrong) that Simpson's Paradox is actually when multiple separate distributions are lazily consolidated for reasons.

The reason is that people don't want to see 23 separate statistical results, they want to know "the answer", and quite often combining the results is perfectly fine and it's not at all obvious why it wouldn't be fine.

Think about the part of the video where educational outcomes between Wisconsin and Texas were compared. Wisconsin did better overall, but Texas did better for each ethnic group, so maybe Texas is better.

Or... maybe not. What about breaking down by income within each ethnic group? Would that give us different results? Maybe Wisconsin does better for poor Hispanic students and rich Hispanic students, but worse for Hispanic students overall?

At what point do you stop breaking things down?
posted by It's Never Lurgi at 9:05 AM on August 19, 2021 [1 favorite]


They give me the impression (I'm sure it's wrong) that Simpson's Paradox is actually when multiple separate distributions are lazily consolidated for reasons.

This is my understanding, though I could be wrong: Simpson's Paradox is that apparent trends observed in categorical groups within a distribution can disappear or change when those groups are combined, so we only know that those groups are "separate distributions" in terms of whether they evince different trends after we've investigated the original distribution.

There's an element of parsimony in this, namely that we should never assume ahead of time which categorical groups within a distribution may evince trends incompatible with the observed overall trend or why.
posted by clockzero at 9:53 AM on August 19, 2021


Or... maybe not. What about breaking down by income within each ethnic group? Would that give us different results? Maybe Wisconsin does better for poor Hispanic students and rich Hispanic students, but worse for Hispanic students overall?

At what point do you stop breaking things down?


You stop at the point that the discrete variables stop being meaningful. Of course, choosing discrete variables can be difficult, but high-level ones like sex, education, and income status are pretty universal and are usually meaningful.

That's why I wish they would have had an example where the variables to choose are not very obvious. I'm sure in gene therapy or other more complex medical disciplines, this gets very complex.
posted by The_Vegetables at 1:34 PM on August 19, 2021 [1 favorite]



