I predict this post is correlated with excellent comments.
October 24, 2017 11:47 AM

A common (and often misleading) internet retort is "correlation does not equal causation" - but then how do we identify causes? And what does correlation equal? The answer can be interesting and complicated, but if you want to understand how modern social science identifies when correlative effects are causal, Alex Edmans' Layman's Guide to Separating Correlation and Causation is a great place to start. Also, posting about correlation is itself strongly correlated (likely causally) with the following two links being posted in the comments, so I preempt them here: the inevitable xkcd comic and the Spurious Correlation site.
posted by blahblahblah (25 comments total) 69 users marked this as a favorite
 
Also, as some of you know - I am a social scientist and I am happy to answer any questions people might have about this stuff. (I also know Alex, but didn't know about this piece until it popped up on Twitter).
posted by blahblahblah at 11:54 AM on October 24, 2017


From skimming it, that Edmans article is a not-bad introduction to how causality is arrived at in a lot of reduced-form empirical work in economics. I think it's being a little hard on the use of internal instruments (i.e., lags) in panel GMM but that's a quibble.

James Heckman (this guy) has a paper which outlines (quite formally) his comparison of how econometric theory thinks about causality relative to how statisticians typically do. Since economists are really, really interested in causality, "under what assumptions is causal inference valid" is where a lot of work goes. But sometimes you don't need all of that work.

Nice post!
posted by dismas at 12:01 PM on October 24, 2017


Wow, I thought I had some background in the social sciences, but the concept of an instrument was new to me and is super clever. Good job, social scientists!
posted by straight at 12:17 PM on October 24, 2017


When it comes to correlation and causation, the thing I am most tired of is people implying that because they are not equal, correlation is therefore meaningless.

Whenever a study is posted where the data shows a correlation between, say, poverty and nose spiders*, people are now overly eager to jump in and say “correlation doesn’t equal causation! Maybe those nose spiders are just a coincidence! It’s like the pirates vs global temperature example! Maybe these factors are unrelated! Maybe people dealing with endemic poverty just like nose spiders! Did you ever think about that!!!!”

There is this internet-level understanding of the issue that leads to a lack of understanding of what correlation is used for, and why correlation matters.


*fictional example used to avoid derail, and also probably from my brain drifting to spiders georg in related meme-ification of stats topic
posted by a fiendish thingy at 12:17 PM on October 24, 2017 [3 favorites]


Maybe these factors are unrelated! Maybe people dealing with endemic poverty just like nose spiders! Did you ever think about that!!!!”

There is this internet-level understanding of the issue that leads to a lack of understanding of what correlation is used for, and why correlation matters.


... and a strong argument that anyone who chooses to use the correlation-does-not-equal-causation position must first pass Confirmation Bias 101
posted by philip-random at 12:29 PM on October 24, 2017 [1 favorite]


Wow, I thought I had some background in the social sciences, but the concept of an instrument was new to me and is super clever. Good job, social scientists!

Interestingly, it's nearly 100 years old. And completely oversold, which is why we are starting to see its decline in some fields.
posted by MisantropicPainforest at 12:45 PM on October 24, 2017 [1 favorite]


I will probably dump a bunch of causation-related stuff in this thread later. But for now, I'll link a really nice paper that I read recently (though it's a few years old now) having to do with the history of causal inference in the social and medical sciences: Dowd, B. (2011). "Separated at Birth: Statisticians, Social Scientists, and Causality in Health Services Research," Health Serv Res 46(2), 397-420.
posted by Jonathan Livengood at 12:51 PM on October 24, 2017 [1 favorite]


I'm trying to wrap my head around the concept of instrument, and can't help thinking: doesn't it just sharpen the correlation?
It's late here and I'm tired, but could someone give me some more examples of instruments?
posted by Laotic at 1:02 PM on October 24, 2017


James Heckman (this guy) has a paper which outlines (quite formally) his comparison of how econometric theory thinks about causality relative to how statisticians typically do

My impression of that paper is that Heckman is trying to say how econometric theory should think about causality, not about how they do. Many econometricians are firmly in the Rubin causal model camp.
posted by MisantropicPainforest at 1:02 PM on October 24, 2017 [1 favorite]


Causality turns out to be something that, if you think about it long enough, will make you crazy. "Smoking causes lung cancer" but you can smoke and not develop lung cancer, and some people with lung cancer never smoked, so what exactly do we mean here? Questions about causality go back at least to Hume and Descartes but the modern philosopher Nancy Cartwright's work on this is essential - her 1979 paper "Causal Laws and Effective Strategies" is dense but mostly readable. Judea Pearl developed a graphical language to express concepts of causation, blocking, intermediation, and the like; see this 1995 paper. Econometric ideas like instrumental variables can be, and have been, cast into his directed acyclic graph notation.

You might be surprised to know that epidemiology more or less gave up on a rigorous philosophical grounding for causality and has gone with a praxis based definition, the Bradford Hill Criteria -- so you test for things like a dose-response curve, and look carefully for intermediary variables, and try to avoid grand pronouncements about causation ("a is associated with a higher level of b within six months, and higher levels of a are associated with higher levels of b").
posted by PandaMomentum at 1:05 PM on October 24, 2017 [10 favorites]


This is a great post and I'm looking forward to digging in to it.

Over-reliance on "correlation is not causation" ignores the fact that correlation can be really fucking important, and there are whole sciences built around understanding correlation and gaining knowledge from it.
posted by entropone at 1:12 PM on October 24, 2017 [2 favorites]


PandaMomentum, I think you are hinting at the concept of probability and how humans process it. I think we are generally ill equipped to process probability, because we do not experience all the possibilities, only the one that actually occurred.
(Here, causality would just be a high probability that something leads to something else)
posted by Laotic at 1:16 PM on October 24, 2017 [2 favorites]


Yah, probabilistic causation is where we've landed since Popper, but Cartwright makes strong claims that this is insufficient. Useful but incomplete.

Instrumental variables are really hard to find!!! The best I can do is point you to this paper and how excess distance travelled to the NICU is able to help sort out the confounders. It uses a DAG to explain the blocking and colliding.
posted by PandaMomentum at 1:20 PM on October 24, 2017 [1 favorite]


It's late here and I'm tired, but could someone give me some more examples of instruments?

Sure! Let's say I want to see if extra education helps boost income. If I just see who has completed extra education and measure whether they have higher income, I am likely to have some issues. People who decide not to continue their education have all sorts of reasons (maybe they have lower ability, or have lower family savings, or are from certain parts of the country), so I may attribute their change in income to their lack of additional education rather than the underlying issues that caused them to not continue their education.

What I really need is a way to conduct an experiment, and randomly force people to take extra years of education and measure their income later. But this is unethical and hard to do. So there are two ways to approximate that effect:

This paper used the fact that people could stay in education to avoid the draft in Vietnam to see what the effect of educational attainment was. This is a natural experiment, where people were forced into education due to low draft numbers (which were randomly assigned), creating the randomized experiment we wanted without us having to run it ourselves.

This paper instead uses an instrumental variable, birth quarter, to estimate the return to schooling. It turns out that birth quarter predicts educational attainment (if you are born later in the year you complete more schooling), but birthday is unlikely to impact your underlying ability or family. We can use it to isolate the effect of schooling without measuring the other underlying factors.

There will be some quibbles about these examples (academics have lots of debates over the differences between natural experiments, quasi-experiments, and instrumental variables, among others), but the basic idea should hold.
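
If it helps to see the instrument idea in action, here is a toy simulation in Python. It is only a sketch under made-up assumptions: the numbers, the hidden "ability" variable, and the coin-flip instrument are all invented for illustration and are not taken from either paper. The instrument nudges schooling but has no other route to income, so the ratio of its covariance with income to its covariance with schooling should recover the effect I built in, while the plain regression slope should not.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate(n=50_000, true_effect=1.0, seed=0):
    """Toy data: ability is unobserved and drives both schooling and income."""
    rng = random.Random(seed)
    instrument, schooling, income = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)   # hidden confounder (assumed, never "measured")
        z = rng.randint(0, 1)       # coin-flip stand-in for "born later in the year"
        s = 12 + 0.5 * z + ability + rng.gauss(0, 1)
        y = 10 + true_effect * s + 2.0 * ability + rng.gauss(0, 1)
        instrument.append(z)
        schooling.append(s)
        income.append(y)
    return instrument, schooling, income


z, s, y = simulate()
naive_estimate = cov(s, y) / cov(s, s)   # plain regression slope, confounded by ability
iv_estimate = cov(z, y) / cov(z, s)      # instrumental-variable (Wald) ratio

print(f"true effect 1.0, naive {naive_estimate:.2f}, IV {iv_estimate:.2f}")
```

In this setup the naive slope comes out at roughly double the true effect, because ability pushes schooling and income in the same direction, while the instrument-based ratio lands close to 1.0. The catch is that the whole thing rests on the instrument having no back door into income, which is exactly what people argue about in real applications.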
posted by blahblahblah at 1:21 PM on October 24, 2017 [6 favorites]


For a more recent philosophical account of causation that is quite readable, check out Causation: A User's Guide by Paul and Hall.
posted by MisantropicPainforest at 1:22 PM on October 24, 2017 [2 favorites]


It's late here and I'm tired, but could someone give me some more examples of instruments?

Variation in rainfall for civil war onset! Variation in rainfall for dissent! Variation in rainfall for protests! Variation in rainfall for anything where people go outside!

I know what you are thinking: these can't all be right!
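
One way to read the joke: if rainfall moves lots of things at once, it probably reaches the outcome through more than one channel, and then the instrument is no longer clean. Here is a toy Python sketch of that failure. All the variable names and coefficients are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def iv_with_side_channel(direct_effect, n=50_000, true_effect=1.0, seed=1):
    """IV estimate when rain also affects the outcome by a second route.

    direct_effect = 0.0 means the instrument is clean (assumed ideal case);
    anything else means rain leaks into the outcome outside the treatment.
    """
    rng = random.Random(seed)
    rain, treatment, outcome = [], [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)    # unobserved confounder (made up)
        z = rng.gauss(0, 1)         # rainfall shock
        x = 0.5 * z + hidden + rng.gauss(0, 1)
        y = true_effect * x + direct_effect * z + 2.0 * hidden + rng.gauss(0, 1)
        rain.append(z)
        treatment.append(x)
        outcome.append(y)
    return cov(rain, outcome) / cov(rain, treatment)


clean_estimate = iv_with_side_channel(direct_effect=0.0)
leaky_estimate = iv_with_side_channel(direct_effect=0.5)

print(f"true 1.0, clean instrument {clean_estimate:.2f}, leaky instrument {leaky_estimate:.2f}")
```

With the clean instrument the estimate sits near the true value of 1.0. With the side channel switched on, the same formula drifts toward 2.0, because the extra route (0.5) gets divided by the first-stage strength (0.5) and added to the answer. Nothing in the data alone flags the problem, which is why the choice of instrument has to be argued rather than tested.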
posted by MisantropicPainforest at 1:24 PM on October 24, 2017 [3 favorites]


Painforest, Panda, thanks for helping, I feel out of my depth here and can only go by common sense. It'll take me a fair bit of reading up to understand how instruments can help in predicting the probability of success of a solution (or strategy, to use the terminology of Cartwright linked by PandaMomentum above). I will not embarrass myself anymore and leave this can of worms for tomorrow : )
posted by Laotic at 1:32 PM on October 24, 2017


My impression of that paper is that Heckman is trying to say how econometric theory should think about causality, not about how they do. Many econometricians are firmly in the Rubin causal model camp.

My guess is that most applied people don't think about it in enough detail at all :)
posted by dismas at 1:43 PM on October 24, 2017 [2 favorites]


(I say gently as an applied person who does structural stuff)
posted by dismas at 2:33 PM on October 24, 2017 [1 favorite]


I have to admit I can get cranky with some of the causality through identification crowd. I've had to review a few papers so far that boil down to

(1) My idea is that X causes Y because of Reasons. In the example upstream, education causes income because it increases skill levels.
(2) Identification-based study to show that education causes income
(3) Therefore, education increases skill levels

And had to rather crankily reply that, no, you really haven't provided any evidence at all for the causal mechanism you offered, and there are many other pathways by which X might cause Y.
posted by GCU Sweet and Full of Grace at 5:46 AM on October 25, 2017


> My idea is that ... education causes income because it increases skill levels.

But what if education increases income because you're more likely to be hired by your classmates' rich dads? Have you thought about that, huh, huh ...?

> crankily reply that, no, ... there are many other pathways by which X might cause Y.

Ah. So, um, what he said.

(I've heard "Correlation does not equal causation, but it is really suggestive.")
posted by RedOrGreen at 7:58 AM on October 25, 2017


Because you are a social scientist, are you happy to answer any questions people might have about this stuff?
Because you are happy to answer any questions people might have about this stuff, are you a social scientist?

Which is it?
posted by Nanukthedog at 10:58 AM on October 25, 2017 [3 favorites]


I've bounced around the ins and outs of the "Correlation is not Causation" trope.

I initially found the value in disabusing folks of our natural pattern making instinct, linking outcomes superstitiously. Then watching the internet pedantry that phrase fostered it became 'problematic'.

I've really enjoyed the Buddhist concept of 'Conditions that Foster Outcomes' rather than Causation. The complexity of contributing factors makes focusing on Conditions more useful than trying to establish Causes.
posted by CheapB at 9:23 AM on October 26, 2017 [1 favorite]


Correlation does not prove causation, but lack of correlation does prove lack of causation.

The null result is more epistemologically robust.

See Popper.
posted by Pouteria at 12:58 PM on October 26, 2017


Correlation does not prove causation, but lack of correlation does prove lack of causation.

It absolutely does not.
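
A quick toy counterexample in Python, using a setup I made up for illustration: y is completely determined by x (y = x squared), yet the linear correlation across the whole sample is about zero because the positive and negative halves cancel out.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov_ab / math.sqrt(var_a * var_b)


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(20_000)]
y = [v * v for v in x]   # x fully causes y, with no noise at all

r_all = pearson(x, y)

# Same data, but looking only at the half where x is positive.
positive = [(a, b) for a, b in zip(x, y) if a > 0]
r_positive = pearson([a for a, _ in positive], [b for _, b in positive])

print(f"whole sample r = {r_all:.3f}, x > 0 only r = {r_positive:.3f}")
```

The whole-sample correlation is close to zero, while the positive half alone shows a strong correlation. Offsetting pathways (a drug that helps through one route and harms through another) can hide a real effect in the same way.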
posted by MisantropicPainforest at 4:11 PM on October 26, 2017 [1 favorite]



