Replication study fails under scrutiny
March 6, 2016 3:56 AM

A much publicized study (previously) suggested that more than half of all psychology studies cannot be replicated. A new study finds that the replication study was full of serious mistakes and its conclusion is wrong.

via Nick Gruen.

As [a presumably outstanding] example, Gilbert described an original study that involved showing White students at Stanford University a video of four other Stanford students discussing admissions policies at their university. Three of the discussants were White and one was Black. During the discussion, one of the White students made offensive comments about affirmative action, and the researchers found that the observers looked significantly longer at the Black student when they believed he could hear the others' comments than when he could not.

"So how did they do the replication? With students at the University of Amsterdam!" Gilbert said. "They had Dutch students watch a video of Stanford students, speaking in English, about affirmative action policies at a university more than 5000 miles away."

In other words, unlike the participants in the original study, participants in the replication study watched students at a foreign university speaking in a foreign language about an issue of no relevance to them.

But according to Gilbert, that was not the most troubling part.

"If you dive deep into the data, you discover something else," Gilbert said. "The replicators realized that doing this study in the Netherlands might have been a problem, so they wisely decided to run another version of it in the US. And when they did, they basically replicated the original result. And yet, when the OSC estimated the reproducibility of psychological science, they excluded the successful replication and included only the one from the University of Amsterdam that failed. [...]
posted by hawthorne (49 comments total) 20 users marked this as a favorite
 
They had Dutch students watch a video of Stanford students, speaking in English, about affirmative action policies at a university more than 5000 miles away."

In other words, unlike the participants in the original study, participants in the replication study watched students at a foreign university speaking in a foreign language about an issue of no relevance to them.


Dutch people are pretty much all bilingual in English, more so if they're college educated. Even regular TV series are subtitled unless they're for children.
posted by sukeban at 3:59 AM on March 6, 2016 [5 favorites]


Wait, we need to replicate the replication study due to replication errors? How many replications of replications of the original replications need to be replicated to be sure we're sufficiently replicated? Then re-replicate the replication of the replication?

Oh Meta I'm in Lurv.
posted by sammyo at 4:01 AM on March 6, 2016 [12 favorites]


But can the results be replicated?

Oh, fuck. Sammyo beat me.
posted by oheso at 4:03 AM on March 6, 2016 [3 favorites]


sukeban: That still leaves the quite significant issue of the lack of significance of the political issue in the Dutch context.
posted by biffa at 4:12 AM on March 6, 2016 [13 favorites]


They've heard of it.
posted by sukeban at 4:15 AM on March 6, 2016


They've heard of it.

Are you of the view that this was a high-fidelity replication of the original study?
posted by howfar at 4:26 AM on March 6, 2016 [2 favorites]


I am of the view that that particular objection is rather weaksauce.
posted by sukeban at 4:27 AM on March 6, 2016 [2 favorites]


Can you expand on why? Do you believe that functioning in a second language and having a fundamentally different connection to an issue and the actors involved are not factors which reduce the fidelity of the replication? And can you give reasons why?
posted by howfar at 4:30 AM on March 6, 2016 [15 favorites]


So if you go back to physics and do a very simple experiment, say verifying that the boiling point of water is 212F and that this holds all over the world, and then discover that at the University of Denver it boils at 200, is the experiment in Denver a failure?

Perhaps the scientists have not identified some essential factor?
posted by sammyo at 4:50 AM on March 6, 2016 [5 favorites]


I look forward to all psychology papers now ending with: "...or is it?"
posted by mittens at 5:08 AM on March 6, 2016 [32 favorites]


The example of the boiling point of water in Denver is a bit off. A more accurate example would be if someone had determined the boiling point of water in NYC (i.e. at about sea level), then someone else determined the boiling point of water in Denver and declared the NYC result impossible to replicate. This would not prove that either boiling point measurement is wrong, but the Denver study's conclusion would be wrong - the NYC measurement could have been replicated if they had correctly matched the elevation.

In this case, the replication study changed critical variables (e.g., cultural context) and therefore wasn't truly replicating the original study.
posted by Tehhund at 5:10 AM on March 6, 2016 [22 favorites]
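
For anyone who wants to check the analogy, here's a rough back-of-the-envelope sketch (in Python) of how boiling point falls with elevation, using the barometric formula and the Clausius-Clapeyron relation. The altitudes and the assumed atmospheric temperature are approximate; this is an illustration, not precision thermodynamics.

    import math

    # Approximate boiling point of water vs. altitude: barometric formula for
    # air pressure, then the Clausius-Clapeyron relation for the boiling point.
    P0, T0 = 101325.0, 373.15             # sea-level pressure (Pa) and boiling point (K)
    M, g, R = 0.0289644, 9.80665, 8.314   # molar mass of air, gravity, gas constant
    H_VAP = 40660.0                       # heat of vaporization of water (J/mol)
    T_ATM = 288.0                         # assumed mean atmospheric temperature (K)

    def boiling_point_f(altitude_m):
        """Estimated boiling point (deg F) at a given altitude in meters."""
        p = P0 * math.exp(-M * g * altitude_m / (R * T_ATM))    # barometric formula
        inv_t = 1.0 / T0 - (R / H_VAP) * math.log(p / P0)       # Clausius-Clapeyron
        return (1.0 / inv_t - 273.15) * 9 / 5 + 32

    print(round(boiling_point_f(10)))     # NYC, roughly sea level  -> ~212 F
    print(round(boiling_point_f(1609)))   # Denver, about a mile up -> ~202 F

The point of the analogy survives the arithmetic: both measurements can be right, and only a comparison that ignores elevation produces a "failed replication."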


howfar: No, I won't, because they're Dutch college students, not San people from the Kalahari desert, and they are presumably familiar with American issues even if they've been filtered through American pop culture (which, as I mentioned, they get undubbed). Also, this is not the hill I want to die on. Sheesh.
posted by sukeban at 5:31 AM on March 6, 2016


So, when they studied the study that found that studies couldn't be replicated, it couldn't be replicated? All right then.
posted by Makwa at 5:36 AM on March 6, 2016 [3 favorites]


Dutch students will have a different relation to American race issues. They'll also have a different relation to people speaking in English.
This is not necessarily a giant difference from somebody whose first language is English. But I would not call putting fish in an 18C tank a replication of an experiment that had fish in a 21C tank, even if the two are quite similar.
Sorry sukeban if this is piling on you a little but that's a strange opinion if one is trying to reproduce things scientifically, even if socially the longer-term effects might be quite similar.
posted by solarion at 5:48 AM on March 6, 2016 [9 favorites]


So, the study showing that the study that shows studies have results that can't be duplicated has results that can't be duplicated? Who's on first?
posted by Slap*Happy at 6:01 AM on March 6, 2016


Dutch students will have a different relation

Until someone links to the original study in question, we can only assume that it didn't mistakenly try to draw any conclusions from its results that would apply to Dutch students, or anyone else besides students at Stanford in a particular year.
posted by sfenders at 6:04 AM on March 6, 2016


No, I won't, because they're Dutch college students, not San people from the Kalahari desert, and they are presumably familiar with American issues even if they've been filtered through American pop culture (which, as I mentioned, they get undubbed).

They're from a country that celebrates Christmas with blackface, so to claim that there are no significant cultural differences that may have an effect on how they approach the issue is a pretty big hand-wave.
posted by zombieflanders at 6:25 AM on March 6, 2016 [17 favorites]


Yeesh, maybe we can knock off the "replication study fails to replicate har har" bits now? It's an obvious shot that's been taken several times now.

I also think focusing on the Dutch example given in the write-up of the paper is a little too narrow. As a glance at the actual paper shows, the issue is when one study should count as a replication of another one. Obviously we can't just say "whenever they get the same result". The OSC researchers sometimes carried out what are called "conceptual replications"--attempts to find the same basic sort of underlying process at work in widely varying conditions. These are opposed to "direct replications", which try to mirror the original study in as many details as possible. Dan Simons wrote a nice primer on these issues that's worth a read.

Part of the problem, of course, is that researchers tend to appeal freely to the existence of conceptual replications when it favors their results, but not when it goes against them. See the explosive debate over Bargh's age stereotype priming study for an example! Maybe it would be better, as Simons suggests, if experimentalists committed themselves in advance to the range of conditions under which their results should hold.
posted by informavore at 6:28 AM on March 6, 2016 [9 favorites]


I think that sukeban's perception of this is very interesting (and I acknowledge that this is not the hill upon which sukeban wishes to die, and apologize for appearing to continue a derail and/or pile on sukeban's good faith and perfectly innocent comments). It underscores how difficult it is to figure out when cultural contexts carry over into different cultures.

So the study in question is about people's reaction to prejudice. Everybody in the world knows that the US is a country where race relations are a big political issue, so surely that cultural context should seamlessly translate to an outsider, right?

Actually, I think that most people (American and non) massively underestimate how subtle and culturally specific US attitudes toward race are. I am not American, and when I moved there for a couple of years I had the outsider's confidence that I understood American culture via TV and movies, etc. Oh boy was I wrong! I can't count the times when I said something perfectly innocent that the Americans around me read as horrendously racist. Much 'race' discourse in the US is coded, and the codes are completely non-obvious to an outsider. Nevertheless, Americans are so used to their culturally-specific codes that they often assume them to be obvious, the kind of thing that no normal person could accidentally invoke. Indeed, even the racial boundaries themselves, what racial category people fall into (which Americans are largely brought up to perceive as natural and inviolable), are often opaque to the foreigner who can easily 'mis-race' somebody.

So, not having seen the video, I bet that there was a huge amount of subtle race-related signalling going on, and I bet there was a lot of subtle race-related signalling going on between the students watching the video too. Merely understanding the English language would be insufficient to pick up on or replicate that signalling.

Perhaps it might be clearer if we pick a different example: what if the video was about caste prejudice in India? I mean, American students understand Indian English, so they can tell what's going on even if they can't catch the nuances of every idiom. And they've heard of caste prejudice and probably know a little bit about it. So would they respond to the video the same way Indian students would? I think that most of us would guess 'no'. Why? Because we don't see Indian culture as being 'universal' and universally understood. We do, however, overestimate the degree to which American culture is 'universal' and universally understood by everyone around the world.

One thing that the last fifty years of psychology and sociology have firmly underlined is that culture is very important, specific and subtle. We have to respect that observation when we try to figure out how to generalise from psychology studies.
posted by Dreadnought at 6:55 AM on March 6, 2016 [44 favorites]


A new study finds that the replication study was full of serious mistakes and its conclusion is wrong.

I don't think this is a very accurate description of the situation. The consensus among researchers I have reason to believe are trustworthy is that the Gilbert et al critique is itself problematic.
  • Yes, it's not clear that the failed Dutch study should really count as a replication attempt in the first place, and it's easy to cherry-pick this example and focus on it. However, this issue is a lot more complicated than it might seem. First of all, most of the replications were approved by the original authors. I'm not sure if this one was, but it is possible. Second, deciding what design differences are allowed in a replication is itself a hard problem, and some are always inevitable; this is especially true when a study is trying to draw or imply much more general conclusions than its immediate result (as a lot of social psych of this stripe does). Third, Gilbert et al are providing an argument by example -- they pick a key case and draw your attention to that, implying that the whole thing is like that.
  • From what I understand, the Gilbert et al analysis is itself statistically problematic. I haven't tried to evaluate it myself, but here is a detailed discussion.
  • For an extremely balanced and charitable take on the Gilbert et al critique see here. What this is mostly trying to tease out is how to think about cases when a replication is inconclusive. Gilbert et al want to treat them as successful, the original study doesn't, and the author is suggesting that they are neither successful nor failed replications. The point of this post isn't to trash either of them; in fact I think the ending note is rather important:
    We are greatly indebted to the collaborative work of 100s of colleagues behind the reproducibility project, and to Brian Nosek for leading that gargantuan effort (as well as many other important efforts to improve the transparency and replicability of social science). This does not mean we should not try to improve on it or to learn from its shortcomings.
    Gilbert et al do not start with a premise like this.
  • It should be noted that the first author of the critique, Daniel Gilbert, is not obviously a trustworthy party on this. That isn't to say that he's wrong, but if someone like him says "psychology replicates just fine" (and this is definitely the message he's putting out via various channels), you'd want to carefully examine every detail of the argument. The reason I suggest this is that he's semi-notorious for going after the replication researchers in an extremely unprofessional way on twitter and other social media right after that study was released. (And, honestly, I suspect he got away with it mainly because of the H word.) It all happened a while ago so it's hard to provide very detailed links, but there's a bit of this in a paywalled CHE article, and a bit more here, an April 1 (2015) parody of Gilbert and others' initial responses to the replication study, by Andrew Gelman.
  • If you want to know what Andrew Gelman thinks about the critique (and you probably should), here's some discussion in comments on his blog.
posted by advil at 7:00 AM on March 6, 2016 [19 favorites]
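
For readers trying to follow the "neither successful nor failed" framing in advil's third bullet, here is a toy Python sketch of that three-way reading of a replication's confidence interval. The cutoffs and numbers are invented for illustration; this is not the OSC's or Gilbert et al's actual analysis code.

    def classify_replication(rep_lo, rep_hi, orig_estimate):
        """Toy three-way reading of a replication's 95% CI against the original
        point estimate: success, failure, or inconclusive."""
        contains_original = rep_lo <= orig_estimate <= rep_hi
        excludes_zero = not (rep_lo <= 0.0 <= rep_hi)
        if contains_original and excludes_zero:
            return "successful replication"
        if contains_original and not excludes_zero:
            return "inconclusive"      # wide CI covering both zero and the original
        return "failed replication"    # CI excludes the original estimate

    print(classify_replication(-0.05, 0.60, orig_estimate=0.45))  # inconclusive
    print(classify_replication(0.10, 0.50, orig_estimate=0.45))   # successful replication
    print(classify_replication(-0.20, 0.10, orig_estimate=0.45))  # failed replication

On this reading, a noisy replication whose interval is wide enough to cover both zero and the original effect is evidence of very little either way, which is roughly the point of the "balanced and charitable take" linked above.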


"Replication of replication fails" is poor framing, in my opinion, but I'm not unbiased. So I'll just recommend this critique of the critique.
posted by galaxy rise at 7:01 AM on March 6, 2016 [2 favorites]


I can't believe the thread has progressed this far without posting a direct link to the response from the original replication study's authors (published almost immediately after the critique). But cheers to advil for their measured comment that includes this in one of the links.

I wouldn't wholly trust Gilbert et alia's critique because they are well-established figures who stand to gain from saying there's nothing suspect about the publication process. It's not exactly insider information, but the association overseeing journal publishers has been actively battling against the concept of open access for years. The lead author of the replication study, Brian Nosek, just happens to be the founder of the Center for Open Science and one of the architects behind the Open Science Framework.

Despite all of this, Nosek et alia's response is fairly tempered, possibly because there were about 50 other authors who contributed.
posted by Johann Georg Faust at 8:30 AM on March 6, 2016 [8 favorites]


Yeah, the title of this post is just plain misleading.
posted by forkisbetter at 9:01 AM on March 6, 2016 [1 favorite]


Wait, we need to replicate the replication study due to replication errors? How many replications of replications of the original replications need to be replicated to be sure we're sufficiently replicated? Then re-replicate the replication of the replication?

Which, strangely enough, proves why we need replication studies. It does not get the original researchers off the hook. Research requires such a fanatical vigilance, but you don't always see it happening.

Thank you for the link.
posted by Alexandra Kitty at 9:08 AM on March 6, 2016 [1 favorite]


Thanks to advil, galaxy rise, and Johann Georg Faust for pointing out that the reply by Gilbert and colleagues is far from leakproof. The situation is far more nuanced than the linked Science Daily release suggests. There are at least two links to Sanjay Srivastava's blog post (and now three!) about this, and I'd like to make sure that folks who are interested in methodological and statistical issues in psychology (and social and behavioral sciences more generally) know about Srivastava's blog. He's a rigorous thinker and a clear writer, and is a good go-to source for distillations of the ongoing debate about replication in psychology.

The statistical arguments made by Gilbert and colleagues are shaky, and the other critiques they offer are not newsworthy, having shown up as soon as the original Science paper was available. The there's-nothing-to-see-here attitude about psychology research is wrong. The bad news is that it will take some time for the old guard (I'm in the generation between Gilbert and newly-minted PhDs) to lose its grip on psychology journal editing and publication processes (which is largely what created the methodological and statistical problems that dog psychology and other sciences). The good news is that new PhDs and those being trained are now being made aware of how the bad old way of doing and publishing research is problematic, and what better ways there are to do things (e.g., preregistration of research; open data; etc.). Having been in a psychology department for 15+ years, I can see the change happening among our students. Having taught the statistics courses to our graduate students for that same time period, I know I've changed the way I teach and talk about data analysis and publication. Change is coming, but it will happen generationally.
posted by anaphoric at 9:10 AM on March 6, 2016 [6 favorites]


I'm so happy this investigation is happening and in public. Even if the replication study analysis has some errors that weaken the conclusion, it's still a huge contribution to the field in terms of investigating how we know what we think we know. Particularly impressed it's all open science with a published dataset for re-analysis.

Taken together, the study and its critique and the responses just further indicate how hard psychological science and statistics are. I mean, even in this thread we have people confused about what it means to reproduce the Stanford study on race and what the essential part of a replication is. And the critique and the responses are full of argument about the right way to apply statistics to come to meaningful conclusions. And this for a study that's all about precise statistics and verification! As an outsider, the conclusion I draw is to reinforce my skepticism about statistical conclusions.

Planet Money had a good podcast about the replication study that's worth listening to if you want more. I particularly liked the emphasis on trying to create a culture where psychologists even attempt to replicate each other's work, or at least document it thoroughly enough that replication is possible. Sloppy science is bad science; we all benefit from making it stronger.
posted by Nelson at 10:56 AM on March 6, 2016 [1 favorite]


But can the results be replicated?

Oh, fuck. Sammyo beat me.


No, you just... replicated it.

same as it ever was


same as it ever was


same as it ever was


same as it ever was

posted by Halloween Jack at 11:16 AM on March 6, 2016


So is coffee good for us or bad for us?
posted by Splunge at 11:21 AM on March 6, 2016 [3 favorites]


Unknown. The research team is still in the Amsterdam coffeeshop trying to replicate earlier results.
posted by sebastienbailard at 12:08 PM on March 6, 2016 [4 favorites]


My PhD is in the hard sciences, not the social sciences, and I'm watching this whole thing play out from afar with some bemusement.

Do the replicability people have actual subject matter experience? The Stanford/race/Dutch student example was one of many serious experimental mismatches that were described by Gilbert et al in the recent Science article--so many that were so off base that it certainly seems like the effort to "replicate" those studies was disingenuous.

In my current job role, which deals heavily with data use and reuse in the biological sciences, I have seen countless examples of people without specific subject matter expertise completely misusing data, because they simply don't fully understand the limits, context, and appropriate use of the data. Knowing statistics doesn't make you an expert in every discipline under the sun.

I'm also thinking about the notion that the replicability effort attempted to get endorsement by the original researchers for their experimental design. This seems like an attempt to borrow validity from the establishment they're trying to undermine. In addition, getting the blessing of the original PI for what they planned to do doesn't mean they did a good job of doing the experiment. If the replicability project participants didn't have actual prior subject matter expertise--and I mean psychology research expertise specifically--it may well not matter that the original PI signed off on their protocol if they didn't know how to actually do the study.

Drawing a parallel to my own field (biochemistry/protein science), if someone new to this kind of lab work came in with a protocol in hand and tried to replicate it, there are a thousand technical details that could completely derail their work. These are things that people learn as they go along through the apprenticeship of being a tech or a grad student; it's tacit, assumed knowledge that absolutely no materials and methods paper will ever describe.

Finally, that kind of bombastic publicity stunt really reeks. What exactly is achieved by trying to destroy the reputability of all of the social sciences?
posted by Sublimity at 1:09 PM on March 6, 2016 [2 favorites]


I'm also thinking about the notion that the replicability effort attempted to get endorsement by the original researchers for their experimental design. This seems like an attempt to borrow validity from the establishment they're trying to undermine. In addition, getting the blessing of the original PI for what they planned to do doesn't mean they did a good job of doing the experiment. If the replicability project participants didn't have actual prior subject matter expertise--and I mean psychology research expertise specifically--it may well not matter that the original PI signed off on their protocol if they didn't know how to actually do the study.

Additionally, it does you no good to borrow validity from somebody whose validity you are trying to sabotage, for the same reason that standing on a bridge you are dynamiting is a terrible idea.
posted by Pope Guilty at 1:25 PM on March 6, 2016


It should really be no surprise that psychology experiments are hard to reproduce. Most of them are done on very small sample sizes. Why? Because previously published papers got away with smallish sample sizes, and so over time you get a precedent where the sample size you need to get taken seriously creeps lower and lower.
posted by memebake at 1:30 PM on March 6, 2016
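
A quick power calculation shows why small samples matter so much here. This is a normal-approximation sketch with a made-up effect size (d = 0.4) and made-up group sizes, not a reanalysis of any particular study.

    from scipy import stats

    def approx_power(d, n_per_group, alpha=0.05):
        """Approximate power of a two-sided two-sample test for standardized
        effect size d, using the normal approximation."""
        se = (2.0 / n_per_group) ** 0.5            # std. error of the estimated d
        z_crit = stats.norm.ppf(1 - alpha / 2)     # 1.96 for alpha = .05
        return 1 - stats.norm.cdf(z_crit - d / se)

    for n in (20, 50, 200):
        print(n, round(approx_power(d=0.4, n_per_group=n), 2))
    # 20 -> ~0.24, 50 -> ~0.52, 200 -> ~0.98

At 20 per group, a real but modest effect reaches significance only about a quarter of the time, so both the original hits and the replication misses are noisier than they look.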


This is a pretty stark illustration of how replacing p values with Effect Sizes and Confidence Intervals was a vast exercise in lipsticking a pig.

The overall difference in conclusions between the two replication efforts essentially boils down to:

1. The OSC (mis?)designed their metrics with regard to confidence intervals so that replication efforts that resulted in narrow confidence intervals were unlikely to be counted as replications.

2. The new response made up a fantasy definition of confidence intervals with zero regard for probability theory. It then claimed, based on its own definition of what a confidence interval is, that a bunch of successful replications had occurred.

The data the two groups are basing their claims on don't conflict as much as their statistical claims. The first study was a landmark because it was an imperfect effort that opened up an extraordinarily necessary conversation. The regressive claims in the newer study do nothing to forward that conversation.
posted by ethansr at 1:42 PM on March 6, 2016 [1 favorite]
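
To make point 1 concrete, here is a small Python sketch of one criterion of that general kind ("does the replication's 95% CI contain the original point estimate?") and of how a very precise replication can fail it even when the two estimates are close. The effect sizes and standard errors are invented; this is not the OSC's code or their exact metric.

    from scipy import stats

    def ci_contains_original(rep_estimate, rep_se, orig_estimate, level=0.95):
        """Does the replication's CI contain the original point estimate?"""
        z = stats.norm.ppf(0.5 + level / 2)
        lo, hi = rep_estimate - z * rep_se, rep_estimate + z * rep_se
        return lo <= orig_estimate <= hi

    orig = 0.40   # original effect estimate
    # Same replication estimate (0.30), different precision:
    print(ci_contains_original(0.30, rep_se=0.15, orig_estimate=orig))  # True  (wide CI)
    print(ci_contains_original(0.30, rep_se=0.04, orig_estimate=orig))  # False (narrow CI)

Under this kind of rule, the most precise replications are the ones most likely to be scored as failures whenever the original estimate is even slightly inflated, which is the asymmetry ethansr is pointing at.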


If the replicability project participants didn't have actual prior subject matter expertise--and I mean psychology research expertise specifically--it may well not matter that the original PI signed off on their protocol if they didn't know how to actually do the study.

The vast majority of the replicators are psychology research professionals - principal investigators with their own labs, postdocs, graduate students, etc. There are 270 participants, so I can't speak to everyone's background, and of course training doesn't ensure competence. But generally speaking the reproducibility project is an effort to improve research psychology by research psychologists.

I'm also thinking about the notion that the replicability effort attempted to get endorsement by the original researchers for their experimental design. This seems like an attempt to borrow validity from the establishment they're trying to undermine.

Additionally, it does you no good to borrow validity from somebody whose validity you are trying to sabotage, for the same reason that standing on a bridge you are dynamiting is a terrible idea.

I'll repeat, this is an effort to improve research psychology by research psychologists. They're not trying to sabotage or undermine or destroy anything. They want to improve the way psychology research is done so we can make better and quicker progress. The replicators have by and large warmly received critiques of their efforts.
posted by galaxy rise at 1:44 PM on March 6, 2016 [5 favorites]


Shouldn't this be filed under sociology?
posted by destro at 2:11 PM on March 6, 2016


They're not trying to sabotage or undermine or destroy anything. They want to improve the way psychology research is done so we can make better and quicker progress. The replicators have by and large warmly received critiques of their efforts.

Maybe they're not mindful of how their work has been received more broadly, but not a few people I know who are professionals in math/CS/hard sciences, often with heavyweight academic credentials, take their big splashy pronouncement to be a repudiation of essentially all social science research.
posted by Sublimity at 2:55 PM on March 6, 2016


Well that is the pregnant question here. If many psychology experiments can't be replicated, that's a significant problem for the discipline.
posted by Nelson at 3:06 PM on March 6, 2016 [1 favorite]


Maybe they're not mindful of how their work has been received more broadly, but not a few people I know who are professionals in math/CS/hard sciences, often with heavyweight academic credentials, take their big splashy pronouncement to be a repudiation of essentially all social science research.

I can't speak for the 270 authors on the project, or for the wider reproducibility/open-science movement, but to me your characterization of their efforts as splashy and bombastic feels uncharitable. This is an exceptionally tricky topic.

The scientific research community is largely non-hierarchical - there are certainly organizations and individuals with more influence than others but it's not like Nosek or Gilbert or anyone else can wave their hands and easily change how research is conducted. It's an ongoing effort to identify and improve questionable research practices. Publicity helps that effort, quite a lot, and so by and large the reproducibility/open-science movement has sought to publish in the highest ranked journals and raise awareness through the popular science press. But of course the downside of this is that you get journalists looking for the most eye-catching headlines writing variations on "Psychology research is all useless!" when that's not something any of us believe.
posted by galaxy rise at 4:05 PM on March 6, 2016 [3 favorites]


I actually thought the original 37% figure, taken at face value, pretty closely matched my prior for how often a single (but reasonable) positive result with a p-value between 0.005 and 0.05 would replicate. It was, if anything, a little higher than I expected it to be. Maybe I'm just a nihilist, but that never seemed like a "crisis" to me as much as a pretty expected amount of non-replication, given all the researcher degrees-of-freedom and latent variables and file-drawer effects that we know have to be going on.
posted by en forme de poire at 4:55 PM on March 6, 2016
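
That prior is easy to sanity-check by simulation. The sketch below assumes a hypothetical literature that is half null effects and half modest true effects (d = 0.4), studied with small samples; none of these numbers come from the OSC data, so treat the output as an illustration of the reasoning rather than an estimate of the real rate.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, n_studies = 30, 20000          # per-group sample size, number of original studies

    # Hypothetical mix: half true nulls, half modest real effects (Cohen's d = 0.4).
    true_d = np.where(rng.random(n_studies) < 0.5, 0.0, 0.4)

    def run_study(d):
        """One two-group study: return (two-sided p-value, sign of the effect)."""
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(d, 1.0, n)
        res = stats.ttest_ind(b, a)
        return res.pvalue, np.sign(res.statistic)

    originals = np.array([run_study(d) for d in true_d])
    p_orig, sign_orig = originals[:, 0], originals[:, 1]

    # Keep "publishable" originals with 0.005 < p < 0.05, then replicate each once.
    keep = (p_orig > 0.005) & (p_orig < 0.05)
    reps = np.array([run_study(d) for d in true_d[keep]])
    replicated = (reps[:, 0] < 0.05) & (reps[:, 1] == sign_orig[keep])

    print(f"replication rate among kept originals: {replicated.mean():.0%}")

With these particular made-up inputs the simulated rate lands in the rough vicinity of the observed figure rather than anywhere near 100%, which is the comment's point: a modest replication rate is about what you'd expect once low power, a mixed literature, and selection on p-values are priced in.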


If the replicability project participants didn't have actual prior subject matter expertise--and I mean psychology research expertise specifically--it may well not matter that the original PI signed off on their protocol if they didn't know how to actually do the study.

I think this is a critical difference between the "soft" and "hard" sciences - in the hard sciences, the effects are real and true, but technically tricky to ascertain, so you need to be an "expert" to do it right. (Hook one ethernet cable up backwards and you get time traveling tachyons /sarc). This has a name: Internal Validity.

In the soft sciences, the effects are weak and squishy (presumed squishy because humans are squishy, which adds measurement error), but the underlying truths are believed to be fundamental and robust. Under this scenario, it shouldn't matter if you have 5 or 7 points on your Likert scale, study racism in the USA or in the Ukraine, or study babies, teenagers, or retirees. It should still work.

This has a name: External Validity (aka Generalizability).

Two very different concepts, and it's not surprising that people from different fields don't understand each other's viewpoint.

From my viewpoint, the Critique is basically "You didn't EXACTLY replicate the experiments using EXPERTS so of course it didn't work" which sounds like a hard-science critique about Internal validity.

From the soft-science perspective this is a strength of the design not a weakness, as it tests External validity: if a small change in experimental procedures or subjects ruins the effect, is it really a robust effect to begin with?
posted by soylent00FF00 at 5:38 PM on March 6, 2016 [3 favorites]


the most eye-catching headlines writing variations on "Psychology research is all useless!" when that's not something any of us believe.

For certain values of "us", perhaps. I worry that matters of policy and law that could and should be informed by social science research are going to be a whole lot harder to enact, if the sound-bite takeaway of all this is "scientists prove that social science is bunk". I have a pretty high math tolerance and I am not going to slog through all the verbiage about how many angels can dance on the head of a confidence interval, which is the gist of this last volley--it's sure as heck not going to inspire the confidence of the general public.

if a small change in experimental procedures or subjects ruins the effect, is it really a robust effect to begin with?

Well, that depends on the change, doesn't it? In another example (I'm remembering this off the top of my head, so the deets may be somewhat askew) the initial study was asking college students who were commuters about the effect of the commute on their studies, and the "replication" worked with people who weren't students at all. Is that a small change or a big one? How closely does it relate to the fundamental question?
posted by Sublimity at 8:00 PM on March 6, 2016


You can't fool me. It's replications all the way down.
posted by storybored at 8:57 PM on March 6, 2016


In the hard sciences you get scientists reporting replication rates that are not too dissimilar (or worse). For example, at Bayer they got frustrated enough with literature reports not reproducing that they spent money doing this systematically internally and found well under half could be verified. This was not surprising to anyone in industry. Indeed, running through the literature and finding bad statistical practices in the hard sciences has also become something between a frustrating hobby and a career for some people. It's definitely not a competition between disciplines here. Improving a discipline does involve actually coming to terms with the magnitude of the problem, though.
posted by mark k at 9:50 PM on March 6, 2016 [1 favorite]


if a small change in experimental procedures or subjects ruins the effect, is it really a robust effect to begin with?

To be fair that's what we're supposed to do in the hard sciences too. If your result doesn't hold when you change models then it's not biology, it's just a quirk of that cell line or whatever. Validating in a different system or with a different method should always be step two of any successful study. (I'm a biochemist(-ish), not a protein biologist but I work closely with them)
posted by shelleycat at 10:56 PM on March 6, 2016 [1 favorite]


It looks like the latest major result to get clipped by failure-to-replicate is Baumeister's famous ego-depletion effect. You've probably heard of it: the basic idea is that exercising willpower strenuously to do something like resist temptation (to eat some delicious cookies, for instance) makes it much harder to perform later tasks (like solving a puzzle). Slate has a write-up here; the original paper is forthcoming and still seems to be embargoed.

Also, there's a nice reading list for people wanting to get up to speed on the replicability crisis, compiled by Joe Hilgard.
posted by informavore at 4:08 AM on March 7, 2016 [8 favorites]




More responses:

Authors ask Let's Not Mischaracterize the Replication Studies.

Gelman again here.

The first link I found especially worth reading. The authors provide a convincing and very clear explanation of why apparently huge changes (from "Thinking about military service" to "Thinking about a honeymoon") were made and were a sincere attempt to replicate.
posted by mark k at 7:29 AM on March 9, 2016


It looks like the latest major result to get clipped by failure-to-replicate is Baumeister's famous ego-depletion effect.

Heh, I made a FPP about decision fatigue several years ago. Damn you, Baumeister!
posted by homunculus at 12:45 PM on March 9, 2016


Dan Davies argues that "people who expect big things from evidence-based approaches ought to be really quite worried right now." and "Though there are two facets to the reproducibility problem, there is only one that is worth solving."
posted by hawthorne at 5:41 AM on March 22, 2016 [1 favorite]

