Wheel turnin' 'round and 'round
July 8, 2014 7:31 PM

Jason Mitchell, a scientist in the Harvard Social Cognitive and Affective Neuroscience Lab, recently published an essay on his website titled "On the emptiness of failed replications". In the essay he makes several controversial arguments, the most notable of which may be his assertion that studies designed to replicate previous work have no inherent scientific merit:
Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.

He also expresses the opinion that "...authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues." However, there is general agreement in the scientific community at large that there is a need to systematically improve the way in which science is done, and there are concerns about reproducibility (or the lack thereof) in social psychology as well as other fields.

As research techniques and protocols grow in complexity, so too does the challenge of replication. It is frequently argued that failed replications are often interpreted as evidence of incompetence or malfeasance on the part of the original experimenters, but investigators have also argued that they are evidence of inability on the part of the replicators. Pete Etchells argues in The Guardian that the solution is two-fold: scientists need to develop a thicker skin and remind themselves that replication is an integral part of the scientific method, and practitioners and the public alike need to be taught that failures of replication simply mean that the results of the new experiment did not match those of the original experiment.

Previously: the Many Labs Replication Project.
posted by wintermind (34 comments total) 23 users marked this as a favorite
 
Neuroskeptic talks about some flaws in "On the emptiness of failed replications".

On "On the emptiness of failed replications"
posted by motorcycles are jets at 7:35 PM on July 8, 2014 [4 favorites]


That's a good link. I'm sorry I missed it, but I'm glad you shared it. Thanks!
posted by wintermind at 7:39 PM on July 8, 2014


Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.

...or, equally, that the original experimenter bungled something along the way, surely? Or did not correctly describe their actual methodology?
posted by Jimbob at 7:40 PM on July 8, 2014 [19 favorites]


Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.

Why would this be any more likely than the possibility that the original experimenter bungled something along the way?

And by the time he's writing things like this:

Experimenters develop a sense, honed over many years, of how to use a method successfully. Much of this knowledge is implicit.

He's explicitly appealing to something besides the scientific method. This is pretty much ad hoc rationalization, isn't it? And when he moves on to insist that null results of replications have to provide alternative positive explanations, he shifts the burden of proof almost entirely.

This all seems like a good recipe for bad science.
posted by kewb at 7:41 PM on July 8, 2014 [20 favorites]


"Unless original experiments are conducted by flawless experimenters, nothing interesting can be learned from them."

FTFY.

If a replicator can make a mistake because the method is so brittle, so can the original experimenter. The flaw is assuming that the original work was flawless, which is unproven.

Therefore, the self-correcting nature of reproducible results, however difficult, remains a cornerstone of science.

Now, the practical issues with reproduction are legion. Money is scarce, and reproduction is not sexy (pun intended). And, as we are told here, the work is hard and tricksy.

Regardless, this argument is the weakest sauce.
posted by clvrmnky at 7:44 PM on July 8, 2014 [3 favorites]


Time to link my favorite scientific journal: Journal of Articles in Support of the Null Hypothesis
posted by the man of twists and turns at 7:44 PM on July 8, 2014 [11 favorites]


He's explicitly appealing to something besides the scientific method. This is pretty much ad hoc rationalization, isn't it?

Yes. The scientific method provides a means to explain and understand the behavior of the wider universe. If you repeat an experiment and obtain two different results, then clearly the wider behavior of the universe is not being captured correctly by the model system - someone is doing something wrong, there's a variable that's not being controlled. Scientific experiments have to have the potential for replication for this reason; it's why papers have Materials and Methods sections. Saying "Just default to whatever the first guy found because after all he's probably the expert and has an intuitive understanding of how he did his work" is pretty close to saying there are phenomena out there that can't be interrogated by the scientific method, and that appeal to authority is a justified practice.
posted by Jimbob at 7:47 PM on July 8, 2014 [10 favorites]


Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.

That is a pretty bold claim to make, for which I note he does not present a single shred of evidence in support.
posted by mhoye at 7:47 PM on July 8, 2014 [7 favorites]


This has the sound of running scared.
posted by grobstein at 7:52 PM on July 8, 2014 [1 favorite]


The idea may be that it's too easy to make news by staging a crappy replication of an experiment and have it fail. That's a point worth making, perhaps, but the essay as a whole seems pretty dubious.
posted by uosuaq at 8:02 PM on July 8, 2014 [1 favorite]


I started to read this a couple of days ago and got as far as the recipe analogy and my eyes began to roll too hard for me to keep reading.

I just tried again and the same thing happened, so...do I win something for replicating previous results? I think I should.
posted by rtha at 8:03 PM on July 8, 2014 [8 favorites]


Two things, neither of which is new.

First, many failures to replicate start with good theoretical reasons for being skeptical of the original experimental result. Experimentalists in such cases make predictions about specific ways in which replications are expected to fail, and when they find them, they have good reason to think that the original experiment was bungled. (Also, often these replications and failures to replicate are replications in a very broad sense -- where the experimentalist tries to get at the alleged phenomenon from lots of new angles.)

Second (and much more importantly), even if it is true that the most likely explanation of a single failed replication is that the replicating experimentalist bungled something, the most likely explanation of several failed replications, conducted by different experimentalists in different laboratories using different subjects and different equipment, is that the original experimentalist bungled something. Or, more typically in social science, that the original experimentalists were unlucky in their sampling. The way out of this is not to fear replication but to do a lot of it. And then do meta-analyses.

Compare the attitude expressed here -- that you can only learn something from a perfect experimentalist -- to the attitude in astronomy around 1900, when statistical techniques were first being brought to bear on serious scientific problems. Many astronomers thought that the way to get the best estimate of the location of a comet, for example, was to take the best observations and ignore the rest. But they were wrong. Even not-very-good observations are informative, and putting together a lot of not-very-good observations with the right statistical tools can lead to a very reliable inference. As C.S. Peirce put it: Science is not like a chain that is only as weak as its weakest link but like a cable woven of innumerable slender threads.
posted by Jonathan Livengood at 8:04 PM on July 8, 2014 [22 favorites]
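
To make the pooling point above concrete, here is a minimal sketch (not from the thread; the effect sizes and standard errors below are made up) of a fixed-effect meta-analysis by inverse-variance weighting, showing how several noisy replications combine into a single estimate more precise than any one of them:

import math

# (effect size, standard error) for a hypothetical original study plus replications
studies = [(0.45, 0.20),   # original report
           (0.05, 0.18),   # replication 1
           (0.10, 0.22),   # replication 2
           (-0.02, 0.15)]  # replication 3

# Inverse-variance weights: more precise studies count for more.
weights = [1.0 / se ** 2 for _, se in studies]
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled estimate

print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")

With numbers like these, the pooled estimate is small, with a much narrower interval than any single study provides: the cable-of-threads point in code form.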


Looks like we owe Fleischmann and Pons a massive apology. We could have had room temperature fusion in the eighties if it weren't for the naysayers and nullifidians.
posted by Iridic at 8:10 PM on July 8, 2014 [6 favorites]


Whoops, that should have been 1800, not 1900, in my comment above. (Think Laplace, not Pearson.)
posted by Jonathan Livengood at 8:15 PM on July 8, 2014


Nobody expects the Replication Inquisition (p <.05).
posted by srboisvert at 8:17 PM on July 8, 2014 [7 favorites]


This is why journals should demand that full data -- enough to reproduce the result -- be published along with articles. The hard part is data collection; taking that out of the picture only leaves the all-too-common problem of bad statistics (sometimes pitifully so). Sometimes just errors, sometimes the wrong conclusions, sometimes the wrong tests.

For the love of Crick, people, correct for multiple comparisons.
posted by supercres at 8:24 PM on July 8, 2014 [1 favorite]
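
For anyone wondering what "correct for multiple comparisons" looks like in practice, here is a minimal sketch (mine, not supercres's; the p-values are invented) of two common corrections, Bonferroni and Benjamini-Hochberg, applied to ten hypothetical tests:

# Made-up p-values from 10 hypothetical tests
pvals = [0.001, 0.008, 0.012, 0.030, 0.041, 0.049, 0.100, 0.250, 0.600, 0.900]
alpha = 0.05
m = len(pvals)

# Bonferroni: compare each p-value to alpha / m (controls familywise error).
bonferroni_hits = [p for p in pvals if p < alpha / m]

# Benjamini-Hochberg: find the largest k with p_(k) <= (k / m) * alpha and
# reject the k smallest p-values (controls the false discovery rate).
ranked = sorted(pvals)
k = max((i + 1 for i, p in enumerate(ranked) if p <= (i + 1) / m * alpha), default=0)
bh_hits = ranked[:k]

print("uncorrected 'significant':   ", [p for p in pvals if p < alpha])
print("Bonferroni survivors:        ", bonferroni_hits)
print("Benjamini-Hochberg survivors:", bh_hits)

Six of the ten invented tests clear p < .05 on their own; only one survives Bonferroni and three survive Benjamini-Hochberg, which is the whole point of correcting.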


This is why journals should demand that full data -- enough to reproduce the result -- be published along with articles.

I agree completely, but I have trouble convincing the satellite imagery company -- the one I signed a data agreement with and paid $15,000 for their images -- of that.
posted by Jimbob at 8:30 PM on July 8, 2014


Likewise, there is more to being a successful experimenter than merely following what’s printed in a method section. Experimenters develop a sense, honed over many years, of how to use a method successfully. Much of this knowledge is implicit. Collecting meaningful neuroimaging data, for example, requires that participants remain near-motionless during scanning, and thus in my lab, we go through great lengths to encourage participants to keep still. We whine about how we will have spent a lot of money for nothing if they move, we plead with them not to sneeze or cough or wiggle their foot while in the scanner, and we deliver frequent pep talks and reminders throughout the session.

It is amazing this guy is a professor. Yes, the methods section is too small to contain the total knowledge necessary to run an experiment. If you cannot replicate a finding, you contact the original lab that published the paper and run through all the possible deficiencies in the published method, and if necessary, bring in someone who did the original work to fill in the blanks, then replicate again. If you still can't get a similar result, the original conclusion may still be true, but is too specific to be useful. Science is about making implicit knowledge explicit and usable in the general case; implicit knowledge is for shamans and phrenologists.
posted by benzenedream at 8:39 PM on July 8, 2014 [12 favorites]


This is why journals should demand that full data -- enough to reproduce the result -- be published along with articles.

I and my 3 TB of data for one article would like to have a word with you about storage and transmission costs.


... That said, I am subject to NSF requirements to provide all that data upon request, which I totally agree with. But publishing it along with the paper? No one actually wants to deal with the logistics of that.
posted by dorque at 8:41 PM on July 8, 2014 [2 favorites]


Aww crumb, if all those failed replicators of cold fusion had just gone on to invent their own experiment we'd all be driving around in CF powered cars by now!
posted by sammyo at 9:25 PM on July 8, 2014


Does all this mean that we have no defense against spurious claims ... Suppose someone claims to have seen a paisley swan ... If I am to dislodge this claim, it won’t do simply to scare up several white swans. Instead, I must provide a positive explanation for the observation: how did the report of a paisley swan come to be? ... [6].

I'd suggest that footnote six somewhat refutes his own paper.

This sounds like he's trying to get at some kind of problematic behavior in his narrow field, but direct accusations would be death to his career, so he goes for the overly general screed instead.

But I'm hoping to run onto some paisley swans, probably from genetic manipulation, but paisley swans, cool.
posted by sammyo at 9:48 PM on July 8, 2014


Soooo tempted to spray paint a couple and pahhk them in the yaahrd.
posted by sammyo at 9:53 PM on July 8, 2014


"Unless original experiments are conducted by flawless experimenters, nothing interesting can be learned from them."

I'm with the article's author on this completely. Off the top of my head, I can think of a few interesting things that came about as the result of a botched experiment, or as unrelated discoveries outside the intended goal of the experiment - penicillin, vulcanized rubber, microwave ovens, teflon, pacemakers, and about half a dozen things from coal tar, including dyes and waterproof sealants.
posted by chambers at 10:03 PM on July 8, 2014


This is why journals should demand that full data -- enough to reproduce the result -- be published along with articles.

The costs of storage and transmission are definitely a barrier, sure. So is metadata. Data sharing is hard - there are a lot of costs related to making data understandable and usable by other scientists. There is a lot of interesting work being done in this area as we speak.

But there is something else in his premise that is fatally flawed. Replication is already not of interest to scientists doing naturalistic research; e.g., social scientists like social psychologists (his audience). Naturalistic inquiry is focused not on external validity, internal validity, reliability, or generalizability; instead, naturalistic researchers use different verification techniques to establish trustworthiness of their data and findings: credibility, transferability, dependability - which is closely related to replicability but is not the same thing - and confirmability. Yvonna Lincoln and Egon Guba wrote a fascinating (and unfortunately expensive) book called Naturalistic Inquiry in the 1980s that covers this in great detail. Perhaps Jason Mitchell should get his hands on a copy. The authors argue that data audits are the best way to ensure dependability and confirmability (the two types of trustworthiness in naturalistic research that are similar to replicability), and instruct researchers to keep detailed notes not just on how they reach their findings, but on the research process itself. Formal audits can be employed by outside scientists looking at the data and these detailed notes in order to assess the appropriateness of decisions made during the inquiry process.
posted by sockermom at 10:36 PM on July 8, 2014 [4 favorites]


Is he trolling? Maybe it's a big trick by Mitchell to see who he can get to publicly agree with this paper, only to reveal them as a fraud. Like the Sokal hoax. Does anyone agree with this?
posted by Edgewise at 12:06 AM on July 9, 2014 [1 favorite]


And this is why you need people who study philosophy of science, kids.
posted by Elementary Penguin at 1:46 AM on July 9, 2014 [2 favorites]


I and my 3 TB of data for one article would like to have a word with you about storage and transmission costs.

I and my 10 TB of data (self link, sorry) would suggest you take a look at any of the open-data consortiums (consortia?) springing up to do exactly this, getting funding from NIH (for example) and NSF to make this a reality. We are hosting on Box temporarily, but when we set out to do this, I was amazed that we had choices for sharing raw neural data.
posted by supercres at 4:31 AM on July 9, 2014 [6 favorites]


Another example, funded jointly by NSF and NIH. I wouldn't have guessed that neuroscience is unique, but I guess I could be wrong.
posted by supercres at 4:35 AM on July 9, 2014


Oh hey, that's awesome. That doesn't exist for my field as far as I know, but I wish it did. (It's actually a bit more complicated on our end -- we're a user facility, and it's not clear from our NSF mandate how much of our users' data we're required to hang on to and for how long, so ultimately my 3 TB is a drop in the bucket. I know we're talking about how to make the hosting work, but I don't think a good solution has emerged yet.)
posted by dorque at 6:49 AM on July 9, 2014


Would it be possible to self-host large datasets in some kind of read-only storage system and then provide another system locally where other researchers could run data analysis programs on the data?

There would be questions about how to describe the data format, but if the researchers use well-known formats like HDF, at least access is via well-known and easily available tools.

Researchers could use things like Docker and virtual machines for security. It's pretty cheap to make a basic Docker image, spin it up and down when needed, and delete it when the remote researcher is done; as long as you trust that Docker won't let someone escape the sandbox, this should be reasonably secure. Since the data access would be local, it wouldn't incur the kinds of data costs involved in mailing the data out.

Archiving the data sets on tape and letting someone who wants access pay for duplication and mailing costs is also a possibility.

In both cases, there are questions about how long the data set needs to be available, as the storage being used might be needed for the next experiment - so that would have to be part of any funding request, a problem that funding agencies might not appreciate.
posted by Death and Gravity at 9:03 AM on July 9, 2014
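
As a rough illustration of the container idea (a sketch under assumptions, not Death and Gravity's actual setup; the image name, paths, and analysis script are hypothetical), the archived dataset could be mounted read-only into a throwaway container with no network access, so visiting analysis code can read the data but never alter the canonical copy:

import subprocess

DATASET = "/archive/experiment-2014-run3"   # read-only canonical copy (hypothetical path)
ANALYSIS = "/home/visitor/analysis"         # code supplied by the remote researcher

subprocess.run(
    [
        "docker", "run", "--rm",         # remove the container when it exits
        "--network", "none",             # no outbound network access from the sandbox
        "-v", f"{DATASET}:/data:ro",     # dataset mounted read-only
        "-v", f"{ANALYSIS}:/work",       # writable scratch space for results
        "analysis-sandbox:latest",       # hypothetical base image with the analysis stack
        "python", "/work/run_analysis.py", "/data",
    ],
    check=True,
)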


I keep trying to read more of this essay and being just astounded by how terrible it is.

He fails to even mention the statistical arguments that counsel distrust when there is one positive experiment in a field of "hundreds of other studies [that] have failed to obtain" the same effect (yes, the "hundreds" figure is his own!). This is to leave aside entirely the related concerns about multiple comparisons and p-hacking that are routinely documented in places like Andrew Gelman's blog.

In fact, the essay seems to be entirely statistically naive. Why is that? How can it be? Does this guy, who runs his own lab at Harvard and basically does statistics for a living, really not get the statistical arguments of the replicationistas?

Or is this essay more like propaganda, written by someone with the sophistication to understand the debate, but with the purpose of shaping the opinions of the less sophisticated?

The intended audience is presumably other researchers in social psych (and perhaps colleagues in related disciplines and those in funding roles).

Mitchell is certainly flattering this audience: he is telling them that they have "golden hands," and that those who doubt their results are just less-skilled researchers with sour grapes.

And he is presuming that they don't know enough statistics to see the holes in his arguments. Can that be right? It makes this essay itself a microcosm of the replication problem: self-interest and a lack of statistical education stand in the way of institutional change to make science more reliable.
posted by grobstein at 9:10 AM on July 9, 2014 [4 favorites]
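
A back-of-the-envelope sketch of the statistical argument grobstein is pointing at (illustrative numbers only, nothing from Mitchell's essay): if many labs test a truly null effect at p < .05, a handful of "positive" findings is exactly what chance predicts, which is why a lone positive amid hundreds of nulls warrants distrust rather than deference.

alpha = 0.05  # conventional significance threshold
for n_studies in (20, 100, 300):
    expected_false_positives = alpha * n_studies      # expected count of false positives under the null
    p_at_least_one = 1 - (1 - alpha) ** n_studies     # chance of at least one spurious "hit"
    print(f"{n_studies:>3} null studies: expect ~{expected_false_positives:.0f} "
          f"false positives; P(at least one) = {p_at_least_one:.3f}")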


And this is why you need people who study philosophy of science, kids.

Oh, and, from a philosophy of science perspective, of course this essay wouldn't be complete without thoughtlessly copping some very old phil-sci dogma to prove that the activity under discussion isn't real science.

"Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output."

You have to be scrupulously neutral, I guess, and have no prior expectations, or it's not science. You can find this kind of stance in phil-sci writings. For example, it seems to be one of the cornerstones of the great Claude Bernard's view in his Introduction to the Study of Experimental Medicine (1865). But it can't do the work Mitchell wants it to do.

Granting that a sort of fair-minded attitude towards experimental results, a willingness to suspend prior expectations, has a place in explaining the success of science, this can't be made a test of whether an activity counts as science. Of course many great scientific successes have occurred exactly because researchers had strong prior expectations and pursued them doggedly.

Rarely does a self-defeating argument defeat itself quite so quickly and obviously. Mitchell produces this gloss on Kuhn in the very same essay:
Kuhn argues that in the course of normal science, researchers typically conduct only those experiments for which they have strong prior expectations that certain phenomena should be observed; hence, they are generally reluctant to move down the list of plausible culprits when trying to locate a source of experimental failure.
In other words, the progress of what Kuhn called a "mature science" positively depends on a structure of shared "strong prior expectations" about experimental results. So it can't be un-science to have strong prior expectations.

To be slightly more charitable, Mitchell seems to be saying that science demands strong prior expectations specifically of experimental success, and can't tolerate strong prior expectations that given results may be spurious. This is a point that he attempts to argue for in the rest of the essay, I think quite poorly.

Kuhn's arguments are not prescriptive anyway; they are descriptive -- they are attempts to broadly characterize the sociology of science. The second piece, the question of getting from the sociology of science to understanding why scientific knowledge is trustworthy, was never completed to Kuhn's satisfaction, and he died leaving an unfinished manuscript on the subject, a manuscript that can't be published because he refused to publish it without a satisfying argument.

So the argument from Kuhn is basically ill-posed.

But we might further ask: is social psychology a mature science, in a period of normal-scientific development? Prima facie it seems that it is not. Even if we accept that social psychology has ever been a mature science, right now it seems to be in crisis. There has been a string of embarrassments and high-profile dissents. We seem to be witnessing the collapse of the received consensus on what methods are reliable, what results are trustworthy, and so on. The replicationistas are trying to introduce a new paradigm to address the problems of the old tradition: published replications, published data, pre-registration of experimental designs, etc. So even if we take Kuhn's picture to be a prescription of how science should work, Mitchell's point fails.

Whatever way you look at it, Mitchell's argument falls apart.

Gonna stop ranting; maybe I will write something more fully formed about this.
posted by grobstein at 9:39 AM on July 9, 2014 [3 favorites]


I can hardly wait to get Jason Mitchell to invest in my perpetual motion device. After all, my experiments prove it works, so it's fine, right?
posted by happyroach at 9:50 AM on July 9, 2014


Researchers could use things like docker and virtual machines for security

There have been proposals to attach VMs in a central storage archive as part of supplementary materials. As long as the data sets themselves are not huge, and the number of OSes is small, deduping will minimize storage costs. It would at least make replicating an OS-specific bug in an analysis trivial five years down the line.
posted by benzenedream at 1:50 PM on July 9, 2014



