Most Published Research Findings are Probably False
January 24, 2014 8:09 AM

"Given the desire for ambitious scientists to break from the pack with a striking new finding, Dr. Ioannidis reasoned, many hypotheses already start with a high chance of being wrong. Otherwise proving them right would not be so difficult and surprising — and supportive of a scientist’s career. Taking into account the human tendency to see what we want to see, unconscious bias is inevitable. Without any ill intent, a scientist may be nudged toward interpreting the data so it supports the hypothesis, even if just barely."

paper here.

Ioannidis' follow-up paper, "Contradicted and Initially Stronger Effects in Highly Cited Clinical Research", here.

The Economist article on this subject:

"“I SEE a train wreck looming,” warned Daniel Kahneman, an eminent psychologist, in an open letter last year. The premonition concerned research on a phenomenon known as “priming”. Priming studies suggest that decisions can be influenced by apparently irrelevant actions or events that took place just before the cusp of choice. They have been a boom area in psychology over the past decade, and some of their insights have already made it out of the lab and into the toolkits of policy wonks keen on “nudging” the populace.

Dr Kahneman and a growing number of his colleagues fear that a lot of this priming research is poorly founded. Over the past few years various researchers have made systematic attempts to replicate some of the more widely cited priming experiments. Many of these replications have failed. In April, for instance, a paper in PLoS ONE, a journal, reported that nine separate experiments had not managed to reproduce the results of a famous study from 1998 purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan."

Erik Voeten discusses how this relates to political science.
posted by MisantropicPainforest (44 comments total) 20 users marked this as a favorite
 
We've kind of done this before in 2010, and again, with an emphasis on the underlying statistical issues, in 2011. And that Economist link appeared here in October.
posted by escabeche at 8:21 AM on January 24, 2014 [5 favorites]


We've kind of done this before in 2010

We're replicating the results.
posted by yoink at 8:26 AM on January 24, 2014 [18 favorites]


I'm so convinced by these research findings that I can't possibly believe them.
posted by three blind mice at 8:36 AM on January 24, 2014 [4 favorites]


"Most Published Research Findings are Probably False" because scientists are only fishing for lucrative grants. (And hamburgers.)
posted by monospace at 9:00 AM on January 24, 2014


This really highlights to me how important it is, as a peer-reviewer, to enforce a clear and honest methods or study-design section in all papers and proposals. Methods sections frequently, in my experience, leave out factors and steps, sometimes consciously, but more often because the authors did not think to write them down, as they "weren't that important," or because they were struggling with reproducibility, or something. An idealized version of the method gets written, not the actual method with all the messy steps.

I am involved in physics/hydrodynamics simulations, with cross-overs into chemistry and toxicology. The field is highly multi-disciplinary, the measurements complex. It's incredibly easy to cause effects beyond what the experimenter intends and to introduce apparently chaotic results. In hydrodynamics, for example, the effects of the vessel/tank walls are often the dominant factor in the result. There is a large body of literature in my field, stretching back decades, that is shot through with this particular problem. It's only been in the last decade or so that these problems have really started to be addressed, and the effects of the apparatus used really understood. However, as interest in the field runs hot and cold, driven by current events, studies are often conducted by groups without experts in all aspects of the work, and whose grasp of this complexity is... tenuous. And so a lot of bad science can result.

The implications in my field, for the real world, can be significant and long lasting. There's a body of literature in my discipline that says just about anything you want it to say, because initial conditions and side-effects of study designs were not well understood. Doing literature reviews, I frequently have to discard at least a quarter of all of the published papers I look at as being useless, simply because their methodology is flawed. My rejection rate for peer-review submissions is higher, probably more like 50%.

But we still have to live with the legacy of this work. Some of the older, problematic tests are written into US (and other countries) regulations. Planning and risk assessment has to deal with decades-long legacies of decision making based on impaired test regimes.

The way the system of science is currently structured, this all comes back to how well the peer-reviewers do their jobs. That's where most of these problems are supposed to be caught. I do think that the current system of two to three anonymous reviewers working in private (under deadlines of a week or two) does not allow adequate consideration of submitted work. The physics arXiv post-and-review system is much better, IMO, and draws from a much wider talent pool. To paraphrase the IT slogan, methodological problems are much more shallow when there are hundreds of eyes rather than two or three.
posted by bonehead at 9:02 AM on January 24, 2014 [6 favorites]


Doesn't that headline suggest that this study may very well be false?
posted by GenjiandProust at 9:08 AM on January 24, 2014


It was a nice epoch, but it looks like it's time to pack up the science and go back to reading chicken entrails.
posted by cmoj at 9:18 AM on January 24, 2014 [2 favorites]


Who am I supposed to believe these days?
posted by gucci mane at 9:18 AM on January 24, 2014


Doesn't that headline suggest that this study may very well be false?

That's the problem with all the articles in The Scientific Cretan.
posted by yoink at 9:20 AM on January 24, 2014 [6 favorites]


Even if this is an accurate description of academia, it doesn't strike me as a necessarily bad state of affairs. A bunch of biased renegades all at war with one another, leading to a series of "train wrecks" and paradigm shifts, might in the long run be a better system for discovering truths than a slow, safe, dispassionate, and conservative system.
posted by painquale at 9:34 AM on January 24, 2014 [1 favorite]


Alternatively the original research data was collected and processed with proper controls for "unconscious bias" and it was only because of the unusual results that the paper went on to be highly cited.

If you want to attack someone's findings then look at their data and methods and show exactly where they went wrong. The fact that you can't reproduce their experiment is interesting, but usually is more likely to indicate an unexpected variable in the data than bias on the researcher's part.
posted by Tell Me No Lies at 9:35 AM on January 24, 2014


I have trouble taking seriously any piece that is so unfaithful to its source material:
He and his colleagues could not replicate 47 of 53 landmark papers about cancer. Some of the results could not be reproduced even with the help of the original scientists working in their own labs.
As I said in one of the previous threads, this description is not accurate at all; Begley et al were not trying to replicate results, they were trying to generalize results to much broader conditions. To be honest, we don't really know what they did because they never bothered to say what exactly they did beyond saying that it was a generalization of the original experiments.

Further, we don't know what they meant by 'landmark' papers, but to be 'landmark' in cancer biology, you're usually going to be showing a new type of behaviour that's never been seen before. As in, wow, we found that this cell line is very dependent on the dosage of gene X! Nobody thought that gene X could be related to cancer before, because of broad assumptions of gene X's function. This can still be a landmark finding, showing a new type of behavior, and not be widespread; even though it's true and useful for researchers, it's not sufficient for the purposes of a pharma company developing a drug in the early 2000s.

That was the point of the Amgen editorial: that pharma can't immediately use 47 of 53 studies, not that they didn't replicate. It's that they didn't generalize enough to drive a drug program. They were urging researchers to focus on publishing papers that are more immediately translatable into drugs, not saying that most research is false, which is what the press made of it.

On the other side of this piece, I love Mina Bissell's work entirely, and am very excited to see her cited. And she's very right, cell lines are tricky, and the inability to get epithelial cell lines to behave the same as they did in another lab has torpedoed a few attempts at validating my computational predictions. Mina Bissell is best known to me for showing how cell lines completely change behavior when they grow in 3D structures with proper cell-cell contact, compared to growing in disordered configurations. She gives great talks if you ever get a chance to see her give a seminar.
posted by Llama-Lime at 10:28 AM on January 24, 2014 [4 favorites]


This is such a mess. There are several somewhat-related claims being made:

1. There are systemic incentives that may bias researchers.

It's stupidly facile to suggest simply that because such incentives exist in a potential sense, they necessarily create bias. That doesn't prove or establish anything at all; it would be just as valid to say that researchers are biased toward scrupulous care in research because producing bad work can derail a career completely. And the un-cited claim that there are "weak standards" for statistical significance (what, throughout all of modern science? is that really being argued without a shred of data?) is gibberish, in the sense that its meaning is so unclear that it doesn't mean anything as stated.

Also, the NYT article says that the effect of this phenomenon is amplified by disappearing grant money, which assumes what Ioannidis is trying to prove! There's no effect which has been demonstrated!

2. The reporter notes after a bunch of hand-wavey, sophistic "that sounds totally plausible" reasoning that Ioannidis built a mathematical model which supports the conclusion that most published findings are false.

This is utter garbage. If Ioannidis has a conclusion before he has done any actual analysis or research, he's a crackpot, not a scholar. He has a hypothesis that most published findings are false.

3. Ioannidis notes that some important studies are contradicted by later work.

This is not a scandal. This is how science works, full stop. The important work to be done here is to figure out how the contradictions can help us understand what is really going on.

4. Irreproducible results.

This is the only claim from this whole story that comes anywhere close to being important. If Ioannidis wants to contribute to science instead of airily dismissing it, he should focus on the problem of irreproducibility. Is there something wrong with our basic scientific model? Or are the effects people study now just so small and contingent that our means of measurement aren't up to the task? These are the questions which need attention, it seems.

Frankly, this whole thing seems less like a radical enactment of science's self-correcting nature and more like axe-grinding against academic science. It's especially perverse and totally irresponsible to argue that depriving researchers of funding is making them produce bad science without also pointing out that simply improving funding opportunities and making careers in academic science more viable would, by this reasoning, have the effect of making science more reliable and accurate.

If you want to attack someone's findings then look at their data and methods and show exactly where they went wrong. The fact that you can't reproduce their experiment is interesting, but usually is more likely to indicate an unexpected variable in the data than bias on the researcher's part.

I don't think this is exactly right. If you can find demonstrable mistakes in someone's data and methods, that's certainly important, but a research project could have everything right on paper and still fail to yield the experimental outcomes that were originally reported, which is what we're talking about here I think.

If experimental results can't be duplicated, it's not so easy to know exactly what went wrong, though many methods exist to diagnose problems. You can use statistical techniques to help assess how likely it is that your mathematical model has an omitted independent variable, for example, but the inability to reproduce a result per se doesn't tell you whether the problem is with the model, the measurement, or something else.
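
One crude version of that kind of diagnostic, sketched here with made-up data and my own variable names (a toy, not a prescription): fit the model you have, then check whether the residuals still track a candidate variable you suspect was left out.

```python
# Toy omitted-variable check: simulate data where y depends on x1 and x2,
# fit a model that ignores x2, and look for leftover structure in the residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)       # candidate variable, correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

slope, intercept = np.polyfit(x1, y, 1)  # model that omits x2
residuals = y - (intercept + slope * x1)

r, p = stats.pearsonr(residuals, x2)
print(r, p)  # residuals still co-vary strongly with x2: a hint something was omitted
```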
posted by clockzero at 10:34 AM on January 24, 2014 [4 favorites]


Even if this is an accurate description of academia, it doesn't strike me as a necessarily bad state of affairs. A bunch of biased renegades all at war with one another, leading to a series of "train wrecks" and paradigm shifts, might in the long run be a better system for discovering truths than a slow, safe, dispassionate, and conservative system.

I think there's a deeper philosophical problem here, especially in the social sciences and other areas where mechanisms are highly complex and noisy. Most social science literature that I've read implicitly assumes that effects are either "real" or "not real," and that statistical tests serve the purpose of separating the wheat from the chaff. Social scientists find an interesting pattern in the data but "correlation isn't causation" so out comes the advanced statistical toolkit that took years to master, and look the p-value is still really low so this is a real genuinely scientific result. And once you get over that hurdle we can now talk about this result as if it's been proved by science, it's a fact, we can cite it.

Maybe this gives researchers the right incentives to arrive at the truth, but if the truth is that effects are always situation-dependent and can vary considerably between individuals and times and places, I don't see who has the incentive to "discover" that. It seems like there is much more incentive to spend time publishing overly strong results that may seem to flat-out contradict other overly strong results.
posted by leopard at 10:38 AM on January 24, 2014 [1 favorite]


However, as interest in the field runs hot and cold, driven by current events

i see what you did there
posted by rebent at 10:38 AM on January 24, 2014 [1 favorite]


I think there's a deeper philosophical problem here, especially in the social sciences and other areas where mechanisms are highly complex and noisy.

This is an overly-broad generalization. "Noise" is a term with a specific and well-defined meaning in signal processing, but its meaning is less formal and precise when discussing data more generally; in any case, though, it is data that is noisy, not mechanisms or effects. That's an important distinction.

Most social science literature that I've read implicitly assumes that effects are either "real" or "not real," and that statistical tests serve the purpose of separating the wheat from the chaff.

I'm not sure why you're putting scare quotes around "real" here, but yes: broadly speaking, social science is concerned with evaluating claims, facts, and hypotheses about the social world in order to come to conclusions about what is demonstrably real and true, and statistical analysis is one tool we use. All of science has the same goal with different objects of inquiry. I'm not sure how you think this distinguishes social science from physics or chemistry or what have you.

Social scientists find an interesting pattern in the data but "correlation isn't causation" so out comes the advanced statistical toolkit that took years to master, and look the p-value is still really low so this is a real genuinely scientific result.

I understand all of these words, but as a social scientist, I have no idea what this is supposed to mean about social science or what you're trying to say.

And once you get over that hurdle we can now talk about this result as if it's been proved by science, it's a fact, we can cite it.

So scholars shouldn't (always provisionally) accept the veracity of tested and reviewed findings? We should be equally skeptical about all claims, regardless of any testing or evaluation, and refuse to accept mere evidence as evidence of anything? That seems like an unreasonable standard, to put it politely.
posted by clockzero at 10:57 AM on January 24, 2014 [1 favorite]


Most social science literature that I've read implicitly assumes that effects are either "real" or "not real," and that statistical tests serve the purpose of separating the wheat from the chaff. Social scientists find an interesting pattern in the data but "correlation isn't causation" so out comes the advanced statistical toolkit that took years to master, and look the p-value is still really low so this is a real genuinely scientific result. And once you get over that hurdle we can now talk about this result as if it's been proved by science, it's a fact, we can cite it.

And what's the problem here?
posted by MisantropicPainforest at 11:03 AM on January 24, 2014


I can see where this is going to go once the anti-science types get hold of this:

"The Scientific Establishment admits that their studies are BIASED and FAKE! This proves that evolution/vaccination/relativity/climate change is FALSE SCIENTIFIC DOGMA!!! My theory of creationist perpetual motion ancient astronauts from the hollow earth is COMPLETELY VINDICATED!!!
posted by happyroach at 11:09 AM on January 24, 2014 [1 favorite]


The Ioannidis article strikes me as being extremely hypocritical, given that he also has an incentive to be biased and write an article that supports his hypothesis (and that creates controversy and gets his name published). If his hypothesis is assumed to be true, then we can conclude that his result is false.
posted by StrangerInAStrainedLand at 11:38 AM on January 24, 2014


And what's the problem here?

It has been empirically shown that drug researchers generally have enough flexibility in their research protocols to show any result from statistically significantly harmful to statistically significantly helpful. See the summary of David Madigan's work with OMOP here.

So that's a problem (which is very large and expensive). In general, in a world where people have a hard time dealing with uncertainty, statistically trained scientific researchers may actually be making things worse, because they have incentives to brush uncertainty under the rug in the name of scientific rigor.

So in August 2010 this paper gets published:
CONCLUSION: Among patients in the UK General Practice Research Database, the use of oral bisphosphonates was not significantly associated with incident esophageal or gastric cancer.
And in September 2010 this paper gets published:
CONCLUSION: The risk of oesophageal cancer increased with 10 or more prescriptions for oral bisphosphonates and with prescriptions over about a five year period. In Europe and North America, the incidence of oesophageal cancer at age 60-79 is typically 1 per 1000 population over five years, and this is estimated to increase to about 2 per 1000 with five years' use of oral bisphosphonates.
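
To make the "flexibility" point concrete, here's a toy simulation (mine, not OMOP's, and every number is invented): the drug below has no effect at all, but the analyst gets to pick among a few defensible exposure cutoffs and adjustment choices and report whichever one clears p < 0.05.

```python
# Toy illustration of analytic flexibility inflating false positives.
# The drug has no true effect; flexibility alone produces "significant" findings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_study(n=500):
    age = rng.normal(60, 10, n)
    doses = rng.poisson(5, n)                   # prescriptions received
    outcome = 0.02 * age + rng.normal(0, 1, n)  # outcome unrelated to the drug
    pvals = []
    for cutoff in (3, 5, 8):                    # candidate "exposed" definitions
        exposed = doses >= cutoff
        # crude comparison
        pvals.append(stats.ttest_ind(outcome[exposed], outcome[~exposed]).pvalue)
        # age-adjusted comparison (compare residuals after regressing out age)
        resid = outcome - np.polyval(np.polyfit(age, outcome, 1), age)
        pvals.append(stats.ttest_ind(resid[exposed], resid[~exposed]).pvalue)
    return min(pvals) < 0.05                    # report the best-looking analysis

print(np.mean([one_study() for _ in range(500)]))  # well above the nominal 0.05
```

Each individual analysis is defensible on its own; it's the unreported choosing among them that quietly moves the goalposts.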
posted by leopard at 12:05 PM on January 24, 2014 [1 favorite]


Correct me if I'm wrong, but you were talking about social science literature, and the articles you mention aren't social science.
posted by MisantropicPainforest at 12:23 PM on January 24, 2014


It has been empirically shown that drug researchers generally have enough flexibility in their research protocols to show any result from statistically significantly harmful to statistically significantly helpful. See the summary of David Madigan's work with OMOP here.

The flexibility you refer to is based on using different databases, first of all, and secondly epidemiology is really kind of on the border between social science (it's not representative of that large branch of science itself, necessarily) and medicine. Madigan's work is certainly interesting, but what it shows is that extant methods of prediction aren't perfect, not that there's something wrong with epidemiology or social science themselves. You seem to have misunderstood that.

So that's a problem (which is very large and expensive). In general, in a world where people have a hard time dealing with uncertainty, statistically trained scientific researchers may actually be making things worse, because they have incentives to brush uncertainty under the rug in the name of scientific rigor.

I'm amazed at how sanguine people are about assuming that the mere putative existence of incentives is enough to assume that professional scientists are categorically dishonest. That's just utterly credulous and foolish. Aren't you aware that this Madigan fellow is also a researcher? Why would you listen to anyone if you think that imagining a possible incentive to be dishonest constitutes evidence of dishonesty? If you can't trust any study, why would it matter if they contradict each other?
posted by clockzero at 12:29 PM on January 24, 2014


Maybe this gives researchers the right incentives to arrive at the truth, but if the truth is that effects are always situation-dependent and can vary considerably between individuals and times and places, I don't see who has the incentive to "discover" that.

Academic fields in general, and scientific fields in particular, have a tendency to multiply. When someone happens on a previously unnoticed independent variable (e.g., person, time, place) on which previously nonsense data turns out to depend, they get to found a new field.

It's the kind of result that looks good on a resume.
posted by LogicalDash at 12:52 PM on January 24, 2014


Correct me if I'm wrong, but you were talking about social science literature, and the articles you mention aren't social science.

I wrote up above: "in the social sciences and other areas where mechanisms are highly complex and noisy."

The flexibility you refer to is based on using different databases

Among other items. The two papers I linked above actually used the same database. (I also accidentally used the same link for both, the first one should go here.) But the fact that you get different results when you use different databases or different research protocols is sort of a big deal. Consumers of drug research are looking for generally applicable results, not results that may completely reverse themselves if applied on a different population.

it shows is that extant methods of prediction aren't perfect, not that there's something wrong with epidemiology or social science themselves. You seem to have misunderstood that.

So there's nothing wrong with social science or epidemiology, just with their methods? And it's not that "there's something wrong" with the methods, it's just that "they aren't perfect"? Yeah, I guess I don't understand the importance of these distinctions without differences. I thought we were having a discussion, not playing some kind of technical legal game.

I'm amazed at how sanguine people are about assuming that the mere putative existence of incentives is enough to assume that professional scientists are categorically dishonest. That's just utterly credulous and foolish.

I'm amazed at how sanguine some people are at taking my words and jumping to the conclusion that I am "assuming that professional scientists are categorically dishonest." Where did I say or even imply that? That's just utterly uncharitable and foolish.
posted by leopard at 1:29 PM on January 24, 2014


Dr. Ioannidis reasoned, many hypotheses already start with a high chance of being wrong. Otherwise proving them right would not be so difficult...

This seems a lot like arguing that most mountains don't exist, otherwise climbing them would be easy!

Measuring things is hard, especially things that are rare. Take leopard's example: how much difference is there between the first study's 1 per 1000 and the second study's 2 per 1000 when you dump a load of real-world noise on top of a phenomenon whose baseline occurrence is somewhere between once every 2,500 and 5,000 man-years? When you're close to your quantitation limit, things become undependable.

Also, based on this reasoning, ruggedness testing of analytical methods should be an absolute blow-off. Instead, it's a royal pain in the ass (unless, of course, the results are typically invalid). But, in almost 20 years of being the guy who had to figure out why the little analytical caboose went off the rails, I don't think it was ever something we looked at in ruggedness testing.
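
To put rough numbers on that quantitation-limit point (my arithmetic, not either paper's): even a 10,000-person cohort followed for five years leaves the baseline rate poorly pinned down.

```python
# Back-of-envelope: exact (Clopper-Pearson) confidence intervals for a
# five-year risk of roughly 1 vs 2 cases per 1000 people in a 10,000-person cohort.
from scipy import stats

def risk_ci(cases, n, conf=0.95):
    lo = stats.beta.ppf((1 - conf) / 2, cases, n - cases + 1) if cases > 0 else 0.0
    hi = stats.beta.ppf(1 - (1 - conf) / 2, cases + 1, n - cases)
    return lo, hi

n = 10_000
for cases in (10, 20):   # 1 vs 2 per 1000 over five years
    lo, hi = risk_ci(cases, n)
    print(f"{cases} cases: {1000 * cases / n:.1f} per 1000 "
          f"(95% CI {1000 * lo:.1f} to {1000 * hi:.1f})")
# 10 cases: roughly 0.5 to 1.8 per 1000; 20 cases: roughly 1.2 to 3.1 per 1000.
# The intervals overlap, so a doubled risk is barely distinguishable at this size.
```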
posted by Kid Charlemagne at 2:00 PM on January 24, 2014


When someone happens on a previously unnoticed independent variable (e.g., person, time, place) on which previously nonsense data turns out to depend, they get to found a new field.

It's the kind of result that looks good on a resume.


You don't get to found a new field if your research shows that some effect may or may not exist in any particular situation and is highly likely to largely depend on variables that you can't consistently measure. This is basically a state of "knowledge" you can achieve without science, it just involves throwing your hands up and shrugging.

To talk about social science a bit, in recent years/decades economists have branched out to apply their toolkit to social phenomena that are not traditionally considered part of economics (think Gary Becker and all the Freakonomics-type stuff that flowed out of his work). They've published papers in epidemiology, sociology, history, etc. I haven't systematically counted, but I'm pretty sure very few papers have been published in which an economist declares that the economic toolkit doesn't really add much value to the study of a subject, and quite a few more in which they've declared that economics is an exceptional subject that can really advance our knowledge. This is not because economists are horribly dishonest evil people, it's because people face incentives and behave accordingly.

This doesn't mean the system is "broken" or whatever. You make progress by taking strong assumptions and seeing how far you can get with them. You don't make progress by taking an avenue of potentially promising research and saying "eh, this is probably not going to work, so I won't bother." But the system doesn't magically weed out all the crap either. It's like anything else in life, it stumbles along and does some things well and other things poorly and over time a lot of things tend to get better but not necessarily.

My original point in this discussion was that people are bad at dealing with uncertainty. Stuff like "it is data that is noisy, not mechanisms or effects. That's an important distinction." makes me feel more confident that this is true, because I don't see a distinction between saying that "the data on the effect of poverty on personal development is noisy" and "poverty affects development through a noisy mechanism." We are never going to have enough data to get a "pure" estimate of the effects of poverty. Poverty isn't a fundamental law of physics; it's a fuzzily-defined mental construct whose effects will always depend on things that we won't be able to consistently measure. So yeah, if your philosophy is that you can accumulate a large sample of people and run a couple of (controlled or natural) experiments and anything that emerges with a p-value < 0.01 is a scientifically proven fact, then you will always be surprised that so many scientific facts contradict each other. The robustness of a finding in social science does not lie in a p-value, it lies in consistency of results across a variety of approaches to measurement.
posted by leopard at 2:18 PM on January 24, 2014


So there's nothing wrong with social science or epidemiology, just with their methods?

The exercise you linked to which Madigan undertook was attempting (as I understood it) to quantify how good some basic predictors used in epidemiology are by using a massive amalgamation of longitudinal data in which the outcomes are already known. They found that their predictors are not perfect, and could in fact be substantially better. That suggests that epidemiology has a ways to go in its capacity to make extremely consistent and accurate predictions, but it doesn't necessarily mean that there's any problem with the field itself. It's only within the last decade or so that we've even been able to utilize data on this scale, so we can now leverage that capability, one hopes, to improve methods.

And it's not that "there's something wrong" with the methods, it's just that "they aren't perfect"?

I don't know who you're quoting there, but I didn't say that. Clearly there's a problem here, and the proximate cause (apparently) is that the ways epidemiologists predict disease or adverse effects is not perfect and could be improved upon.

Yeah, I guess I don't understand the importance of these distinctions without differences. I thought we were having a discussion, not playing some kind of technical legal game.

Well, fine, whatever you say. The distinction between "current methods used in epidemiology for prediction have a lower success rate than previously thought," on the one hand, and "statistically trained researchers are making things worse (?)" by deliberately manipulating their results, on the other, seems pretty bright and unmistakable to me. If you're talking about scientists working in the employ of drug companies massaging their data to get results their paymasters want, then I would concur that such quasi-unethical practices are bad science, but that's a totally different thing than the link about Madigan. Totally different.

My original point in this discussion was that people are bad at dealing with uncertainty.

Pretty broad empirical claim, but let's grant that it's true for the sake of conversation. People, in general, are "bad at dealing with" uncertainty.

Stuff like "it is data that is noisy, not mechanisms or effects. That's an important distinction." makes me feel more confident that this is true, because I don't see a distinction between saying that "the data on the effect of poverty on personal development is noisy" and "poverty affects development through a noisy mechanism."

It's not very coherent to say "the data on the effect of x on y is noisy". Data from a particular study might be "noisy" but if all extant data on some relationship were "noisy," it would suggest there were problems with measurement, study design, or something else. Furthermore, saying that a mechanism is "noisy" is plainly incoherent. Do you have a background in data or statistical analysis? Because what you're saying comes across as ill-informed, to me, and I'm wondering if it's me or where you're coming from that's causing that.

We are never going to have enough data to get a "pure" estimate of the effects of poverty.

And you can see into the future! That's really remarkable.
posted by clockzero at 2:40 PM on January 24, 2014


QED ipso facto cigarettes are harmless and global warming is a lie. Now watch me hit this drive, yeehaw!!!!
posted by humanfont at 4:19 PM on January 24, 2014


Why wouldn't you read about Madigan's work before lecturing the person who posted a link to it? Madigan's research does not simply show that existing methodologies are poor at making predictions; it shows that research outcomes are highly sensitive to largely undebated research protocols.

There are a few ways to respond to this. Personally, I think this is a reminder that there may not be such a thing as "the" effect of a drug. This is obvious in some sense (drugs have different effects on different people) but can be easy to forget in a world where people want to believe things like "this drug causes cancer" or "this drug doubles your risk of cancer" because more complex messages are so ambiguous as to undermine confidence.

But I guess one could also respond by basically concluding that epidemiological methods suck, and that if the methods were better (maybe if the practitioners were smarter?) this problem would go away. Clearly someone who would bring up this statistical research in the middle of a discussion of how "most published research findings are false" is some anti-science quack and probably doesn't have any statistical training.

Then when this person helpfully quotes you in italics using the phrases "there's something wrong" and "not perfect," it would only be reasonable to wonder why there are quotation marks there.

Look, I'm not sure why a social scientist would find this hard to understand without willful superiority being an issue, but I'll try to choose my words carefully. Social scientists study questions like "How does growing up poor affect your life?" These terms are fuzzy and not well defined, so for research purposes they get translated into questions like "what is the optimal coefficient for variable X in this regression of current earnings on this set of predictors," where X is some measure of parental wealth during childhood. Calculating this optimal coefficient is not just a matter of solving a mathematical optimization problem, but also involves addressing concerns that there are omitted variables that are statistically correlated with the predictor of interest while having an independent causal effect on the output variable of interest. Once this is done we can calculate how likely it is that our estimate of the causal effect would be so large if our regression model were accurate but there was no true causal effect. Once we determine that that would be really, really unlikely, we can publish a paper and we will have contributed a scientific fact to the world -- 1 dollar of additional parental income during childhood leads to beta dollars of additional income as an adult, plus or minus two times the standard error -- and if someone else comes along and publishes their own paper showing something quite different, then it must be the case that either they're idiots or we're idiots, right? No. I am saying that there probably isn't a single "real" effect of parental income on earnings, there probably is a wide range of effects that depend on more variables than have ever been included in a social science model, and it is entirely possible that someone working in a slightly different geographical region in a slightly different time period with a slightly different sample of individuals will come up with a meaningfully different estimate. So I conclude that the value of social science comes less from academics hitting p-value targets in their papers and more from constructing a wide range of approaches to questions that hopefully converge on similar results (but are not necessarily guaranteed to), and thus it is too bad that the popular press likes to say things like "science has shown X" when X is established on the basis of a single paper with an impressive p-value.
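
If it helps, here's a toy version of that last point, with invented numbers: the same regression run in two "regions" where the true effect genuinely differs returns two highly significant but meaningfully different betas, and neither one is wrong.

```python
# Two regions, same model, genuinely different true effects of parental income.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

def region_estimate(true_beta, n=2_000):
    parental = rng.normal(50_000, 15_000, n)
    adult = 10_000 + true_beta * parental + rng.normal(0, 20_000, n)
    fit = sm.OLS(adult, sm.add_constant(parental)).fit()
    return fit.params[1], fit.pvalues[1]

for label, true_beta in [("region A", 0.15), ("region B", 0.45)]:
    beta, p = region_estimate(true_beta)
    print(f"{label}: beta ~ {beta:.2f}, p ~ {p:.1e}")
# Both clear any significance bar; the "effect" simply isn't a single constant.
```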
posted by leopard at 4:28 PM on January 24, 2014


Perhaps ascertaining causality in these medical outcomes is susceptible to this lack of truly nomothetic models for reasons idiosyncratic to drug therapy and non-infectious pathology. After all, it would be surprising to claim that we have no idea what effect alcohol consumption has on blood alcohol levels, to arbitrarily choose (from many) one example of a biological outcome or state and a treatment. Do you have a meta-analysis like Madigan's of attempts to explain or predict poverty? Or anything else?
posted by clockzero at 8:42 PM on January 24, 2014


The original study has gotten an obscene amount of press and attention but itself makes a bunch of severe statistical errors and fundamentally fails to support its conclusions. You can follow the whole drama in the literature here,
Why Most Published Research Findings Are False
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Why Most Published Research Findings Are False: Problems in the Analysis
The article published in PLoS Medicine by Ioannidis makes the dramatic claim in the title that “most published research claims are false,” and has received extensive attention as a result. The article does provide a useful reminder that the probability of hypotheses depends on much more than just the p-value, a point that has been made in the medical literature for at least four decades, and in the statistical literature for decades previous. This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. Unfortunately, while we agree that there are more false claims than many would suspect—based both on poor study design, misinterpretation of p-values, and perhaps analytic manipulation—the mathematical argument in the PLoS Medicine paper underlying the “proof” of the title's claim has a degree of circularity. As we show in detail in a separately published paper, Dr. Ioannidis utilizes a mathematical model that severely diminishes the evidential value of studies—even meta-analyses—such that none can produce more than modest evidence against the null hypothesis, and most are far weaker. This is why, in the offered “proof,” the only study types that achieve a posterior probability of 50% or more (large RCTs [randomized controlled trials] and meta-analysis of RCTs) are those to which a prior probability of 50% or more are assigned. So the model employed cannot be considered a proof that most published claims are untrue, but is rather a claim that no study or combination of studies can ever provide convincing evidence.

Why Most Published Research Findings Are False: Author's Reply to Goodman and Greenland
I thank Goodman and Greenland for their interesting comments on my article. Our methods and results are practically identical. However, some of my arguments are misrepresented:
Here is that separate paper,
ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"
A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument, and show that it has three basic components:
1) An assumption that the prior probability of most hypotheses explored in medical research is below 50%.
2) Dichotomization of P-values at the 0.05 level and introduction of a “bias” factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design.
3) Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%.
Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and then the inferential model used makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and “bias” dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument: that papers in “hot” fields are more likely to produce false findings.
We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.
The arguments made do take some amount of statistical understanding to interpret, but the smack down performed was hard enough that, as you can notice perusing Google Scholar, Ioannidis has not continued to publish in this area.
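
For anyone who wants the skeleton of what's being argued over, here is a minimal sketch of the pre/post-study probability arithmetic (the no-bias case only; the paper also adds a bias term u, and the variable names here are mine):

```python
# Post-study probability that a "significant" finding is true, as a function of
# the prior odds that a probed relationship is real, study power, and alpha.

def ppv(prior_odds, power, alpha):
    """prior_odds: R, ratio of true to null relationships probed in a field.
    power: 1 - beta, chance a true effect reaches significance.
    alpha: significance threshold, chance a null effect does."""
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Well-powered test of a plausible hypothesis: most positive findings are true.
print(ppv(prior_odds=1.0, power=0.8, alpha=0.05))  # ~0.94

# Underpowered, exploratory field where few probed hypotheses are real:
# most "significant" findings are false, which is the disputed headline.
print(ppv(prior_odds=0.1, power=0.2, alpha=0.05))  # ~0.29
```

Goodman and Greenland's objection is essentially about what gets fed into that machinery: if the model treats the evidence from every design as weak and the priors as low, the conclusion is baked in before any data arrive.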
posted by Blasdelb at 3:57 AM on January 25, 2014 [4 favorites]


Do you have a meta-analysis like Madigan's of attempts to explain or predict poverty? Or anything else?

Do you really think that the effect of childhood poverty on lifetime earnings is like the effect of alcohol on blood alcohol levels? And you're a social scientist? Yeah, no wonder you've had a hard time understanding what I'm writing in this thread.
posted by leopard at 6:57 AM on January 25, 2014


By the way, do you have a link to a paper on Newton's Three Laws of Poverty? No, I don't have any particular studies handy, and neither do you. Of course even if I did have a meta-analysis proving my point in my one particular area, you could just say that my point doesn't generalize to other areas, exactly like you did with my link to the OMOP research. That's why I've tried to make a general logical argument, which you've mainly tried to rebut by obnoxiously wondering if I've taken a basic statistics class. As a social scientist you may want to think about the burden of proof that you think I need to face versus the burden of proof that you think you need to face.

Blasdelb, the Ioannidis claim that most findings are false is dubious, and we don't know the precise percentage, although, as the rebuttal acknowledges, "many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims." It is interesting how much circulation the "most findings are false" claim got, though.
posted by leopard at 7:09 AM on January 25, 2014


I'm not sure why everyone is jumping down leopard's throat. Most of what (s)he's saying could be taken from any good research design text. In particular, (s)he seems to be noting only that some social science takes the form of looking for the Big Obvious Relationship implied by the theory, with some controls, when better empirical assessment would examine multiple relationships derived from the theory.

Social science does deal in fuzzy, difficult concepts, abstractions that don't necessarily have any physical reality that can be measured. It deals in noisy data generated by extremely complex interconnected processes with (often) highly contingent or otherwise varying effects. That's what makes it hard.

About the only thing I'd take strong issue with is

Maybe this gives researchers the right incentives to arrive at the truth, but if the truth is that effects are always situation-dependent and can vary considerably between individuals and times and places, I don't see who has the incentive to "discover" that.

Grad students and young assistant professors do. Anyone who doesn't like or disagrees with the original piece does. Especially when their finding attacks a core theoretical underpinning of the original study. You said more X → more Y, but there are lots of reasons why more X → more Y. Your theory is only right if not only more X → more Y but also more A → more B. But it doesn't, so get tae fuck.
posted by ROU_Xenophobe at 9:14 AM on January 25, 2014


Do you really think that the effect of childhood poverty on lifetime earnings is like the effect of alcohol on blood alcohol levels? And you're a social scientist? Yeah, no wonder you've had a hard time understanding what I'm writing in this thread.

No, I don't think that, nor did I say it. My example was intended to show that there are plenty of elucidated relationships in biomedical science which are presumably unaffected by the doubt introduced by Madigan's work (which, again, looks very interesting). If you care to re-read my comments, you'll see that I was merely drawing attention to the limits of Madigan's analysis: it deals with a single empirical topic, namely drug administration and discrete negative outcomes, by design. The doubt that it casts on the epidemiological literature it analyses doesn't necessarily tell us anything about other empirical topics in the same branch of inquiry, nor does it necessarily extend to other fields in the social sciences. Please note that I said "necessarily" in both cases. If you have some compelling reason to think its insights apply to fields of study which weren't included in the analysis, please go ahead and explain it.

You seem to have a radically skeptical view of social science, and while I'm not about to attempt changing your mind, I do think it's important to emphasize that the work you've cited simply doesn't support your skepticism about, for example, analyses of the etiology of poverty.

By the way, do you have a link to a paper on Newton's Three Laws of Poverty? No, I don't have any particular studies handy, and neither do you.

If you can't cite even one actual study or analysis that you can take issue with, at the conceptual or methodological or any other level, I'm not inclined to take what you're saying very seriously.

Of course even if I did have a meta-analysis proving my point in my one particular area, you could just say that my point doesn't generalize to other areas, exactly like you did with my link to the OMOP research.

If you can actually prove your point, by all means please do so. If you don't understand why generalizability has to be assessed before we make big inferences, then I think you should reconsider what you're saying.

That's why I've tried to make a general logical argument, which you've mainly tried to rebut by obnoxiously wondering if I've taken a basic statistics class. As a social scientist you may want to think about the burden of proof that you think I need to face versus the burden of proof that you think you need to face.

I'm not sure what your general logical argument is, to be honest, and I didn't mean to offend you by asking if you knew what you were talking about. I really couldn't tell. The OMOP research you cited doesn't necessarily have anything to tell us about work that it doesn't analyze, though if you have some compelling argument for its wide applicability I'd be happy to hear it. As far as I can tell, you're making a sort of common-sense argument using terms like "fuzzy" and "noisy" to categorically deny the validity of social science, and since I've already pointed out that those terms lack meaning as deployed I'm not sure what else to say. Again, go ahead and be a radical skeptic about social science to your heart's content, but understand that saying "Oh gosh it's all so complicated" isn't really an argument, nor can it prove or disprove anything.

I'm not sure why everyone is jumping down leopard's throat. Most of what (s)he's saying could be taken from any good research design text. In particular, (s)he seems to be noting only that some social science takes the form of looking for the Big Obvious Relationship implied by the theory, with some controls, when better empirical assessment would examine multiple relationships derived from the theory.

That's a completely unfounded and tendentious characterization, though, and the idea that professional researchers lack the insight that would lead a non-scholar to aver that maybe examining multiple relationships is a good idea just baffles me utterly. I assure you, social science is aware of the possibility. Likewise, merely suggesting that "some" social science maybe isn't all that great or rigorous is a pretty weak claim.

Social science does deal in fuzzy, difficult concepts, abstractions that don't necessarily have any physical reality that can be measured. It deals in noisy data generated by extremely complex interconnected processes with (often) highly contingent or otherwise varying effects. That's what makes it hard.

I mean no personal attack by saying this, but it seems to me that you're speaking from ignorance rather than knowledge. This remark which I quoted is so vague, so over-general and so fundamentally mistaken that it's not worth refuting in detail. Is there some specific study, or body of work, or anything substantial that you can point to which evinces any of these characteristics? Because you're absolutely not describing social science as a whole, here.
posted by clockzero at 11:34 AM on January 25, 2014


clockzero, fine, I have no idea what you are talking about. You haven't cited a single specific example of solid research in your comments, you're just speaking in very vague broad generalities that have no meaning. The only thing that is clearly coming through is that you claim that you are a social scientist and your default assumption is that other people commenting on this topic are ignorant and uneducated and should not be taken seriously unless they adhere to a standard of detailed proof that you haven't come remotely close to approaching in your own comments.
posted by leopard at 11:48 AM on January 25, 2014


You seem to have a radically skeptical view of social science,

...

the idea that professional researchers lack the insight that would lead a non-scholar to aver that maybe examining multiple relationships is a good idea just baffles me utterly. I assure you, social science is aware of the possibility.

This is a perfect example of what I'm talking about. On the one hand what I'm saying is radical and loony and ridiculous, and on the other hand it's also really basic common sense that everyone in science already knows. Wow, I really can't win, can I? Nowhere did I say that the social sciences are broken or fatally flawed, nowhere did I say that social scientists are idiots who don't understand these things; this is just your own uncharitable reading of what I've written.

I'm not going to flaunt my educational credentials here, but I strongly doubt that you're actually better qualified than I am to comment on this topic. Nothing you've written here suggests that.
posted by leopard at 12:03 PM on January 25, 2014


I feel that this is really becoming a tete-a-tete derail, so I will just reply once more and then maybe we can discuss it over private messages if you want. I'm not attacking you personally, leopard, I'm merely challenging the facticity of what you're saying. I know that I wondered in writing earlier if you had training in statistics and by extension principles of sampling and inference, but that was because I wasn't sure and it seems like a relevant qualification for understanding what we're talking about. That wasn't an attack on you, personally.

Yes, you're right that I haven't cited any research. I don't think I'm making any empirical claims that require citation, so I didn't cite anything. That's not because I'm advancing a different standard for myself than for others. It's because I'm saying a different kind of thing than (for example) you are. And if you think my standards are excessively high as regards what should be taken seriously, I don't know what to say except that it's a little naive to expect that you won't be challenged when making ambitious critiques of entire fields of study which don't overtly evince real familiarity with the work in those fields.

This is a perfect example of what I'm talking about. On the one hand what I'm saying is radical and loony and ridiculous, and on the other hand it's also really basic common sense that everyone in science already knows. Wow, I really can't win, can I? Nowhere did I say that the social sciences are broken or fatally flawed, nowhere did I say that social scientists are idiots who don't understand these things; this is just your own uncharitable reading of what I've written.

What's common-sense here is the idea that "some social science takes the form of looking for the Big Obvious Relationship implied by the theory, with some controls, when better empirical assessment would examine multiple relationships derived from the theory," as ROU_Xenophobe said. The problem is that the first "form" described there is sort of a straw man. Let's make it simple: say you want to know whether or not there's a relationship between (for example) education and income. Your null hypothesis is that no relationship exists, and your task is to see if you can marshal some data which would enable you to reject that null hypothesis. If you can do that, then you can explore the characteristics of that relationship and eventually build a regression model that includes other relevant variables too. When you build a multivariate model, as quantitative social scientists do, you necessarily examine multiple relationships. There are lots of statistical methods you can use to help diagnose the sufficiency of your model to explain the data you have, and based on the results of using those methods, you might find that you omitted a significant variable, or that there's a confounding variable which complicates the relationship between your DV and one or more IVs, etc. The point here is that social science done in accordance with our best practices already takes possibilities like "multiple relationships" into account, and the idea that social scientists in general might not realize that the social world is complex like that is silly.
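
For concreteness, a minimal sketch of that workflow on simulated data (every number here is invented): regress income on education alone, then add a plausible confounder and watch the education coefficient move.

```python
# Simulated education/income example: the bivariate model overstates the
# education effect because it absorbs part of the parental-income effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

parental_income = rng.normal(50, 15, n)                        # confounder
education = 10 + 0.08 * parental_income + rng.normal(0, 2, n)
income = 20 + 2.0 * education + 0.5 * parental_income + rng.normal(0, 10, n)

naive = sm.OLS(income, sm.add_constant(education)).fit()
controlled = sm.OLS(income, sm.add_constant(
    np.column_stack([education, parental_income]))).fit()

# The first estimate is biased upward; the second sits near the true 2.0.
print(naive.params[1], controlled.params[1])
```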

What's radical, on the other hand, is this sort of thing:

...there probably isn't a single "real" effect of parental income on earnings, there probably is a wide range of effects that depend on more variables than have ever been included in a social science model, and it is entirely possible that someone working in a slightly different geographical region in a slightly different time period with a slightly different sample of individuals will come up with a meaningfully different estimate...

This is the kind of remark that made me ask if you have a background in statistics. I'm not a statistician, but it strikes me as strange. We use statistics precisely because alternatives like voicing intuitive guesses about what seems likely are not very useful in producing knowledge. When a researcher creates a model of the relationship between variables like parental income and earnings, it is understood that analysis is being done on available data and can only be nomothetic to the extent that the sample data is representative of population data. We know that we cannot observe the coefficient in the population regression equation, so we estimate based on what we do know. Bayesian and frequentist statistics have different assumptions about the relationship between what we have observed and what is true, but in both cases we make some kind of inference. If your point is that statistical inference itself is an invalid route to knowledge, I wouldn't even know where to begin.

In any case, saying that some hypothetical observed effect of parental income on earnings is "not real" because sampling isn't as good as knowing everything or because models can be misspecified in ways that are impossible to discern or correct seems to evince a peculiar understanding: sure, there might be omitted variables, but because we aim toward parsimony in science we don't assume they exist without evidence. The a priori privileging of common-sensical principles like "it's probably more complex than that" over statistical analysis of data is radical and not especially defensible, to my mind. If you have access to several analyses which use the same regression model with different but formally identical data sets and show radically different relationships between parental income and future earnings, please do share.
posted by clockzero at 2:03 PM on January 25, 2014


I mean no personal attack by saying this, but it seems to me that you're speaking from ignorance rather than knowledge.

I'm a poli-sci prof and have been for 14 years.

This remark which I quoted is so vague, so over-general and so fundamentally mistaken that it's not worth refuting in detail. Is there some specific study, or body of work, or anything substantial that you can point to which evinces any of these characteristics? Because you're absolutely not describing social science as a whole, here.

You can't possibly be serious. You think that 'preferences' aren't a difficult, fuzzy, and abstract concept to try to measure? Or democracy? Or class?

As to the second, well, let's take voting as a simple example. The data-generating process for a stream of votes will include some vast array of variables, some of which are themselves caused by a whole ream of other factors (e.g., partisanship) which are in turn generated by yet other processes (social construction of class). And many of the effects are contingent on something else. The effect of income is not the same across races. The effect of religiosity is not the same across races. The effect of religiosity is not the same across religions. The effect of religiosity is not even the same across Christian denominations. Some variables have effects that differ pretty strongly across time.

And the data-generating process that creates survey responses about voting, which is mostly what we actually have, is even more complex.

the idea that professional researchers lack the insight that would lead a non-scholar to aver that maybe examining multiple relationships is a good idea just baffles me utterly

You're not reviewing the same stuff I am.
posted by ROU_Xenophobe at 2:09 PM on January 25, 2014 [4 favorites]


The problem is that the first "form" described there is sort of a straw man

This is simply untrue. I have myself published more than one paper looking at an existing research stream that remained focused on the Big Obvious Predictions of the existing theories, deriving some additional observable implications from them, and testing them in a new setting (because I happened to have data there).

When you build a multivariate model, as quantitative social scientists do, you necessarily examine multiple relationships.

But your example isn't a multivariate model; it's just a multiple regression with one outcome variable. And it doesn't address what I was talking about, even though I specified it pretty clearly.

Let's take your example. You want to know whether education affects income, so presumably you have some theory that makes that assertion. More education → more income is the Big Obvious Relationship. No matter how many controls you add, you're still examining the prediction that more education → more income. No matter how complex a functional form you specify, you're still examining the same prediction that more education → more income. Even if you luck into a situation where you have a nicely identified model or a regression discontinuity or similar, you're still only examining the prediction that more education → more income.
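A toy version of this in Python, with data fabricated on the spot and hypothetical variable names: whichever of these specifications you run, the quantity under test is still the single education coefficient.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "education": rng.normal(13, 2, n),
    "age": rng.integers(22, 65, n).astype(float),
    "urban": rng.integers(0, 2, n),
})
df["income"] = 15 + 2 * df["education"] + 0.3 * df["age"] + 5 * df["urban"] + rng.normal(0, 10, n)

specs = [
    "income ~ education",
    "income ~ education + age",
    "income ~ education + age + urban",
    "income ~ education + age + urban + I(education**2)",
]
for f in specs:
    fit = smf.ols(f, data=df).fit()
    # every specification is still just a test of education -> income
    print(f, round(fit.params["education"], 2))
```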

But any reasonable theory is also going to generate other predictions that aren't directly about education and income. Without having the theory in hand, I can't say what these other predictions might be. Really excellent research looks at those other predictions too, but this is not the norm.
posted by ROU_Xenophobe at 2:39 PM on January 25, 2014


I mean no personal attack by saying this, but it seems to me that you're speaking from ignorance rather than knowledge

I'm a poli-sci prof and have been for 14 years.


Well, one might have gotten the impression from what you wrote that the social sciences are collectively and inherently lacking in precision and clarity. I think we can be open about the limits of our work without casting aspersions on the endeavor of social science itself, is my point here. I meant no personal offense and I see I was wrong about your expertise.

You can't possibly be serious. You think that 'preferences' aren't a difficult, fuzzy, and abstract concept to try to measure? Or democracy? Or class?

Those are indeed abstract and more difficult to measure than some largely uncontested quality like height or weight or speed. But not all social science deals with such concepts in those ways: counting the number of people who use SNAP benefits, for instance, could be part of an exercise in social science that entails little abstraction or fuzziness. The point I was making is that not all social science is fuzzy or abstract, or entails fraught measurement. I think that's a potentially damaging generalization.

As to the second, well, let's take voting as a simple example.

Alright.

The data-generating process for a stream of votes will include some vast array of variables, some of which are themselves caused by a whole ream of other factors (e.g., partisanship) which are in turn generated by yet other processes (social construction of class). And many of the effects are contingent on something else. The effect of income is not the same across races. The effect of religiosity is not the same across races. The effect of religiosity is not the same across religions. The effect of religiosity is not even the same across Christian denominations. Some variables have effects that differ pretty strongly across time. And the data-generating process that creates survey responses about voting, which is mostly what we actually have, is even more complex.

Right, but the point is that there is a quantifiable effect of income on voting for different races, or that age moderates the effect of income on life satisfaction, or whatever. In any of those cases, there's still a relationship which can be characterized, quantified, and which makes theoretical sense. The contention I was responding to seemed to be saying that we can't know what effect religiosity might have on voting, let alone how that effect is modulated by different religious affiliations, because we can't actually know when or in what respect our models are misspecified, and because there's some big unaddressed problem with making inferences from samples. Those are the claims that don't make sense to me.
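For what it's worth, here is what "a quantifiable effect of income on voting for different races" can look like in practice: a toy logit with an interaction term, fit to data I simulated purely for illustration (two generic groups, made-up coefficients).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 4000
group = rng.integers(0, 2, n)            # two generic groups (stand-in for race, religion, etc.)
income = rng.normal(50, 15, n)           # in thousands, invented
# by construction, the income slope is 0.02 for group 0 and 0.05 for group 1
latent = -1.5 + 0.02 * income + 0.03 * income * group + rng.logistic(0, 1, n)
vote = (latent > 0).astype(int)
df = pd.DataFrame({"vote": vote, "income": income, "group": group})

fit = smf.logit("vote ~ income * group", data=df).fit(disp=False)
print(fit.params)  # 'income:group' estimates the difference in slopes between the groups
```

The interaction coefficient is exactly the kind of quantified, theoretically interpretable heterogeneity I mean.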

the idea that professional researchers lack the insight that would lead a non-scholar to aver that maybe examining multiple relationships is a good idea just baffles me utterly

You're not reviewing the same stuff I am.


Listen, we both know that there are people who do shoddy work, and some of them are scholars and professors, unfortunately. I was saying that social science, as an over-arching endeavor, is not unaware that the social world is complex, and I stand by that. We shouldn't judge social science itself by the actions of its worst practitioners, though the prevalence and characteristics of bad scholarship should obviously inform how we think about doing research.

The problem is that the first "form" described there is sort of a straw man

This is simply untrue. I have myself published more than one paper looking at an existing research stream that remained focused on the Big Obvious Predictions of the existing theories, deriving some additional observable implications from them, and testing them in a new setting (because I happened to have data there).


It's a straw man only in the sense that it doesn't characterize all social science research, which you yourself noted by saying "some". And I was in the process of saying that your assertion was sensible to the point of being idiomatic, anyway. My point there was just that not all social science operates that way or intends to accomplish that.

When you build a multivariate model, as quantitative social scientists do, you necessarily examine multiple relationships.

But your example isn't a multivariate model, it's just a multiple regression with one outcome variable. And it doesn't address what I was talking about, even though I specified it pretty clearly.


Yes, that wasn't a multivariate model. I wasn't responding to you directly at that point, but what you said about the prevalence of looking for Big Obvious Relationships at the expense of examining other consequences of a theory seems empirically correct to me. I didn't see that observation in what the other person said, but maybe I just missed something you picked up on. I agree that research which tests a theory comprehensively rather than on one point is better, and I'm glad you mentioned that here.
posted by clockzero at 3:45 PM on January 25, 2014


The a priori privileging of common-sensical principles like "it's probably more complex than that" over statistical analysis of data is radical and not especially defensible, to my mind.

I guess they don't teach reading comprehension in sociology grad school.
posted by leopard at 4:36 PM on January 25, 2014


You realize, I imagine, that you quoted me without noting what you apparently think I misunderstood?
posted by clockzero at 5:50 PM on January 25, 2014


You are mischaracterizing my position. Try re-reading what I wrote. If it helps you concentrate, try to pretend that my comments were written by a math whiz with an Ivy degree and a very stats-heavy job.
posted by leopard at 6:31 PM on January 25, 2014

