How many studies are faked or flawed?
September 10, 2023 5:56 PM

 
Because I don’t have a subscription to Nature, I can’t determine if this article is real or faked.
posted by armoir from antproof case at 6:05 PM on September 10, 2023 [13 favorites]


I'm still haunted by the observation that it isn't that psychology has a "replication crisis," it's that psychology is the first field to openly acknowledge and grapple with a crisis that affects every field of science
posted by DoctorFedora at 6:22 PM on September 10, 2023 [30 favorites]


archive.org link to article.
posted by clavdivs at 6:27 PM on September 10, 2023 [7 favorites]


Perhaps my favorite example of no one being willing to address the sheer scale of flawed research comes out of a 2014 experiment showing that human males (and only males) stress the hell out of rats and mice.

I’m sure the last 70 years of behaviorist studies noted the gender of the lab assistants, right?
posted by Tell Me No Lies at 6:32 PM on September 10, 2023 [23 favorites]


I'm still haunted by the observation that it isn't that psychology has a "replication crisis," it's that psychology is the first field to openly acknowledge and grapple with a crisis that affects every field of science

I wouldn't say every field per se, but every field that relies on statistical inference, maybe. I started out in psychology and moved to statistics in part because this crisis bothered me. Like a noob I assumed statistics would have some kind of answer, because it's not like RA Fisher belonged to psychology, but we don't -- and yet for some reason nobody is lobbing tomatoes at us. Why not?! Is it that we're too boring to look at for long enough to aim?
posted by eirias at 6:58 PM on September 10, 2023 [9 favorites]


Dunno, CS and stats have their problems too. Every paper introducing some estimator or algorithm that claims to be better than all that came before is probably p-hacking its results to show theirs is the best. I keep thinking about that gzip paper that beats nearly all embedding models.
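(For the curious, that gzip paper's trick is basically compression-distance kNN: if gzip can compress two texts together almost as well as it compresses them separately, they're probably similar. A minimal sketch of the idea, not the authors' actual code, with made-up toy data:)

```python
# Rough sketch of the "gzip + kNN" text classification idea
# (normalized compression distance); not the paper's actual code.
import gzip
from collections import Counter

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label the query by majority vote over its k nearest training texts."""
    neighbors = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, just to show the call:
train = [("the cell cultures were contaminated", "bio"),
         ("the estimator minimizes squared error", "stats"),
         ("we sequenced the bacterial genome", "bio"),
         ("confidence intervals were bootstrapped", "stats")]
print(knn_predict("bootstrap standard errors were reported", train))
```

(And part of why it looked so strong against the embedding models reportedly came down to a generous tie-breaking rule in the evaluation, which rather proves the point.)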
posted by MisantropicPainforest at 7:03 PM on September 10, 2023 [5 favorites]


Ecology has "The Ecological Detective" monograph, but p values built for physics are still published everywhere. Sigh.
posted by eustatic at 7:34 PM on September 10, 2023


It's almost like every field needs a cadre of mathematical statisticians immersed in the priors and theories of that field, but they all became quants on Wall Street instead
posted by eustatic at 7:38 PM on September 10, 2023 [16 favorites]


I work with biostatisticians, who do their best to make sure that trials are conducted correctly. A couple of people in the department got involved in the Potti scandal at Duke. They were not happy with the fact that no one in the medical establishment seemed to care about his malfeasance.
posted by Spike Glee at 7:53 PM on September 10, 2023 [3 favorites]


RA Fisher ... we don't -- and yet for some reason nobody is lobbing tomatoes at us

Well, computer scientist Judea Pearl definitely threw some tomatoes at Fisher. If you take Pearl's book at face value, Fisher personally set back Bayesian statistics and observational science by roughly his lifetime. Maybe he gets a pass because he also advanced them by an equal amount?

At larger scopes, most computer science is either so rudimentary it doesn't bother with p-values, or so advanced it doesn't need them. You won't find p-values in the Paxos papers, but they're still landmarks in their own way.

thinking about that gzip paper that beats nearly all embedding models.

You might wish to clarify whether you mean that the paper demonstrates all the other embedding model papers were bad, or whether the gzip paper was also bad.
posted by pwnguin at 8:47 PM on September 10, 2023 [2 favorites]


There appears to be some skepticism about the methods used in this study.

An Appraisal of the Carlisle-Stouffer-Fisher Method for Assessing Study Data Integrity and Fraud


In this Statistical Grand Rounds article, we explain Carlisle's methods, highlight perceived limitations of the proposed approach, and offer recommendations. Our main findings are
(1) independence was assumed between variables in a study, which is often false and would lead to "false positive" findings;
(2) an "unusual" result from a trial cannot easily be concluded to represent fraud;
(3) utilized cutoff values for determining extreme P values were arbitrary;
(4) trials were analyzed as if simple randomization was used, introducing bias;
(5) not all P values can be accurately generated from summary statistics in a Table 1, sometimes giving incorrect conclusions;
(6) small numbers of P values to assess outlier status within studies is not reliable;
(7) utilized method to assess deviations from expected distributions may stack the deck;
(8) P values across trials assumed to be independent;
(9) P value variability not accounted for; and
(10) more detailed methods needed to understand exactly what was done.

It is not yet known to what extent these concerns affect the accuracy of Carlisle's results. We recommend that Carlisle's methods be improved before widespread use (applying them to every manuscript submitted for publication).
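(For context, the Carlisle-Stouffer-Fisher approach being appraised boils down to something like this: take the p-values for the baseline comparisons in a trial's Table 1, combine them, and flag trials whose combined p-value is implausibly extreme in either direction. A rough sketch of the combining step using scipy, not Carlisle's actual code or data:)

```python
# Rough sketch of combining a trial's baseline p-values, Carlisle-style;
# the numbers are hypothetical.
from scipy import stats

# Baseline p-values pulled from one (hypothetical) trial's Table 1
baseline_p = [0.42, 0.55, 0.61, 0.48, 0.50, 0.39]

_, p_fisher = stats.combine_pvalues(baseline_p, method="fisher")
_, p_stouffer = stats.combine_pvalues(baseline_p, method="stouffer")

# Under honest simple randomization these p-values should look roughly
# uniform, so a combined p-value very near 0 (groups too different) or
# very near 1 (groups suspiciously well balanced) is what gets flagged --
# subject to every caveat listed above (correlated variables, stratified
# randomization, rounding in summary tables, and so on).
print(p_fisher, p_stouffer)
```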

posted by Umami Dearest at 10:26 PM on September 10, 2023 [5 favorites]


If you think *this* is bad, consider the proportion of the pre-clinical studies used to justify human clinical studies that must be just as faked or flawed. The world of academic and curiosity research that supports drug discovery is an absolute machine for pumping out studies that are so bad as to be meaningless, non-reproducible, heaping piles of rubbish. And yet, they are the bread and butter of many (maybe most?) non-clinical biomed careers. Entire companies exist to do nothing but charge enormous sums to run potential drug candidates through a wide repertoire of experiments that are accepted as necessary (often because they are required for regulatory purposes, eventually) but don't deliver medicines that reliably succeed in clinical trials. This whole scenario is my life's work and I'm happy to see attention drawn to it. I would be thrilled to burn down the ramshackle, built-by-accretion system we have and start fresh with an intentionally-designed approach.
posted by late afternoon dreaming hotel at 1:23 AM on September 11, 2023 [12 favorites]


From my perspective inside the system, none of this will change as long as number of publications remains a key metric for evaluating academic job performance. Especially in medicine, where the metric is actually grant dollars, an imperfect proxy for publication, and the environment is chillingly cut-throat.

Even in mathematics, where reproducing a result isn't so expensive (you just double-check the authors' work, essentially), it's much better for your job prospects to churn out paper after paper of small variations on the same idea than it is to experiment and branch out, and especially to focus on teaching.

Actions follow incentives. There's a clear crisis of actions, but so far the discussion among academic leadership seems to willfully ignore how the present incentive structure made it inevitable.
posted by dbx at 4:01 AM on September 11, 2023 [14 favorites]


eustatic: but p values built for physics are still published everywhere. Sigh.

I'm pretty sure p values were created for biology and psychology. If you want to associate them with something bad, the correct link is to eugenics via p value creator Karl Pearson.
posted by clawsoon at 4:05 AM on September 11, 2023 [3 favorites]


Umami Dearest: There appears to be some skepticism about the methods used in this study.

heh

heh heh
posted by clawsoon at 4:09 AM on September 11, 2023


Weren’t p-values Egon, rather than Karl, Pearson? Along with Fisher. I think Fisher was some flavor of eugenicist too, don’t worry. That, and not the replication crisis, is the reason Fisher’s name was stripped from an ASA award a few years ago, I believe.
posted by eirias at 4:35 AM on September 11, 2023


There appears to be some skepticism about the methods used in this study.

Interesting! The article does cover more than Carlisle's tool, though.
posted by latkes at 5:49 AM on September 11, 2023


It's almost like every field needs a cadre of mathematical statisticians immersed in the priors and theories of that field, but they all became quants on Wall Street instead

In general these don't seem to be problems where statisticians would be of any particular utility.

You don't need a statistician to deal with basic spreadsheet fuckups like referring back to the wrong cell, performing the wrong calculation, misusing the result of a previous calculation, or making incorrect data modifications (i.e. flipping something back from positive to negative because you forgot that you previously flipped it from negative to positive).

Similarly, a statistician isn't going to be much use when you have 30 observations, because that's how much money you had, and it turns out later that you happened to draw an unlucky sample.

Statisticians are great if you're doing multinomial logit over your outcome variable, but it turns out you really do need to worry about the independence of irrelevant alternatives assumption and should be using multinomial probit instead, or some related novel estimator, which turns out to make a small but noticeable difference in the standard errors on your key IVs. But the replicability problems in general seem to be much more basic than the sorts of subtle problems and assumption violations statisticians should be trained to find and deal with.

The core problem seems to me to be a combination of publish-or-perish, disciplines that valorize the creation of novel datasets without the funding to make those datasets large (or with ethical concerns that militate against it), and a reluctance to publish null results. Of course in an environment like that you're going to see lots of false positives get published even before you face the problem of underlying data errors. And because the really deep peer review you'd need to catch spreadsheet fuckups just isn't worth the reviewers' time, data errors are going to be commonplace.
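(A toy simulation of that last point, with entirely invented numbers: suppose only 10% of tested hypotheses are true, samples are small, and only p < 0.05 gets written up.)

```python
# Toy simulation of publish-or-perish plus no null results;
# all the numbers here are assumptions, not estimates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group = 10_000, 30
true_effect_rate, effect_size = 0.10, 0.5

published_true, published_false = 0, 0
for _ in range(n_studies):
    real = rng.random() < true_effect_rate
    shift = effect_size if real else 0.0
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(shift, 1.0, n_per_group)
    p = stats.ttest_ind(a, b).pvalue
    if p < 0.05:                  # only "significant" results get published
        published_true += real
        published_false += not real

# Under these assumptions roughly half the published "findings" are false
# positives, before any data errors or p-hacking even enter the picture.
print(published_false / (published_true + published_false))
```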
posted by GCU Sweet and Full of Grace at 6:37 AM on September 11, 2023 [3 favorites]


Is there a profession of statistics copy editor?
posted by Nancy Lebovitz at 6:43 AM on September 11, 2023 [1 favorite]


When I was listening to the audio version of this, I was thinking journals could both require the full data set and have a randomized spot-check process for reviewing the data. Better than the current system of never checking.
posted by latkes at 7:11 AM on September 11, 2023 [2 favorites]


It shouldn’t take graduate training to avoid these errors… but it takes the mindset of a statistician to notice and care, I think. There’s a certain degree of “could this be bullshit and how would I know” that has to become reflexive. Like how most people voting on a budget don’t want to have to read and understand it? Most people signing off on a science paper treat the methods and results sections the same way. People are terrified of tables and figures. 95% of my job really is basic numeracy.

I had a job once where everything that went out the door was fully dual-programmed and all errors were reconciled. It was at a CRO, so working in a regulated space, and the practice made sense there. The job was not the right one for me in other ways, but I salivate at the memory.
posted by eirias at 7:23 AM on September 11, 2023 [1 favorite]


When I was listening to the audio version of this, I was thinking journals could both require the full data set and have a randomized spot-check process for reviewing the data.

The problem is that somebody has to pay people to do that spot-checking, and it's not going to be cheap.
posted by GCU Sweet and Full of Grace at 7:52 AM on September 11, 2023 [1 favorite]


Is there a profession of statistics copy editor?

Andy Gelman.
posted by MisantropicPainforest at 8:39 AM on September 11, 2023 [3 favorites]


Similarly, a statistician isn't going to be much use when you have 30 observations, because that's how much money you had

I've gotten the impression that that's where the statistician is *most* useful, to tell you that with only 30 observations your whole study is going to be pointless noise and you shouldn't do it until you get more money for more observations.
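(To put a number on it, here's a quick back-of-the-envelope power check; the effect size and sample sizes are illustrative assumptions, not from any particular study.)

```python
# Quick power check for a small two-arm study; illustrative numbers only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# With 30 subjects per arm and a "medium" effect (Cohen's d = 0.5),
# a two-sided t-test at alpha = 0.05 has under 50% power -- worse than a
# coin flip at detecting an effect that really is there. (If 30 is the
# *total* sample, it's far worse.)
power_at_30 = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)

# You'd need roughly 64 per arm to reach the conventional 80% power.
n_for_80 = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)

print(round(power_at_30, 2), round(n_for_80))
```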
posted by clawsoon at 12:44 PM on September 11, 2023 [1 favorite]


The problems are deeper than just poor use of stats. There is a fundamental lack of adequate control in many medical studies, and sometimes an active avoidance of it, especially in psych studies.

Without adequate control it is impossible to draw safe conclusions about causation (including direction of any causation).

And figuring out causal relationships is kind of the whole point of all science.
posted by Pouteria at 9:05 PM on September 11, 2023


We haven't even touched on the financial incentives and the pharmaceutical industry.
posted by latkes at 9:26 PM on September 11, 2023 [1 favorite]


Nor the non-pharmaceutical industry. Plenty of serious problems there.
posted by Pouteria at 10:18 PM on September 11, 2023


re: the "30 observations" case: I sit on a university ethics review committee where part of the requirement for review is a description of the statistical analysis plan and justification for the number of subjects run. My role is not scientific review so I leave that part of the review to the scientific reviewers because frankly I do not have the stats chops to know whether it's bullshit. But I have seen many cases where part of the review discussion by the scientific reviewers is "this analysis plan makes no sense, they're not going to have the statistical power to get useful results with this number of subjects," and then it becomes an ethics question because you're wasting participants' time and effort on a study that will never be able to do anything useful. Those can and do get sent back for the researcher to justify what they're doing and why it actually is going to be a useful contribution to science, not a waste of participants' time. (Or, in the case of medical studies, actively posing some sort of risk or discomfort to the participant for no good reason.) And if the researcher can't justify their choices in a way that convinces the committee that there's a snowball's chance in hell of something scientifically valid and useful coming out of the statistical plan, that study does not get approved.

But that's not necessarily how every IRB works, and some research will fall outside such committees altogether, and you need the right expertise in the room to look at the justification and say "actually that's nonsense." And your analysis plan when you write the protocol isn't necessarily what ends up happening when you've got a real live data set to explore. But for whatever a glimpse behind the curtain is worth: this all could be even worse if those 12-person studies that got axed by the IRB committee were actually getting run.
posted by Stacey at 4:57 AM on September 12, 2023 [2 favorites]


Relatedly, phenylephrine is about to be pulled from shelves: a drug that 1) was approved in 1976 and again in 1994 based purely on in-house, pharma-backed studies (see the section on Efficacy), and 2) was substituted for pseudoephedrine by the makers of Sudafed, who didn't want to lose revenue when pseudoephedrine was pushed behind the counter, keeping the brand name 'Sudafed' in spite of it not having any pseud(o)ephed(rine).

The greed and degeneracy of pharma companies is the true basis for a lot of the bullshit anti-vax/medicine snake oil sales pitch. From one barely regulated con to a totally unregulated one we went.
posted by paimapi at 11:03 AM on September 12, 2023 [1 favorite]


If you guys think bringing drugs to market relies on sketchy study data, wait til you hear about the FDA 510(k) clearance process for up to 98% of implanted medical devices going back to 1976.
posted by Unicorn on the cob at 8:11 AM on September 13, 2023 [2 favorites]


And figuring out causal relationships is kind of the whole point of all science.

Couldn't disagree more. It's a part, for sure, but far from the whole point.

I've gotten the impression that that's where the statistician is *most* useful, to tell you that with only 30 observations your whole study is going to be pointless noise and you shouldn't do it until you get more money for more observations.

You don't need a statistician for this. All you need is adequate statistical training and discipline, or industry-wide standards.
posted by MisantropicPainforest at 8:31 AM on September 13, 2023




This thread has been archived and is closed to new comments