Statistics Done Wrong: a free guide for scientists
October 27, 2013 4:37 AM

Statistics Done Wrong is a guide to the most popular statistical errors and slip-ups committed by scientists every day, in the lab and in peer-reviewed journals. Statistics Done Wrong assumes no prior knowledge of statistics, so you can read it before your first statistics course or after thirty years of scientific practice.
posted by Foci for Analysis (39 comments total) 178 users marked this as a favorite

 
Sort of eponysterical, no? Also this is a project well worth doing, and while I do not use statistics much in my profession, I plan to spend some time with this. Kudos to the authors!
posted by sfts2 at 4:56 AM on October 27, 2013 [1 favorite]


"When differences in significance aren’t significant differences" — this fallacy has previously gotten its own FPP.
posted by John Cohen at 5:23 AM on October 27, 2013


Really outstandingly done.
posted by escabeche at 5:58 AM on October 27, 2013


Hey this is awesome!

I'll never forget that sinking feeling I got reading so many papers after taking advanced graduate-level statistics.
posted by Blasdelb at 6:12 AM on October 27, 2013 [2 favorites]


Great (and totally funny). I have to confess that in my first ever paper, way back as an undergraduate, way before any kind of e-publishing, of which I was 5th or 6th coauthor, I might have committed four or five major statistical sins.
posted by francesca too at 6:29 AM on October 27, 2013


The Nine Circles of Scientific Hell to which the truly sinful are damned.
posted by Blasdelb at 6:43 AM on October 27, 2013 [9 favorites]


This was really interesting, and apparently I am a giant nerd.
posted by kyrademon at 7:49 AM on October 27, 2013


If you administer questions like this one to statistics students and scientific methodology instructors, more than a third fail. If you ask doctors, two thirds fail. They erroneously conclude that a p < 0.05 result implies a 95% chance that the result is true – but as you can see in these examples, the likelihood of a positive result being true depends on what proportion of hypotheses tested are true. And we are very fortunate that only a small proportion of women have breast cancer at any given time.

Sadly, this sort of innumeracy is, if anything, more wide-spread in the general public, which includes administrators and policy makers of all sorts. Decisions are easy and necessary, but well-founded decisions are hard. And, if most of the advisers to those administrators and policy makers are probably only slightly better off, we see one of the roots of our regular policy disasters.

I think the basic course in college math should be statistics, not algebra or calculus, which are almost infinitely less useful to anyone outside of a STEM discipline (and to many within them). Similarly, statistics should be a much more central subject in high school and even grade school. On the other hand, actually instilling critical thinking in students is a lot less popular than talking vaguely about it, because no one likes being called on their hand-waving....
posted by GenjiandProust at 7:52 AM on October 27, 2013 [10 favorites]
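
The point in the passage quoted above, that the chance a positive result is true depends on the base rate, can be sketched with Bayes' rule. This is only a minimal illustration: the function name and the numbers (1% and 50% base rates, 80% power, a 5% false positive rate) are assumptions chosen for the example, not figures from the guide.

```python
def positive_predictive_value(prior, power, alpha):
    """Probability that a positive (significant) result reflects a true effect.

    prior: fraction of tested hypotheses that are true (the base rate)
    power: P(significant | effect is real)
    alpha: P(significant | no effect), the false positive rate
    """
    true_pos = prior * power
    false_pos = (1.0 - prior) * alpha
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, for illustration only.
rare = positive_predictive_value(prior=0.01, power=0.8, alpha=0.05)
common = positive_predictive_value(prior=0.5, power=0.8, alpha=0.05)
print(f"base rate 1%:  {rare:.2f}")
print(f"base rate 50%: {common:.2f}")
```

Under these assumed inputs the same p < 0.05 threshold gives very different answers: roughly one positive in seven is real at a 1% base rate, while most positives are real at a 50% base rate.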


Hell yes. As a grad student who is currently halfway through his first serious statistics course, this is totally relevant to my interests and full of delicious schadenfreude since I, of course, will never ever commit any of these errors.
posted by Scientist at 7:58 AM on October 27, 2013 [4 favorites]


Hey, now I have a name - 'pseudoreplication' - for the shit that drives me crazy every single time I read a math education study! Prof/teacher says, 'hey, I have an idea, let's try it in my class(es) and call it a study.' Gets a few hundred data points by doing the idea in some classes, not realizing the great big dependent factor standing at the front of the room...
posted by kaibutsu at 7:58 AM on October 27, 2013 [5 favorites]


I think the basic course in college math should be statistics, not algebra or calculus

I tend to agree, but the thing to be careful about is this. We tend to make a mental comparison between the calculus classes we have (cookbook, unconceptual, disliked by most students) and the statistics courses we imagine (kind of like the experience of reading this webpage.) But if we actually made statistics as mandatory as calculus I think those statistics classes would look a lot more like calculus classes do now.
posted by escabeche at 8:13 AM on October 27, 2013 [3 favorites]


But if we actually made statistics as mandatory as calculus I think those statistics classes would look a lot more like calculus classes do now.

Sadly, the likelihood of this is very high. Partly due to perverse incentives (there is very little benefit to a faculty member in teaching intro courses, so the work tends to get pushed off on grad students and per-course lecturers) and partly due to a real lack of faculty who are both motivated to and good at teaching largish classes of new freshmen. Could this change? Sure, but it would mean a) faculty being clearly and constantly rewarded for this effort which means administrators b) would have to put resources toward non-prestige projects. Calculate the chances of that.
posted by GenjiandProust at 8:21 AM on October 27, 2013


What do we call the error of dismissing a study for ideological reasons by flippantly saying "correlation does not equal causation" while making no serious effort to read the study and explain why you think it contains a methodological error?
posted by obscure simpsons reference at 8:26 AM on October 27, 2013 [2 favorites]


The name for that error is "Internet Debate."
posted by GenjiandProust at 8:35 AM on October 27, 2013 [6 favorites]


> What do we call the error of dismissing a study for ideological reasons by flippantly saying "correlation does not equal causation" while making no serious effort to read the study and explain why you think it contains a methodological error?

Repeating a phrase as a criticism without understanding what it means. I'd propose "ad homonym syndrome". (A disorder closely related to the "Dumbing-Kruger effect".)
posted by nangar at 8:51 AM on October 27, 2013 [6 favorites]


Is that the new term for people who use the term "Dunning–Kruger effect" as a shorthand for "I think other people are stupid"? Because we really need a name for that phenomenon too.
posted by kiltedtaco at 8:54 AM on October 27, 2013 [5 favorites]


This is really good and I wish it had more to say about regression-oriented fields so I could make my wee bairns read it.
posted by ROU_Xenophobe at 9:03 AM on October 27, 2013


Actually, as someone who is in the middle of learning graduate-level statistics right now I would freakin' love it if there were a cookbook-style statistics resource out there. I'm thinking more of a reference than a class though, in the form of a recipe book or maybe a dichotomous key.

What I envision is something that would guide the researcher step-by-step through screening the data, choosing an appropriate analysis, testing the analysis's assumptions, making appropriate transformations and/or standardizations, analyzing the dataset, and interpreting the results. Something low on concepts (because there are already lots of great concept-based resources out there, which this reference would be intended to complement rather than replace of course) and high on concise decision-making and procedural instructions. Ideally it would integrate with a popular statistics package, like R.

Such a reference could never hope to be infallible of course and would have to allow a lot of leeway for modification based on real-world problems, so it would probably be difficult to write it in a way that was both clear enough to be usable and flexible enough to not be misleading. It would perhaps work best if paired with a companion reference that was more conceptual and designed to help users resolve ambiguity in their decision-making and interpretation, and in making appropriate modifications to the basic "recipes".

I really think that something like this, if done well (which I admit would be a challenge), would be a really valuable resource for researchers. I'm just getting my feet wet with statistical analysis, but I have a feeling that the anxiety around "am I absolutely sure that I have chosen the best analysis/satisfied the assumptions of my chosen technique/interpreted my results correctly?" doesn't ever really completely go away.

Having clear, well-designed procedures to follow would really help I think. All my bench work in the lab comes with such procedures, and it makes it ever so much easier to know that I'm on the right track. It's not a replacement for a conceptual background because one needs to be able to recognize when it's appropriate to deviate from or modify the procedure, but it's just incredibly helpful to have a step-by-step guide to follow. Making such a guide for statistical analysis would be more difficult than for, say, a tissue-based DNA extraction, but I don't think it would be impossible.

Perhaps someday I'll take a shot at it.
posted by Scientist at 9:04 AM on October 27, 2013 [7 favorites]


Oh, the above was a tangent inspired by escabeche's comment regarding the discouraging possibility that mandatory statistics courses for undergraduates would likely be boring and un-edifying, a sentiment with which I fully agree. I actually did have to take a mandatory statistics course as part of my undergraduate Bio degree, and it was pretty pointless. It was long on doing least sums of squares and ANOVA tables by hand, short on actually explaining how to use and interpret statistical analysis in the real world or why statistics is such a valuable and important branch of mathematics, one that any modern citizen in a developed society should have basic literacy in and that anyone who hopes to work in a data-driven field should have a firm command of.
posted by Scientist at 9:09 AM on October 27, 2013 [1 favorite]


100% agree with escabeche. In students' minds, the overwhelming conflation is that math is a discipline that, in a sense, is isomorphic to your basic arithmetic. Cookbook, as it were. Forget a calculus vs. stats debate, you could trot out the most fantastic curriculum that is 1/3 Vi Hart, 1/3 Mr. Wizard's world, and 1/3 Martin Gardner and I would hazard a guess that the lines of questioning from students will eventually boil down to: "so, how do I solve this?"

I'd argue that the majority of math teachers can't see beyond this cookbook-level view of math, save maybe a couple proofs of the Pythagorean Theorem or something like that. But for a math teacher wanting to incorporate big picture math ideas, non-procedural type of exercises into their class... it's a fool's errand. 28 out of 30 kids -- the same kids who give you teacher evaluations at the end of the class -- are looking at you cluelessly. You would never get an ounce of support from administration: to them 'math' is the same thing that makes their big data metrics tick, the sort of 'math' that gives an answer.

That mini-rant said, this fellow's writing is spot-on. On cursory inspection, it is a fun read. But... it's a fun read for me. For the, ahem, 'others' holding tight to the 'math = cookbook' view of things, it's a bit too text-overloady to win their hearts and minds, I fear. My proposal: re-write it beginning with stark descriptions of a hellish dystopia that we've fallen into, and then cut to an erroneous interpretation of a P-value or some such that renders the dystopia into, "oh, whoops, if I understood that correctly I'd have realized that the odds 3rd Q profits will drop isn't statistically significant."
posted by Theophrastus Johnson at 9:09 AM on October 27, 2013 [2 favorites]


Also I will note that at no point in my undergraduate career did anybody explain to me that p < 0.05 as a cutoff for statistical significance is completely arbitrary and just a generally-agreed-upon rule of thumb used by people in statistical professions. So much of statistics is like that -- essentially arbitrary rules of thumb chosen because they work well most of the time and you have to choose something, but which it may be appropriate to bend or ignore from time to time if you have a really good reason for it.

Most undergraduate students have a view of math that includes ideas like "2 + 2 = 4, always and everywhere" and see mathematical concepts as being somehow fundamentally, intrinsically true properties of nature. Statistics just isn't like that -- it's a set of powerful but imperfect tools that need to be carefully applied in full cognizance of their limitations and shortcomings. You rarely prove something through statistical analysis, you just arrive at a point where if you have done everything right and applied whatever best practices your field has agreed on you can treat something as being probably true and your colleagues will feel comfortable agreeing with you and using your conclusions as the basis for further work.
posted by Scientist at 9:19 AM on October 27, 2013 [7 favorites]


More Or Less

BBC Radio 4's More or Less is a weekly programme looking at dodgy statistics from all over the media and picking apart the mistaken assumptions underlying them. It's aimed at an intelligent general audience rather than hardcore stats geeks, but interesting nonetheless. Quite a few episodes are available free online at the link above.
posted by Paul Slade at 9:22 AM on October 27, 2013 [8 favorites]


Also I will note that at no point in my undergraduate career did anybody explain to me that p = 0.05 as a cutoff for statistical significance is completely arbitrary and just a generally-agreed-upon rule of thumb used by people in statistical professions.

Maybe it's worth saying that I have a book coming out next spring which hammers this point pretty hard. In fact, the section of the book that talks about statistical inference uses some of the same examples as the link in the OP does (e.g. the dead salmon, the Amgen non-replication study) but I guess this is not surprising as these were pretty high-profile cases that everybody who's been paying attention to these issues lately has heard about. Anyway, the book, if I did it right, actually should be readable by the general public, including (especially?) people who don't find their required math courses particularly fulfilling. (I've taught plenty of those courses, so I know very well that's the case for a lot of people.)
posted by escabeche at 9:30 AM on October 27, 2013 [2 favorites]


Hey, now I have a name - 'pseudoreplication' - for the shit that drives me crazy every single time I read a math education study!

Another term for this is "pen effect" and is a common issue in animal studies where human / bovine / lupine / murine social dynamics play a part. If you have two pens with 10 subjects in each, you have an n of 2 not an n of 20.
posted by Kid Charlemagne at 9:48 AM on October 27, 2013
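
One common way to put a number on the pen effect is the design effect for clustered data, sketched below. It comes from survey statistics rather than from the guide or the comment above, and the intraclass correlation values (`icc`) are assumed for illustration. When subjects in a pen behave identically (`icc` of 1), two pens of ten really are an n of 2.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Effective sample size under the Kish design effect.

    n_eff = n_total / (1 + (cluster_size - 1) * icc)
    icc is the intraclass correlation: 0 means subjects in a pen are
    independent, 1 means they are perfectly correlated.
    """
    n_total = n_clusters * cluster_size
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_total / design_effect


# Two pens of ten subjects each; the icc values are hypothetical.
for icc in (0.0, 0.5, 1.0):
    n_eff = effective_sample_size(n_clusters=2, cluster_size=10, icc=icc)
    print(f"icc={icc:.1f} -> effective n = {n_eff:.1f}")
```

The same arithmetic applies to the classroom case mentioned earlier in the thread, with the class (and its teacher) playing the role of the pen.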


This recent paper in PLoS Medicine argues a similar point about the uses and abuses of statistics in studies- "Why Most Published Research Findings Are False".
posted by Rufus T. Firefly at 9:54 AM on October 27, 2013


Also I will note that at no point in my undergraduate career did anybody explain to me that p = 0.05 as a cutoff for statistical significance is completely arbitrary and just a generally-agreed-upon rule of thumb used by people in statistical professions.

The problem is that the question of choosing cutoffs is frequently an economic one. There's no rigorous way of selecting a cutoff that derives solely from the statistics, so you end up just choosing something that sounds good. If you can define a utility function and a cost function on the cutoff space, though, the natural choice of cutoff is the one that maximizes your surplus.* It's not a perfect scheme, since in cases where it's hard to quantify utility meaningfully you end up with an arbitrary utility function instead of an arbitrary cutoff, but in general I think that conceiving the choice that way would lead to more thoughtful choices in general.

* My lab did a more rigorous study of this idea in the context of bioassays.
posted by invitapriore at 9:59 AM on October 27, 2013 [1 favorite]
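
The idea of picking a cutoff by maximizing surplus can be sketched as follows. Everything here is a hypothetical stand-in, not the method from the study mentioned above: the score model (inactive samples scored as N(0, 1), active ones as N(2, 1)), the prior, and the gain and cost figures are all assumptions made for illustration.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_surplus(cutoff, prior_active, gain_hit, cost_false_alarm):
    """Expected utility minus expected cost per sample at a given cutoff."""
    # Assumed score model: inactives ~ N(0, 1), actives ~ N(2, 1).
    p_hit = 1.0 - normal_cdf(cutoff, mean=2.0)
    p_false_alarm = 1.0 - normal_cdf(cutoff, mean=0.0)
    return (prior_active * p_hit * gain_hit
            - (1.0 - prior_active) * p_false_alarm * cost_false_alarm)


def best_cutoff(prior_active, gain_hit, cost_false_alarm):
    """Grid search for the cutoff with the largest expected surplus."""
    grid = [i / 100.0 for i in range(-300, 601)]
    return max(grid, key=lambda c: expected_surplus(
        c, prior_active, gain_hit, cost_false_alarm))


# Hypothetical economics: same gain per hit, different false-alarm costs.
cheap = best_cutoff(prior_active=0.1, gain_hit=10.0, cost_false_alarm=1.0)
costly = best_cutoff(prior_active=0.1, gain_hit=10.0, cost_false_alarm=5.0)
print(f"cutoff when false alarms are cheap:  {cheap:.2f}")
print(f"cutoff when false alarms are costly: {costly:.2f}")
```

The cutoff moves up as false alarms get more expensive, which is the sense in which the choice is economic rather than purely statistical. As the comment notes, the arbitrariness has not gone away; it has moved into the gain and cost numbers.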


Erm...messed up that link above...this!
posted by Rufus T. Firefly at 10:02 AM on October 27, 2013


"Statistics done wrong"...aptly named, for a site that advocates using p values and doesn't mention Bayesian statistics.
posted by Philosopher Dirtbike at 10:59 AM on October 27, 2013 [5 favorites]


At least you know where they stand on the question of To Bayes or not to Bayes. <--- that is a pdf.
posted by bukvich at 12:07 PM on October 27, 2013 [2 favorites]


Scientist, I've been told by a good friend and colleague that The Cartoon Guide to Statistics is actually an excellent reference. I've been meaning to read it, because I am one of those research scientists who actually never took a statistics course. It's amazing to me how little statistics are emphasized in the hard sciences; they are treated mostly as an afterthought, which is kind of alarming.

Something kind of similar to your thoughts on a step-by-step statistics reference is actually produced in connection with the much-maligned manufacturing methodology of Six Sigma: Minitab. We have this program widely available at my company, and it has flaws, but it does supply the user with many of the statistical tests (P-tests, statistical power, main effects, etc) that are needed for a statistically solid study. These are applied in something of a black-box manner, so if you don't understand the theory you can easily misuse them. But it's a start.

Not sure if something similar is available for open-source systems like R.
posted by Existential Dread at 12:30 PM on October 27, 2013 [2 favorites]


"Sadly, this sort of innumeracy is, if anything, more wide-spread in the general public, which includes administrators and policy makers of all sorts. Decisions are easy and necessary, but well-founded decisions are hard. And, if most of the advisers to those administrators and policy makers are probably only slightly better off, we see one of the roots of our regular policy disasters."

Christ, I only have a passing familiarity with stats from undergrad polisci/social science classes, and I'm amazed at what wild inferences people try to pull out of our data about, say, open and click-through rates. And our backend is set up so we can't do A/B, and it's just infuriating to hear people declaim with such certainty on this stuff; it sounds like Aristotelians dismissing any suggestion that dolphins aren't fish.
posted by klangklangston at 1:35 PM on October 27, 2013 [2 favorites]


What I envision is something that would guide the researcher step-by-step through screening the data, choosing an appropriate analysis, testing the analysis's assumptions, making appropriate transformations and/or standardizations, analyzing the dataset, and interpreting the results. Something low on concepts (because there are already lots of great concept-based resources out there, which this reference would be intended to complement rather than replace of course) and high on concise decision-making and procedural instructions. Ideally it would integrate with a popular statistics package, like R.

It is actually called consulting the Statistician employed by the University as core aid to research. I have blessed his nitpicking, obsessive, nay-saying paper destroying soul (after a bit, of course). If your University does not have one, demand that they hire one.
posted by francesca too at 1:53 PM on October 27, 2013


Rufus T. Firefly: "This recent paper in PLoS Medicine argues a similar point about the uses and abuses of statistics in studies- "Why Most Published Research Findings Are False"."

This study has gotten an obscene amount of press and attention but itself makes several of the errors listed in the FPP and fundamentally fails to support its conclusions. You can follow the whole drama in the literature here,
Why Most Published Research Findings Are False
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Why Most Published Research Findings Are False: Problems in the Analysis
The article published in PLoS Medicine by Ioannidis makes the dramatic claim in the title that “most published research claims are false,” and has received extensive attention as a result. The article does provide a useful reminder that the probability of hypotheses depends on much more than just the p-value, a point that has been made in the medical literature for at least four decades, and in the statistical literature for decades previous. This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. Unfortunately, while we agree that there are more false claims than many would suspect—based both on poor study design, misinterpretation of p-values, and perhaps analytic manipulation—the mathematical argument in the PLoS Medicine paper underlying the “proof” of the title's claim has a degree of circularity. As we show in detail in a separately published paper, Dr. Ioannidis utilizes a mathematical model that severely diminishes the evidential value of studies—even meta-analyses—such that none can produce more than modest evidence against the null hypothesis, and most are far weaker. This is why, in the offered “proof,” the only study types that achieve a posterior probability of 50% or more (large RCTs [randomized controlled trials] and meta-analysis of RCTs) are those to which a prior probability of 50% or more are assigned. So the model employed cannot be considered a proof that most published claims are untrue, but is rather a claim that no study or combination of studies can ever provide convincing evidence.

Why Most Published Research Findings Are False: Author's Reply to Goodman and Greenland
I thank Goodman and Greenland for their interesting comments on my article. Our methods and results are practically identical. However, some of my arguments are misrepresented:
Here is that separate paper,
ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"
A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument, and show that it has three basic components:
1) An assumption that the prior probability of most hypotheses explored in medical research is below 50%.
2) Dichotomization of P-values at the 0.05 level and introduction of a “bias” factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design.
3) Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%.
Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and then the inferential model used makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and “bias” dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument –that papers in “hot” fields are more likely to produce false findings.
We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.
The arguments made do take some amount of statistical understanding to interpret, but the smack down performed was hard enough that, as you can notice perusing Google Scholar, Ioannidis has not continued to publish in this area.
posted by Blasdelb at 5:17 PM on October 27, 2013 [7 favorites]
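
The three components listed in the quoted response (a low prior, evidence dichotomized at a significance threshold plus a "bias" factor, and Bayes' theorem) can be sketched as a small calculation. This is a reader's illustration of the structure of the argument, not the model from either paper; the parameter names and all numbers are assumptions.

```python
def posterior_probability(prior, power, alpha, bias=0.0):
    """Posterior probability that a 'significant' finding is true.

    Evidence is dichotomized: a study is either significant at `alpha`
    or it is not. `bias` is the assumed fraction of analyses that would
    not have been significant but get reported as significant anyway.
    """
    p_sig_if_true = power + bias * (1.0 - power)
    p_sig_if_false = alpha + bias * (1.0 - alpha)
    numerator = prior * p_sig_if_true
    return numerator / (numerator + (1.0 - prior) * p_sig_if_false)


# Hypothetical priors and bias levels, with 80% power and alpha = 0.05.
for prior in (0.1, 0.5):
    for bias in (0.0, 0.3):
        post = posterior_probability(prior, power=0.8, alpha=0.05, bias=bias)
        print(f"prior={prior:.1f} bias={bias:.1f} -> posterior={post:.2f}")
```

With these assumed inputs, adding bias pulls the low-prior case well below 50% while the 50%-prior case stays above it, which is the pattern the response describes: once the evidence is diluted this way, the prior largely determines the conclusion.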


We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims

This is precisely the lesson most scientists I know have taken from Ioannidis's paper, not "oh my god everything we say is probably wrong," despite Ioannidis's choice of title.
posted by escabeche at 6:04 PM on October 27, 2013


I think the basic course in college math should be statistics, not algebra or calculus, which are almost infinitely less useful to anyone outside of a STEM discipline (and to many within them). Similarly, statistics should be a much more central subject in high school and even grade school. On the other hand, actually instilling critical thinking in students is a lot less popular than talking vaguely about it, because no one likes being called on their hand-waving....

As sympathetic as I am, what would this course even look like, for a general undergraduate audience with minimal preparation in mathematics?

Most graduate programs in the social sciences either teach one or two courses in applied statistics or send their students to wherever the applied stats program at their school is housed. And from what I've seen, students generally come out of those classes knowing how to push buttons in SPSS or copy-and-paste R commands, but not much better off in terms of basic statistical literacy.

And that, in turn, seems to be because students generally take those classes not because of their love for the inherent truth-finding beauty of statistics, but because they just want a bag of tricks to throw at the data they're collecting as part of their dissertation research.

I should also mention that to correctly deploy various diagnostic tests or, really, do anything beyond throwing the simplest textbook models at their data, students need to have a pretty deep and rich understanding of advanced math. This is one of the reasons why undergraduate probability and statistics is usually offered as a challenging upper-level sequence of difficult, calculus-based courses. And those at best provide the merest groundwork for the simple models that stats classes start off with.

Statistics may seem simple in application, but it's actually a complicated and difficult area of numerical modeling. If it sounds simple, it is probably being misused. Maybe, instead of making everyone rely on their own hare-brained understanding of statistics, we should leave statistics to the statisticians. We don't ask everyone who uses MRI to be an MRI physicist; I don't see why statistics is so different.
posted by Nomyte at 6:37 PM on October 27, 2013 [2 favorites]


Most of the researchers in medicine or sociology I know wouldn't dream of doing a paper with statistical analysis of content without the active assistance-- if not outright collaboration-- of a competent, degreed statistician; in fact, the two I know best happen to have married PhD level biostatisticians.
posted by jamjam at 8:38 PM on October 27, 2013


I took a 3-month course in statistics in graduate school. I learned almost nothing because the statisticians teaching it worked largely with sizable populations, and so were teaching us how to determine the difference between an effect that could be measured in two matched sets of hundreds of replicates; the data I and my classmates worked with was, for example, determining whether the rate of an enzymatic reaction in three microplate wells or even a curve of 7 levels in three microplate wells each was significantly different from that occurring in another set of microplate wells. From what I learned in that class, the data that my entire thesis was based on and every project I've been on since (and which have been published in peer reviewed journals) was completely useless and the statistics on it pointless and inaccurate.

Last year, I was finally taught how to use Minitab and DOE and got a glimpse of how to handle these small-replicate experiments. But seriously, 23 years of school with definite interest and effort at doing the stats right and I am still this uncertain?
posted by Tandem Affinity at 8:39 PM on October 27, 2013 [2 favorites]
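
To make the small-replicate problem concrete, here is a minimal sketch of an exact permutation test on two groups of three wells. It is one possible approach, not what Minitab or the course described above does, and the reaction rates are invented for illustration.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / n_splits


# Hypothetical reaction rates from three wells per condition.
control = [1.0, 1.1, 1.2]
treated = [2.0, 2.1, 2.3]
p = permutation_p_value(control, treated)
print(f"p = {p:.2f}")
```

With three wells per group there are only 20 ways to split the six values, so the smallest two-sided p-value this particular test can return is 2/20 = 0.1, even when the groups do not overlap at all. Tests that assume more about the data can go lower, but only because of those assumptions.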


Most of the researchers in medicine or sociology I know wouldn't dream of doing a paper with statistical analysis of content without the active assistance-- if not outright collaboration-- of a competent, degreed statistician; in fact, the two I know best happen to have married PhD level biostatisticians.

That's pretty extraordinary. In my experience as a former post-bac research assistant in the cognitive sciences and later an MRI tech, a lot of the actual analysis and the resulting responsibility gets offloaded onto the post-bac assistants and techs. The other alternative is that the primary investigator does it. Or maybe the post-doc. Certainly not anyone with an actual degree in statistics.
posted by Nomyte at 9:19 PM on October 27, 2013


What would be REALLY nice is if there was a website curated by Statisticians that gives, say, a letter grade to the studies that get published in the popular press.

So when you see on CNN that gun-ownership or religion causes Alzheimer's you can beetle over to the website and get the lowdown. Kind of what Snopes does for urban legends, but with statistical debunk.

The problem is most of us don't have the time or background to fully grasp the statistical strength of a particular study.

Here's one example - someone forwarded me this widely reported study that claims that rich people care less about others than poor people do.

What grade should it get and why?
posted by storybored at 8:34 PM on November 8, 2013


This thread has been archived and is closed to new comments