“This Is What a Methodological Terrorist Looks Like”
September 16, 2018 1:53 AM   Subscribe

‘I Want to Burn Things to the Ground’ [The Chronicle of Higher Education] “Just last month the Center for Open Science reported that, of 21 social-behavioral-science studies published in Science and Nature between 2010 and 2015, researchers could successfully replicate only 13 of them. Again, that’s Science and Nature, two of the most prestigious scientific journals around. If you’re a human interested in reliable information about human behavior, that news is probably distressing. If you’re a psychologist who has built a career on what may turn out to be a mirage, it’s genuinely terrifying.”
posted by supercrayon (70 comments total) 41 users marked this as a favorite
 
Publish or perish is the problem. People rush results out or are just pressured into getting that all-important publication. Nobody is allowed to sit quietly and think deeply these days. Having a long view and taking time doesn’t get funding.
posted by rivets at 2:22 AM on September 16, 2018 [42 favorites]


First termites, and now this!
posted by Pyrogenesis at 3:29 AM on September 16, 2018 [4 favorites]


Frankly, having worked for a time in the sciences, it kind of amazes me that anything useful comes out of the process at all. Pretty much all the incentives are perverse incentives. "Publishable" and "useful to humanity" are not particularly closely related concepts, and if you're a scientist, your whole career rests on producing a steady stream of publications no matter what. Also, plenty of things are just…dubious…and yet are accepted as gold standard techniques/concepts/methodologies in your field which really nobody has standing to challenge because your reviewers simply won't let you publish unless you use them. It's pretty rotten, or at least that was my experience.
posted by Anticipation Of A New Lover's Arrival, The at 4:30 AM on September 16, 2018 [24 favorites]


What bothers me is that scientific research has become a billion dollar fraud machine that we're paying for so that unscrupulous researchers can pad their resumes and entire industries can line their pockets. Government research grants should demand automated open source access to raw research data. Findings must be published to open journals. Future grants should require that past research findings have been verified by independent third parties. Researchers should be required to have verified knowledge of statistics because holy shit how can you be a professor and allowed to not know these things arrrrrggggghhhhhhhhhh.
posted by Foci for Analysis at 4:32 AM on September 16, 2018 [24 favorites]


Also, psychologists need to have a statistics that is based on the realities of their discipline, and not be using the statistics that was developed for physicists and engineers.

I think this is part of why psychology and other noisy, difficult disciplines don't incorporate statistics well. The statistics was not made for their discipline and they know it. The math is all wrong, the assumptions are all wrong.

Psychologists should work with mathematicians to produce statistics that can actually help us design good psychological research. You can't measure people's minds with the same rulers you use to measure mass or engineer industrial manufacturing.
posted by eustatic at 4:59 AM on September 16, 2018 [32 favorites]


This reminds me of what I've read about the neutral selection debates in evolutionary biology in the 1970s and 1980s. Apparently, Motoo Kimura would go to conference after conference and abrasively tell other researchers that their adaptationist explanations were bullshit and that their results could be explained just as well by neutral drift.

Kimura took all the fun out of coming up with evolutionary Just So stories, but the theory ended up stronger in the end as a result of his abrasiveness and persistence.

(Is a hostile work environment okay if it leads to better scientific theories? That seems to be the question that the article is playing with.)
posted by clawsoon at 5:14 AM on September 16, 2018 [8 favorites]


What bothers me is that scientific research has become a billion dollar fraud machine

While many things suck, and need to be addressed... this kind of vast, generalized dismissal is extremely harmful. Please don't do this. It's not only untrue, but you give ammunition to the slew of anti-science forces that are increasingly on the rise.
posted by lalochezia at 5:20 AM on September 16, 2018 [129 favorites]


eustatic: Also, psychologists need to have a statistics that is based on the realities of their discipline, and not be using the statistics that was developed for physicists and engineers.

Most of the statistical methods used in psychology were developed by and for biologists. (Karl Pearson and Ronald Fisher in particular). Physicists and engineers generally have no need for p-values, ANOVA, null hypothesis testing, chi-squared tests and the rest of the statistical tools developed for biology, since the data of physics and engineering tends to be much cleaner and simpler than that of biology. It's only when you're studying life - human or otherwise - that you need all of those tools to tease out a multitude of unknown confounding factors.
posted by clawsoon at 5:28 AM on September 16, 2018 [59 favorites]


What bothers me is that scientific research has become a billion dollar fraud machine

Just like politics, and economics, and finance... Under capitalism, doesn't everything inevitably become a billion-dollar fraud machine?
posted by Faint of Butt at 5:32 AM on September 16, 2018 [17 favorites]


Green Jelly Beans linked to acne.
posted by Nanukthedog at 5:38 AM on September 16, 2018 [2 favorites]


Is this entirely a bad thing? Other fields have had Cold Fusion, proof of faster-than-light particles, the Rutherford–Bohr model and such. Science needs to have room for mistakes without ridicule. A lot of inspiration is based on imperfect models.

Some fields certainly may need more attempts at replication; after all, they keep trying to disprove Einstein.
posted by sammyo at 5:49 AM on September 16, 2018 [1 favorite]


Under capitalism, doesn't everything inevitably become a billion-dollar fraud machine?

Yup.
Was going to posit this very same observation. Most of this sort of ugliness would magically disappear if the profit motive was exorcised from the sciences. And, pretty much any other endeavor, for that matter.
posted by Thorzdad at 5:53 AM on September 16, 2018 [11 favorites]


sammyo: Other fields have had Cold Fusion, proof of faster-than-light particles, the Rutherford–Bohr model and such. Science needs to have room for mistakes without ridicule.

I remember cold fusion. I dunno if it's a good example of giving room for mistakes without ridicule, because it was met with a huge amount of ridicule. Reputation-destroying ridicule.
posted by clawsoon at 6:06 AM on September 16, 2018 [4 favorites]


The "highly prestigious" journals are known to publish less reliable papers.
posted by runcifex at 6:11 AM on September 16, 2018 [4 favorites]


Other fields have had Cold Fusion

...which dozens of labs immediately tried (and failed) to replicate. This was only met with huge amounts of ridicule because the original authors doubled down and ascribed it to a chemistry-physics turf war rather than acknowledge their mistakes.

proof of faster-than-light particles

Proof is a very strong word in physics. During its short life, the OPERA result led to some really nice outside work (like the neutrino Cherenkov radiation paper by Cohen and Glashow), and an intense hunt within the collaboration to try to track down any possible spurious source of the signal, which they successfully did within a matter of months.

the Rutherford–Bohr model

What. The Rutherford-Bohr model encapsulated two enormous leaps forward in our understanding of the atom. Without it, subsequent progress would not have been possible.
posted by heatherlogan at 6:26 AM on September 16, 2018 [21 favorites]


Physicists and engineers generally have no need for p-values, ANOVA, null hypothesis testing, chi-squared tests and the rest of the statistical tools developed for biology, since the data of physics and engineering tends to be much cleaner and simpler than that of biology. It's only when you're studying life - human or otherwise - that you need all of those tools to tease out a multitude of unknown confounding factors.

You'd think that, but on the other hand let me introduce you to the Six Sigmas.

Under capitalism, doesn't everything inevitably become a billion-dollar fraud machine?

By contrast, under communism everything inevitably becomes a billion-ruble fraud machine.
posted by Huffy Puffy at 6:32 AM on September 16, 2018 [2 favorites]


This is really irritating. Replication is a basic requirement. Cell biology has a replication crisis, too, but it's been pretty clear that a lot of the problem is that people don't (can't?) publish full methods. When folks fail to replicate, it turns out that some of the left out methods were like, important, and the original study didn't find what the authors thought it found. That's a big deal, we want to know that! It really matters if the result is TRUE or NOT TRUE.

Does it not matter if psychology results are true? It seems like it should matter.

The point about the stats is well made, but there ARE stats that work better than a t-test or ANOVA for these things. The departments don't teach stats well enough. This is a problem in my field, too, but it's not a good excuse. The math department is just in another building.
posted by Made of Star Stuff at 6:34 AM on September 16, 2018 [14 favorites]
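
A minimal sketch of one such alternative, with invented ratings: a rank-based Mann-Whitney test alongside the usual t-test, dropping the interval-scale and normality assumptions. This is one illustrative option, not a prescription from the comment above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 7-point ratings for two conditions (ordinal, skewed)
control = rng.choice(np.arange(1, 8), size=40, p=[.30, .25, .20, .10, .08, .05, .02])
treatment = rng.choice(np.arange(1, 8), size=40, p=[.15, .20, .25, .15, .12, .08, .05])

# Classic approach: treat the ratings as interval data
t_stat, t_p = stats.ttest_ind(control, treatment)

# Rank-based alternative: no normality or interval-scale assumption
u_stat, u_p = stats.mannwhitneyu(control, treatment, alternative="two-sided")

print(f"t-test p = {t_p:.3f}, Mann-Whitney p = {u_p:.3f}")
```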


The argument against SIPS and other such critics seems to be "they're making psychology less fun" and "they're making people trust us less." But fun is very far down on the list of priorities here, well behind "correct." And I'd argue that if people distrust your work because it was proven wrong, the fault is not with those who proved you wrong.

It's just a very defensive reaction. If your work can't be replicated by others then clearly something is wrong. You can be upset by that, but then you need to get over it and fix your work.

The principles that the "methodological terrorists" are advocating seem like really basic scientific principles. Say what you're trying to accomplish with an experiment ahead of time, and share your data when you're done. If your work is sound it will stand up to scrutiny. If it can't, then it should be corrected or discarded. That can be hard, emotionally, but better that than to have people harmed by your bad work.

(There's probably a corollary to Godwin's Law, where the first person to call the other side terrorists has already lost the argument.)
posted by JDHarper at 6:58 AM on September 16, 2018 [21 favorites]


The New Yorker has been doing some reporting on this for the past decade or so.

The Truth Wears Off, Jonah Lehrer
More Thoughts on the Decline Effect, Jonah Lehrer
The Crisis in Social Psychology that Isn't, Gary Marcus
posted by Miko at 7:14 AM on September 16, 2018 [3 favorites]


The tone of this article is maddening. "Data thugs", really? Is that what a researcher calls a person who suspended their own research to re-examine theirs instead? If they cared about the research instead of their reputation, they'd be grateful for the extra attention to the phenomenon they are studying.

I'm sorry experimental psychology has turned out to be completely riddled with statistical flaws, whether errors made in good faith or outright fraud. But the solution isn't to get mad at the people who are finding the flaws. The solution is to fix the scientific process.

I agree with eustatic that the solution starts with better statistical tools. If a paper relies on any sort of statistics more complicated than an average, even a p-test, that paper should be required to have a professional statistician as one of the authors who is making sure the statistics are done correctly. And the entire field needs to get more open about not only sharing the raw data but also the experimental design. Psychology is far too important a discipline to let die because of crappy science.
posted by Nelson at 7:19 AM on September 16, 2018 [10 favorites]


We also need to stop treating positive and negative results differently, and only publishing positive results.

You can do everything right - but if you're working in a larger environment where only positive results get published, you'll never know if someone else has done a similar project that "failed." (And I put failed in quotes there, because it shouldn't be treated as a failure, even though it very much is treated as one.)

That can be hard, emotionally, but better that than to have people harmed by your bad work.

It's not so much that it's hard emotionally. Academics are used to harsh criticism. I mean, yes, people can react badly but I don't think that's the main reason for the resistance.

The problem is that it's hard professionally. Essentially, many of the fixes put the onus on individual academics to fight the system that they're in - but as individuals, we have very little power to do things like ... challenge the publish-or-perish environment, or invent funding for replication. Some fixes can be taken on board more easily, but the problem really traces back to perverse incentives, and ... they're really strong incentives, like keep your job you spent fifteen years training for kinds of incentives.
posted by Kutsuwamushi at 7:19 AM on September 16, 2018 [46 favorites]


One more bit of good reporting on reproducibility: The Experiment Experiment, a Planet Money episode.
posted by Nelson at 7:20 AM on September 16, 2018 [1 favorite]


The argument against SIPS and other such critics seems to be "they're making psychology less fun" and "they're making people trust us less." But fun is very far down on the list of priorities here, well behind "correct." And I'd argue that if people distrust your work because it was proven wrong, the fault is not with those who proved you wrong.

It's just a very defensive reaction. If your work can't be replicated by others then clearly something is wrong. You can be upset by that, but then you need to get over it and fix your work.


Honestly, the arguments against SIPS et al. just seem like textbook tone policing to me: they attempt to distract from the validity of a critique by attacking the way in which it was presented.
posted by Johnny Assay at 7:23 AM on September 16, 2018 [4 favorites]


I agree with eustatic that the solution starts with better statistical tools. If a paper relies on any sort of statistics more complicated than an average, even a p-test, that paper should be required to have a professional statistician as one of the authors

This is such an important point.

Some of the statistical methods that are being used today can be incredibly mathematically complicated. If your field of study isn't statistics, you probably don't know enough about statistics. It's incredibly challenging to develop that expertise - because it really is a second expertise, like getting a second degree. And of course what you're doing is already more than a full time job. I took a year-long, intensive statistics course and I still don't know enough. All of the psychology grads I took that course with ... also don't know enough.

There needs to be institutional support for both developing more statistical expertise, and for collaborating with statisticians. My university has a statistical consulting office that can be great - but they're pretty overworked, and it can be hard to get in to see them.
posted by Kutsuwamushi at 7:27 AM on September 16, 2018 [15 favorites]


The thing I worry about is that there are an awful lot of men going hard after one of the few fields in the sciences where women are normative. And that, distinct from SIPS as a formal organization, there seem to be a pack of men who, to an outsider, look an awful lot like gamergoobers.

Even to the (substantial) extent social psychology has these problems, that doesn't automatically mean that any critic is sincerely motivated by them any more than goobergobbers are sincerely motivated by ethics in gaming journalism.

tl;dr: when especially women say they're experiencing at-least-borderline harassment, I believe them.
posted by GCU Sweet and Full of Grace at 7:29 AM on September 16, 2018 [17 favorites]


With respect to statistics, the fix is generally not anything to do with having better statistical techniques or more complicated mathematical machinery. The biggest problem is the quality of the data. If you have really good data, you (often) don't need sophisticated statistical tools, and if you don't have good data, all the fanciest new statistical modeling won't save you. For a more detailed argument in this direction, take a look at the classic paper by David Freedman on "Statistical Models and Shoe Leather."
posted by Jonathan Livengood at 7:45 AM on September 16, 2018 [13 favorites]


He spends his days playing basketball, working out, and devoting hours and hours to his unpaid gig as a soldier in the replication rebellion. Not long ago he posted a photo of himself on Twitter lying in bed, with the caption, "Me after a successful night of data thugging."

*swoon*
posted by The Toad at 7:56 AM on September 16, 2018 [1 favorite]


Jonathan Livengood: If you have really good data, you (often) don't need sophisticated statistical tools, and if you don't have good data, all the fanciest new statistical modeling won't save you.

As I understand it, the most important time to involve a statistician is before you start your experiment so that you don't end up with useless data when you're done.

One bit of psychology stats advice I remember reading (which gave me a shudder) was to the effect that you shouldn't gather too much data, because that makes it less likely that you'll find an effect. My impression was that the author of the advice was sincere, and they just wanted to help other psychologists have more success discovering interesting things about humans.
posted by clawsoon at 7:59 AM on September 16, 2018 [8 favorites]


With respect to statistics, the fix is generally not anything to do with having better statistical techniques or more complicated mathematical machinery. The biggest problem is the quality of the data. If you have really good data, you (often) don't need sophisticated statistical tools, and if you don't have good data, all the fanciest new statistical modeling won't save you.

As a working research psychologist, I have to respectfully agree and disagree. The problem is that the data is not just of poor quality, but that it is qualitatively bad, and also usually insufficient.

For example, huge amounts of data in social psychology consists of various flavors of self-report on 5- or 7-point scales. These sorts of data behave differently than those that can be measured precisely (technically, they are "ordinal" rather than "scalar"). Unfortunately, what this means is that the vast majority of "standard" statistical practices (such as ANOVA or multiple regression) are simply not appropriate. Because the data are poor, you absolutely need more sophisticated statistical tools. If, on the other hand, you apply standard tools to these poor data, they'll still give you an answer (because the equations just do what they're told), and the resulting conclusions will convey an inflated degree of confidence.

Part of why ordinal data are so difficult to work with is because each participant is likely interpreting the scale differently. That is: each person who rates something a "3" may have a very different internal state (e.g. a level of emotional intensity) than other people who gave a similar rating. Without collecting lots of data from each participant, it's exceptionally difficult to calibrate the measure. This is why most mental health questionnaires are both lengthy and extensively validated - you just can't get meaningful data otherwise. Since most social psych experiments don't bother to get either the comprehensive depth or the broad population validation for their questions, the results will be highly ambiguous in a way that can be bent to support a wide range of theories. This, in turn, gives the author the flexibility to argue that the data are consistent with their pet theory, without giving consideration to the immense range of alternative hypotheses that would also be consistent with the results.

At the end of the day, the real generational divide is between older psychologists who are computationally locked into a paradigm that predates the personal computer (and thus favored procedures that could be performed by hand, even if they weren't appropriate to the data) vs. a younger generation who see the immense potential for new methods (including inferences that can only be performed with the assistance of computers). As Andrew Gelman wrote of Fiske's unfortunate opinion piece:

"If you’d been deeply invested in the old system, it must be pretty upsetting to think about change. Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to."

The vast majority of young researchers aren't rallying around their advisors and attacking the new findings. They're doing everything they can to do better science that is more durable and whose long-term viability is more assured. The old guard can rant to the Chronicle about how frustrating it is that the Wild West is being tamed, but if you talk to their students, you won't find nearly as many gunslingers.
posted by belarius at 8:19 AM on September 16, 2018 [39 favorites]
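
To make the ordinal-versus-interval point above concrete, here is a minimal sketch of fitting an ordered (proportional-odds) logit to simulated 5-point ratings rather than running ANOVA on them. It assumes a recent statsmodels release that ships OrderedModel, and all of the data and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
group = rng.integers(0, 2, size=n)                        # 0 = control, 1 = manipulation
latent = 0.4 * group + rng.logistic(size=n)               # unobserved "agreement"
rating = np.digitize(latent, [-1.0, 0.0, 1.0, 2.0]) + 1   # cut into a 1-5 scale

df = pd.DataFrame({"group": group})
# Treat the ratings as ordered categories, not as numbers
df["rating"] = pd.Categorical(rating, categories=[1, 2, 3, 4, 5], ordered=True)

model = OrderedModel(df["rating"], df[["group"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())   # group coefficient plus estimated cutpoints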


Sure. Everyone would be better off talking to a statistician and getting clear on their questions, design, power, and assumptions up front. But two things: First, as long as your question is pretty standard, you don't really need a statistician in the design phase. There are lots of well-understood experimental designs with easy-to-run statistics that don't have complicated assumptions to check and that are pretty robust to misspecification.

And second, in many cases, what's called for is thinking clearly about the problem you want to solve. No statistics are really at stake. And sometimes, trying to think about the statistics actually muddies things up (as in your shudder-inducing example). If you look at the Freedman paper, you'll see that in the case he discusses at length--Snow's work on cholera--there aren't any statistics at all beyond a simple table. Freedman writes (298):
As a piece of statistical technology, Table 1 is by no means remarkable. But the story it tells is very persuasive. The force of the argument results from the clarity of the prior reasoning, the bringing together of many different lines of evidence, and the amount of shoe leather Snow was willing to use to get the data.
Snow's case was driven by the quality and variety of his data, not by statistical tools.
posted by Jonathan Livengood at 8:34 AM on September 16, 2018 [1 favorite]


There are lots of well-understood experimental designs with easy-to-run statistics that don't have complicated assumptions to check and that are pretty robust to misspecification.

Whether a statistical procedure is easy to run is often quite divorced from whether the assumptions are complicated. Take bog-standard linear regression, for example. In my experience, psychologists simply aren't aware that regression requires exogenous predictors for the resulting parameters to be meaningful. And given that most standard statistical procedures (t-tests, ANOVA, etc.) are just OLS regression in disguise, that's a huge blind spot. That's just one example, mind you; I could name others.

When you speak to psychologists, especially those who teach "experimental methods," there is this common refrain that experimentalists can be kept safe by sticking to "standard experimental designs," which is really just the continuation of Fisher's strategy of writing experimental cookbooks for applied researchers. The immense overconfidence of the old guard rests in no small part on the very assumption you are putting forth about "well-understood" experimental designs.

When experimentalists need to "think clearly about the problem they want to solve," the first problem they actually need to solve is one of measurement, and the second is one of understanding the uncertainty in those measurements. Those are profoundly statistical topics. An experimental design that does not grapple with these difficulties during its infancy will not be able to fix the problem in post.
posted by belarius at 8:52 AM on September 16, 2018 [12 favorites]
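
A concrete way to see the "OLS in disguise" point above, with made-up numbers: an equal-variance two-sample t-test and an ordinary regression on a group dummy give the same test statistic (up to sign) and the same p-value.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.5, 1.0, 50)

# Two-sample t-test (equal variances assumed, to match plain OLS)
t_stat, t_p = stats.ttest_ind(a, b)

# The same comparison run as OLS with a group dummy
y = np.concatenate([a, b])
x = sm.add_constant(np.r_[np.zeros(50), np.ones(50)])
ols = sm.OLS(y, x).fit()

print(f"t-test: t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"OLS:    t = {ols.tvalues[1]:.3f}, p = {ols.pvalues[1]:.4f}")  # identical up to sign
```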


Because the data are poor, you absolutely need more sophisticated statistical tools.

I was with you right up to here, and I agree with a lot of the rest of what you're saying. But if you think that Likert data are so bad, then you need to start over and collect better data. Moving to ordered logistic regression or other tools for dealing with ordinal data isn't going to make things substantially better.

I've done some simulation work checking on what misspecification does to the p-values when you (inappropriately) use a t-test on Likert data (as most everybody does). The effect isn't that big, even in bad cases (such as very asymmetric distributions), and there is less absolute difference between nominal and real values when the significance level is setting a higher bar. (For example, if nominal confidence is 95%, the real probability of capturing the truth on repeated samples might be as bad as 89%. When nominal confidence is 99%, the real probability is somewhere around 96%. At least, that's how it has worked for me in simulations where I know what the true values look like and how often the test is giving the right answer.)

To be clear: I like the current revolution. Pre-registration is fantastic! Public data should be the norm! Thinking clearly about statistical tools is very important! I just don't think that the statistical tools are the biggest problem or the first problem to address or even crucial to address in order to have good science.
posted by Jonathan Livengood at 9:05 AM on September 16, 2018 [3 favorites]
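
For anyone who wants to poke at this kind of simulation themselves, here is a minimal sketch (not the commenter's actual code): both groups are drawn from the same very skewed 5-point distribution, so any "significant" t-test is a false positive by construction, and you can compare the observed rate to the nominal alpha.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
levels = np.arange(1, 6)
probs = [0.55, 0.25, 0.10, 0.06, 0.04]   # very asymmetric "Likert" distribution

n_sims, n_per_group, alpha = 20_000, 30, 0.05
hits = 0
for _ in range(n_sims):
    a = rng.choice(levels, size=n_per_group, p=probs)
    b = rng.choice(levels, size=n_per_group, p=probs)   # same distribution: null is true
    if stats.ttest_ind(a, b).pvalue < alpha:
        hits += 1

print(f"nominal alpha = {alpha}, observed false-positive rate = {hits / n_sims:.3f}")
```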


The vast majority of young researchers aren't rallying around their advisors and attacking the new findings. They're doing everything they can to do better science that is more durable and whose long-term viability is more assured.

belarius, you've pretty much described all of my peers in vision science. Depending on when people started their graduate training, they may not have been trained with modern ideas of research openness from the get-go (the ones in grad school now seem to be getting this from day one), but they can all see that the science will improve if we strive for openness. I think this is helped in my research area by a preexisting insistence on good methods sections, but it's still been a shift for a lot of people.

As a bit of an aside, there's a remarkable amount of methodological variability in psychological research; in my neck of the woods, we don't (usually) balk at collecting lots and lots of samples from a single observer, but we're also using fairly small sample sizes much of the time. Hell, my graduate lab (mostly jokingly) measured experiments in kilotrials... and we had a poster asking "Have You Done Your Kilotrial Today?"

At some level, the "data thugs" (oy, am I not fond of that term) are a facet of a larger shift in research towards openness. For example, NIH (the National Institutes of Health, one of the largest research funders in the US) has required since 2007 or so that anything it funds, once it's published, be uploaded to PubMed Central. That's fine for the stuff that gets published (and there can be a remarkable delay between collecting the data and getting the paper written and through review), but they've been rightly concerned about the stuff that doesn't get published.

There's been a large policy shift for human subjects work at NIH towards requiring funded research to register everything as a clinical trial (through ClinicalTrials.gov), and to use that mechanism to make researchers register what they're going to do and what they found, so that even if they don't publish, other researchers can see what's been done. This is an excellent idea, but there's quite a bit of NIH-funded human research that really, really isn't a clinical trial, and so there's an effort underway to figure out how to mandate openness for basic science without forcing it into tools that really aren't a good fit.
posted by Making You Bored For Science at 9:07 AM on September 16, 2018 [9 favorites]


I think it really comes down to two things: theories written in natural language don't compose that well and consequently everyone overstates the import of their findings because there is no theoretical framework to constrain them.
posted by ethansr at 9:25 AM on September 16, 2018 [2 favorites]


I'm not a psychologist so I may be wrong, but I'm not convinced that there is a widespread backlash against the researchers who are now insisting on more rigorous methodologies and open data. I don't know if the Chronicle of Higher Education has changed or I've simply become more aware as I've become older and more experienced, but I am wary of the Chronicle because it engages in a lot of muckraking. That may simply be a way of trying to attract more readers and subscribers in a news environment where there is a lot more competition (especially from their former colleagues who now run the free-to-read Inside Higher Ed), but whether it's a new phenomenon or just a newly noticed phenomenon, it's very annoying and raises questions about their journalistic integrity.
posted by ElKevbo at 9:28 AM on September 16, 2018 [4 favorites]


But if you think that Likert data are so bad, then you need to start over and collect better data. Moving to ordered logistic regression or other tools for dealing with ordinal data isn't going to make things substantially better.

In the spirit of talking about whether results replicate, I think there's a case to be made that it will result in estimates that are more honest. If Likert scale data are awful, and an appropriate analysis shows that there is indeed enormous uncertainty about the result, then at least what you report retains the truth of what the data can tell you. The field got itself into this mess in large part by running the wrong stats, getting inappropriately narrow confidence intervals, and riding the resulting overconfidence all the way to tenure.

When it comes to making better measurements, we certainly agree that finding a good alternative to Likert scale data is a step in the right direction. Unfortunately, for most of social psychology, it's not clear what that alternative would be, or that any of the alternatives will be any less ordinal. Self-report ratings on a 100-point scale will still suffer from between-subject calibration difficulties. It's also not clear that neuroimaging is going to rescue the field, as there are a lot of reasons to be worried about how those data are being analyzed as well.

But setting aside the challenges of making scientific inferences about internal states, the deeper problem here is that most psychologists aren't in the business of building rigorous scientific models that make specific predictions. When designing an experiment, most psychologists will settle for a sign test: "This group should be higher than that group." But they don't generally have a model that predicts by how much, or even what baseline performance should be. It's perpetually exploratory, a string of "Huh, isn't that interesting?" sorts of results. The result of this mindset is Mischel's "toothbrush problem," in which everyone comes up with their own pet theories, always vague, and everyone agrees to amiably coexist rather than trying to converge toward a common ground truth.

The relaxed country club atmosphere that this generation enjoyed has now been badly disrupted by ingrates who demand that the Results section of a paper live up to the promises of its Discussion section, and it turns out that not all toothbrushes were created equal.

We need better measurements, absolutely, but to get those we're going to need better models that propose specific mechanisms, as it will be those mechanisms that we can leverage to make more specific predictions against which to compare our more precise measurements. And insofar as there will be problems where the data remains resolutely ordinal, using more appropriate tools at least does a better job of keeping us honest.

I was with you right up to here, and I agree with a lot of the rest of what you're saying.

I suspect we agree on a lot of the particulars, and I don't mean to be difficult. Generally, people who have put in the time to get savvy about stats can agree on how to proceed. The real sticking point may be "What advice would you give to someone who isn't quantitatively savvy but who still wants to do research?" I'm very pessimistic about the cookbook approach, even by well-intentioned researchers, because I think it will generally perpetuate the difficulties the field finds itself in.
posted by belarius at 9:30 AM on September 16, 2018 [9 favorites]


Does anyone else think 13 out of 21 is... not actually that bad? It's over 50 percent, which is more than some other replication projects have found; given the small sample sizes and noisy data that are endemic in psychology I'm actually surprised it's not lower.
posted by en forme de poire at 9:36 AM on September 16, 2018 [11 favorites]


I wonder if the underlying problem with social science studies is that people are just not as good as mice or drosophila in terms of suitability as research materials: sample populations are typically small, variability is difficult to account for, and people may provide fuzzy answers coloured by their own culture, past experiences and experimental conditions. Even if the experiment is well-designed, is it surprising that it is not replicable when using different people, from another place, at another time? The "power pose" study was based on 42 people, probably drawn from the usual WEIRD population. The (in)famous "IQ of French people is falling" study was based on 79 people (out of 68 million), but that did not prevent the authors from floating the hypothesis that the decrease could have been caused by low-IQ immigrants.
posted by elgilito at 9:39 AM on September 16, 2018 [7 favorites]


sciencing is hard.
posted by nikaspark at 10:06 AM on September 16, 2018 [1 favorite]


As a baseball fan and former psychology major, the tension here reminds me of the tension in baseball with the "statheads" vs. the old school crowd that just wanted to call things based on HEART and GRIT and HUSTLE, and a lot of the same hatred between the two. It's funny to see arguments I remember from the Fire Joe Morgan era coming up in the social sciences of all places. YOU CAN'T PREDICT PSYCHOLOGY! IT CAN'T BE DONE!
posted by Ghostride The Whip at 10:13 AM on September 16, 2018 [11 favorites]


Does anyone else think 13 out of 21 is... not actually that bad?

The standard these studies are claiming to meet is reproducible 19 times out of 20.
posted by Nelson at 10:18 AM on September 16, 2018 [3 favorites]


How CAN you even think of quantifying something that is actually a matter of opinion anyway?
Whether someone is sane or not, for instance, is purely the opinion of other people, who may, or may not, be sane.
posted by Burn_IT at 10:28 AM on September 16, 2018


The standard these studies are claiming to meet is reproducible 19 times out of 20.

The p-value is not one minus the probability of replication, though. In order to get the number you're talking about you would need to have some kind of prior distribution over the null and alternative hypotheses (probably there is some frequentist way to frame this). Imagine if psychology was a completely null field, like the study of ESP: then actually every result with p=0.05 would definitionally be a false positive and the replication rate would be almost zero. This is Ioannidis's whole thing. And if you assume a priori that most hypotheses are false, setting a p=0.05 threshold gives you a pretty dismal replication rate, potentially much worse than 62 percent. That's why I was surprised it wasn't lower.
posted by en forme de poire at 10:45 AM on September 16, 2018 [13 favorites]
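
The arithmetic behind that point fits in a few lines. A back-of-the-envelope sketch, with invented priors and power, in the spirit of Ioannidis:

```python
def ppv(prior_true, power, alpha=0.05):
    """P(hypothesis is true | result is significant)."""
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return true_positives / (true_positives + false_positives)

# If only 10% of tested hypotheses are true and studies have 50% power,
# barely over half of "significant" results reflect real effects.
for prior in (0.5, 0.25, 0.1):
    print(f"prior = {prior:.2f} -> PPV = {ppv(prior, power=0.5):.2f}")
```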


Yeah you're right, my glib snark about 19 out of 20 is statistically wrong. I'm guilty of just the sort of misunderstanding of statistics that's the problem we're talking about here. I don't honestly know what is right. FWIW this roundup of links cites reproducibility rates of 10% and 21% for pharmaceutical and medical studies.
posted by Nelson at 11:35 AM on September 16, 2018 [11 favorites]


A research psychologist here, of the "older generation". A few things on this. First, my field has been moving with near uniformity to statistical analyses that go way beyond ANOVA and that use bootstrapping to estimate the likelihood of a difference being real (linear mixed models). Second, most journals now require effect sizes as a minimal measure of variability, and there is widespread awareness that p-hacking (let me keep analyzing my data with new tests to find something significant) and checking your data after X number of subjects and continuing to run if something is significant (in hopes of getting significance) are not good practices. Third, there is also widespread awareness (consistent with point 2) that large sample sizes are better.

The pushback on all of this, particularly for early career psychologists, is that publish publish publish = lots of temptations to do all of the things you shouldn't do. Also, journals (like Science, Nature) are more likely to publish WOWZA THAT'S AMAZING findings in short form articles without room for replications. Both of these things are not good, and to some extent the institutions that support psychologists (and any scientists really), like universities and funding agencies, have to re-align the incentives, because right now the incentives are out of whack with what science should be (slow plodding research and advancement of knowledge).

Having said all of that, statistical training in Psychology is woefully behind the times for many departments with lots of labs (I see this especially in behavioral neuroscience) relying on hypothesis testing t-tests with super small samples. The progress I point to in the first paragraph is coming from Social, Clinical and Cognitive Psychologists. It's unclear to me why someone like Susan Fiske would be so disparaging of those pushing the field in a good direction.
posted by bluesky43 at 11:46 AM on September 16, 2018 [11 favorites]
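
For readers who haven't met a linear mixed model, here is a minimal sketch using statsmodels, with invented reaction-time data and the simplest possible specification (a per-subject random intercept plus a fixed effect of condition):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subjects, n_trials = 40, 20
subject = np.repeat(np.arange(n_subjects), n_trials)
condition = np.tile([0, 1], n_subjects * n_trials // 2)
subject_offset = rng.normal(0, 50, n_subjects)[subject]   # per-subject random intercept
rt = 500 + 30 * condition + subject_offset + rng.normal(0, 50, subject.size)

df = pd.DataFrame({"rt": rt, "condition": condition, "subject": subject})

# Random intercept for each subject; fixed effect of condition
model = smf.mixedlm("rt ~ condition", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```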


I wonder if the underlying problem with social science studies is that people are just not as good as mice or drosophila in terms of suitability as research materials: sample populations are typically small, variability is difficult to account for, and people may provide fuzzy answers coloured by their own culture, past experiences and experimental conditions.

The logic behind statistical analysis is that a significant finding can be generalized to the population at large, not simply restricted to the sample used in the study. That said, the kind of things you point to are variables that can be quantified in terms of their contribution to any particular finding using statistical techniques more advanced than ANOVA or t-tests.
posted by bluesky43 at 11:49 AM on September 16, 2018 [2 favorites]


You can't measure people's minds with the same rulers you use to measure mass or engineer industrial manufacturing.

Pffft. Next you'll be telling me that "lack of variation" is a useless metric for evaluating software quality, which would make approximately half the world's ISO 9001 quality certifications a complete waste of time!
posted by flabdablet at 12:21 PM on September 16, 2018 [5 favorites]


Just to riff off of what bluesky43 said: bootstrapping is one of the most useful new(ish) analytic tools we have, because it lets us ask the question "is this result meaningfully different from just randomly resampling the responses we have" - really, is this a real pattern of responses, or is it indistinguishable from noise?

Up until the last 10-15 years or so, it wasn't practical as an analytic method (or, at least, it wasn't seen as practical - it's brute force sampling-with-replacement against the data you have, so it takes time to resample, say, 500 trials from each subject with replacement 1,000-10,000 times), but at this point, it's trivial.
posted by Making You Bored For Science at 12:30 PM on September 16, 2018 [2 favorites]
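
A minimal sketch of that brute-force resampling, with invented data; a real analysis would resample trials within each subject and condition rather than one flat vector:

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in for one observer's per-trial effect measurements
observed = rng.normal(loc=0.3, scale=1.0, size=500)

n_boot = 10_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(observed, size=observed.size, replace=True)
    boot_means[i] = resample.mean()

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {observed.mean():.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
# If the interval excludes zero, the pattern is hard to explain as pure noise.
```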


Cell biology has a replication crisis, too, but it's been pretty clear that a lot of the problem is that people don't (can't?) publish full methods. When folks fail to replicate, it turns out that some of the left out methods were like, important, and the original study didn't find what the authors thought it found. That's a big deal, we want to know that! It really matters if the result is TRUE or NOT TRUE.

This is so true. I work in a core facility (molecular biology) at a state university and trying to suss a complete method out of a paper is part of my job. Often researchers will send us a paper with a method they want to use for their project that is, as written, unreplicable. Sometimes it can be fleshed out by following a reference, to another reference, to another reference to complete the missing details - but not always.

Specifically pertaining to cell biology, I am amazed at how poorly the recommended guidelines for cell line authentication are being followed, years after the "crisis" of misidentified lines was discovered. For a quick summary, cell lines are cells derived from a single source that are used as a live model to study, usually, certain different cancer types. You may have heard of HeLa cells from "The Immortal Life of Henrietta Lacks". Anywho, one of the things I do at work is "authenticate" cell lines, or prove that the genetic material of the cells matches the donor genetic profile. I am constantly telling people that what they thought was a vial of cervical cancer is actually a vial of something else. The "best" journals require that you submit your lines for authentication before publication. Just last week I issued a quote to a customer to authenticate their lines for a paper but then they responded that they no longer needed the service, as the journal they were considering doesn't require it.

If you work with cell lines, please authenticate your lines, especially if you aren't ordering a line directly from a well known source e.g. ATCC. I would *never* just accept a legacy vial from a fellow researcher, or one that has been in your lab for ages. Get it checked! You don't have to use one of the expensive services, you can do the PCR yourself.
posted by lizjohn at 1:04 PM on September 16, 2018 [13 favorites]


I just wanted to meditate on the implications of this sentence:
"Our literature is packed with unreliable findings,"
posted by doctornemo at 1:06 PM on September 16, 2018 [1 favorite]


When I'm reviewing, the first thing I look at in detail is the methods section, because if I can't figure out what the authors did, I have no way of knowing whether their results are even plausible. Why yes, I am the reviewer who will pick apart your methods section, and will opt for more detail rather than less. Hell, if I'm reading a paper, I'll probably spend more time on the methods, because they tell me whether the authors knew what they were doing.

Journals which cap methods sections at a given number of words drive me absolutely up the wall, because if the methods aren't comprehensive, I can't assess the rest of the paper. This is a huge problem with high-impact journals, much less so with smaller-audience journals, who (at least in my neck of the woods) are happy if your methods section is long and detailed.
posted by Making You Bored For Science at 1:18 PM on September 16, 2018 [9 favorites]


The logic behind statistical analysis is that a significant finding can be generalized to the population at large, not simply restricted to the sample used in the study.

The fundamental question is still whether statistical inference is possible in the first place, and whether the authors have taken steps to make sure that it is. No amount of statistical wizardry will fix that. A scientist working on an experiment with pigs, for instance, will choose a group of pigs of the same breed, sex, age range, weight range, etc. Inference is possible because those variables are controlled, so we can infer that results are applicable (at least) to pigs similar to those used in the experiment. Later experiments may test other breeds, weights etc. In the case of the "power pose" study, there's no indication whatsoever about the subjects, even though it can be assumed that the very notion of a "power pose" is highly cultural (and physical) and thus subject to individual variable reactions driven by factors not controlled in the experiments. In the case of the IQ experiment I cited, the authors discuss the potential effect of ethnicity on IQ in the French population but do not provide data about the ethnic composition of their sample. It's as if they'd done measurements on a small group of random pigs, extrapolated their findings to all pigs, and then discussed the possible effect of pig hair colour without having bothered to look at the colours of their test pigs. It's not a statistical issue, it's just bad science.

This old(ish) paper makes the point that statistical inference, in the case of social sciences, can be problematic (note: the issue is not limited to social sciences)

The use of inferential statistical techniques to much social science research data can give an apparent quantitative validity to the conclusions which is, in fact, spurious because
(a) the structures of the populations involved are in reality extremely complex.
(b) the individual members of the populations are themselves very complex. It is extremely difficult to design a measuring instrument (in the form of a questionnaire) which would elicit a required response even if the responders were automata programmed to give such replies. If we assume the existence of such an instrument, we have to remember that the responders are human beings with wills and minds of their own and there is no guarantee that a correct response would even then be obtained. Both these sources of error lead to an unquantifiable uncertainty in the sample statistic (mean, proportion, etc.) from which the inference is to be drawn.
(c) the populations under investigation are not stationary, and the parameters being evaluated will be subject to short-term fluctuations and long-term trends.

posted by elgilito at 2:39 PM on September 16, 2018 [8 favorites]


Bootstrapping, cross-validation, and permutation tests are great if you have enough data, but they can also be a little deceptive in that they still make assumptions about your data that may or may not be true. If there are sources of systematic variation in your data that you don't know about or forget to model, you can still be misled by resampling because you're treating things as independent and coming from the same distributions that may not be in real life.

I'm in genomics, not psych, but from the outside it seems like bigger problems than naive distributional assumptions are: small expected effect sizes, small sample sizes, signal-to-noise, and population heterogeneity. Out of those, my guess is that scaling up sample size would go a long way. As a comparison, GWA studies famously routinely uncover large numbers of tiny effects for complex disease associations and have trouble explaining all or even a large part of the heritability, but the associations they find are actually quite reproducible. Partly the theory and causality are just less ambiguous in genetics than social psych, but also, the very large sample sizes that are increasingly common (n > 10,000) in GWA studies can mitigate a lot of other problems, e.g., by allowing the estimation of population structure.
posted by en forme de poire at 2:44 PM on September 16, 2018 [5 favorites]


The standard these studies are claiming to meet is reproducible 19 times out of 20.

Believing that's what p<0.05 means is absolutely part of the problem.
posted by mark k at 2:58 PM on September 16, 2018


Dance of the p-values is a fun video about how likely you are to get similar p-values if you exactly replicate an experiment on exactly the same population.
posted by clawsoon at 3:23 PM on September 16, 2018 [3 favorites]
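
The same demonstration is easy to run yourself. A minimal sketch of 20 exact replications of one modest-effect, small-sample experiment (all numbers invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
true_effect, n = 0.4, 30      # a real, medium-ish effect; typical small sample

pvalues = []
for _ in range(20):           # 20 exact replications of the same study
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_effect, 1.0, n)
    pvalues.append(stats.ttest_ind(control, treatment).pvalue)

# The spread is the point: the same true effect yields p-values all over the map.
print(sorted(round(p, 3) for p in pvalues))
```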


Gelman had an interesting post in 2016 about why the replication crisis has centered on psychology. He suggests that it's a combination of strengths and weaknesses in the way that psychology is done:
1. Sophistication: Psychology’s discourse on validity, reliability, and latent constructs is much more sophisticated than the usual treatment of measurement in statistics, economics, biology, etc....

2. Overconfidence deriving from research designs....

3. Openness. This one hurts: psychology’s bad press is in part a consequence of its open culture....

4. Involvement of some prominent academics....

5. Finally, psychology research is often of general interest....

What do you get when you put it together?

The strengths and weaknesses of the field of research psychology seemed to have combined to (a) encourage the publication and dissemination of lots of low-quality, unreplicable research, while (b) creating the conditions for this problem to be recognized, exposed, and discussed openly.
(I've snipped a lot; the whole thing is worth a read.)
posted by clawsoon at 3:29 PM on September 16, 2018 [11 favorites]


Thanks for posting that, clawsoon. I read it at the time, but on a re-read it's more relevant than ever. The quote below has resonance. It points to another part of the problem: many journals insist on strong theoretical statements about data that don't deserve strong theoretical spin.

It makes sense for psychology researchers to be embarrassed that those papers on power pose, ESP, himmicanes, etc. were published in their top journals and promoted by leaders in their field. Just to be clear: I’m not saying there’s anything embarrassing or illegitimate about studying and publishing papers on power pose, ESP, or himmicanes. Speculation and data exploration are fine with me; indeed, they’re a necessary part of science. My problem with those papers is that they presented speculation as mature theory, that they presented data exploration as confirmatory evidence, and that they were not part of research programmes that could accommodate criticism. That’s bad news for psychology or any other field.
posted by bluesky43 at 4:14 PM on September 16, 2018 [3 favorites]


but from the outside it seems like bigger problems than naive distributional assumptions are: small expected effect sizes, small sample sizes, signal-to-noise, and population heterogeneity.

In my world of psychology, naive distributional assumptions are no longer tolerated and great care must be taken to justify claims about distributions. I'm not sure what you mean by signal-to-noise in this context but the others in your list have moved into more focus with statistical tests that permit modeling of data, not just description, and these demand lots of data. The issue of population heterogeneity is tougher because it's very hard to get a handle on individual differences (it's harder than you might think if you haven't worked with data generated by a random sample of people) and some of the metrics can begin to seem like handwaving (or attempts to soak up more variability by throwing everything into the model).
posted by bluesky43 at 4:20 PM on September 16, 2018


Yeah signal to noise was imprecise on my part, I really meant something more like the problem Gelman described as "weighing a feather while the kangaroo is jumping," which I guess kind of emerges from small effect sizes plus unmodeled confounders with relatively large effect sizes, which themselves are really variable according to substructure/time/etc. Population structure definitely seems much easier (though obviously not trivial) to measure in population genetics than in psychology, just because of the way recombination works. There are also things like surrogate variable analysis in *omics that could maybe see some broader use, but those could also certainly be criticized as "attempts to soak up more variability." I'd be curious to hear what people are doing in that area.
posted by en forme de poire at 5:12 PM on September 16, 2018


I wonder how much damage has been done by the n=30 rule of thumb. If the n=1000 rule of thumb had been pushed instead, I wonder if a) the resulting data would've been better, and b) there would've been less of the "toothbrush problem" that belarius mentions above ("theories are like toothbrushes; nobody wants to use someone else's") since psychologists would've been forced to cooperate more often in order to complete their studies.
posted by clawsoon at 5:21 AM on September 17, 2018
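
For a rough sense of what is at stake in those rules of thumb, a minimal power-calculation sketch using statsmodels, assuming (purely for illustration) a small standardized effect of d = 0.2:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (30, 1000):
    power = analysis.power(effect_size=0.2, nobs1=n, alpha=0.05)
    print(f"n per group = {n:5d}: power to detect d = 0.2 is {power:.2f}")
# Roughly: at n = 30 you would miss a small effect most of the time;
# at n = 1000 you would almost never miss it.
```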


Kimura took all the fun out of coming up with evolutionary Just So stories,

If only. At the boundary of the humanities and popular science writing and pseudoscientific bullshit, such stories are the coin of the realm to this day.
posted by spitbull at 5:40 AM on September 17, 2018 [3 favorites]


Also, I’m modestly known among some psych colleagues for this bon mot: “experimental social psychology provides us with an excellent view of the cultural biases of white American college students who are well enough off to go to college but needed the $10 and the free sandwich.”
posted by spitbull at 5:42 AM on September 17, 2018 [17 favorites]


Data thug lyfe
posted by Damienmce at 7:28 AM on September 17, 2018


Part of why ordinal data are so difficult to work with is because each participant is likely interpreting the scale differently. That is: each person who rates something a "3" may have a very different internal state (e.g. a level of emotional intensity) than other people who gave a similar rating. Without collecting lots of data from each participant, it's exceptionally difficult to calibrate the measure. This is why most mental health questionnaires are both lengthy and extensively validated - you just can't get meaningful data otherwise. Since most social psych experiments don't bother to get either the comprehensive depth or the broad population validation for their questions, the results will be highly ambiguous in a way that can be bent to support a wide range of theories.

I guess I'm a bit late, but this is really wrong. All of these problems with subjects interpreting the scale differently are present with any kind of judgment task, whether scalar or ordinal -- if anything the analytical problems are worse for scalar data because the possible kinds of scale perturbations that subjects are using can in principle be very complicated, because they're happening in a continuous space. Ordinal data needs different analysis tools not because it is inherently worse, but because it is a qualitatively different kind of data and you need some model that doesn't assume the data points are drawn from a continuous distribution (which most easy-to-use regression packages assume by default, or at least have done in the past).
posted by advil at 8:50 AM on September 17, 2018 [2 favorites]


Also, I’m modestly known among some psych colleagues for this bon mot: “experimental social psychology provides us with an excellent view of the cultural biases of white American college students who are well enough off to go to college but needed the $10 and the free sandwich.”

At my college all psych majors were required to participate in at least 2 or 3 studies as a requirement to complete the major, so let's add that detail to this description
posted by obliterati at 8:51 AM on September 17, 2018 [2 favorites]


At my college all psych majors were required to participate in at least 2 or 3 studies as a requirement to complete the major, so let's add that detail to this description

Back in my undergraduate psych days (2003-2007) I heard many an argument that doing social experiments on students was *of course* inherently generalizable to the world at large.

This may be slightly off the topic of reproducibility, but my psychology degree gave me a much better education on experimental design, statistics, and writing scientifically than my degree in molecular and cellular biology did. I think bio students, as well as other STEM-y majors, would have a better time reading and comprehending papers if they didn't have to bootstrap their knowledge of experimental design.
posted by lizjohn at 11:21 AM on September 17, 2018 [4 favorites]


Take bog-standard linear regression, for example. In my experience, psychologists simply aren't aware that regression requires exogenous predictors for the resulting parameters to be meaningful.

I know the thread has moved on, but I wanted to come back to this. There is a surprisingly large amount to say here, and it's not obvious where to start. But let's start with a potential confusion about how "exogenous" and "endogenous" are used. The most natural reading of those terms is with respect to a graphical causal model. A variable is exogenous if and only if it has no parents, i.e. no causes in the model. A variable is endogenous if and only if it has at least one parent, i.e. it has at least one cause in the model. That also fits with the Greek meanings, and so one might think that the exogeneity constraint is about the predictor variables being uncaused in the model. But the exogeneity constraint says something a bit different. It might better be labeled the conditional mean zero constraint, since what it says is that the conditional expected value of the errors given the predictor variable is zero, i.e. E[u | X] = 0 in a linear model where Y is a function of X, and u is the error term. The conditional mean zero constraint can fail even when the predictor variable is exogenous. For example, if we are implicitly conditioning on a collider (a variable that is caused by both the predictor variable X and the response variable Y). And the conditional mean zero constraint does not require that the predictor actually be exogenous. It just needs to be uncorrelated with the error. Suppose we have a "full" causal model according to which Z → X → Y. Then X is not exogenous in the model, but the conditional mean zero constraint will still be satisfied with respect to X and Y.
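
A quick way to see the collider point is with a simulation. Here is a toy version in Python; all the coefficients and the selection rule are invented for illustration, and it assumes only numpy and statsmodels.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100_000

x = rng.normal(size=n)              # exogenous predictor: no causes in the model
y = 0.5 * x + rng.normal(size=n)    # true coefficient on x is 0.5
c = x + y + rng.normal(size=n)      # collider: caused by both x and y

# Full sample: the conditional-mean-zero constraint holds, so OLS recovers ~0.5.
full = sm.OLS(y, sm.add_constant(x)).fit()

# Implicitly conditioning on the collider, e.g. because the sample was
# selected on c, breaks the constraint even though x is exogenous.
keep = c > 1.0
selected = sm.OLS(y[keep], sm.add_constant(x[keep])).fit()

print(full.params[1])      # close to 0.5
print(selected.params[1])  # noticeably biased away from 0.5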

Having clarified what exogeneity amounts to here, I now want to contest the claim that we need exogenous predictors in order to have meaningful parameter estimates. We don't. The conditional mean zero constraint is required if you want to give your parameters a straightforward causal interpretation. But that's not the only thing you might want to do with a linear model. For example, you do not need the conditional mean zero constraint in order to use a linear model for prediction. What you do need is the assumption that the causal system about which you are making predictions is the same as the one from which you sampled. In the opening example in the video you linked, this assumption amounts to saying that the shooter (or group of shooters) firing in the period when data are collected is still going to be shooting in roughly the same way in future clay pigeon launches. As long as the causal system remains constant, the estimated parameters will be fine for the purpose of prediction. Again, in the clay pigeon case, as long as the causal system doesn't change, the sound of a gunshot is actually a good basis for predicting the destruction of the clay pigeon, and the estimated parameter will tell you how to make good predictions (according to the standard squared-error loss function). For more mathematical detail and discussion of causal interpretation versus prediction, take a look at this stackexchange discussion.

Anyway, at first, I thought the video was going to be just fine, since the guy in the video starts out clearly making claims about causal inference. But later on, he says things that look really badly wrong. The most important mistake comes around about eight and a half minutes in, when he says that there is no correlation between the sound and the shattering. There definitely is one. The correlation is "spurious" in the sense that it isn't underwritten by direct causation between the correlated variables. But that doesn't mean that there is no correlation. What is true is that there is no conditional correlation given the gunshot. That is, Pr(shatter | sound, shot) = Pr(shatter | shot). He doubles down on this mistake in an especially egregious way at 12:34, where he complains about researchers (like me!) who say (correctly!) that an observed correlation might be explained by various different causal mechanisms. He says that this is wrong. That the correlation is not "true." And he supports that contention by observing that two variables might influence each other and have different "true" correlations (even different signs) in different directions. It's true that one causal mechanism that could explain an observed correlation is mutual influence (provided we're very, very careful about what mutual influence amounts to). And it's true that the estimate you get from simple regression won't tell you what that mutual influence looks like: that is, it won't directly estimate the causal influence in each direction individually. But so what? Those facts don't make the observed correlation any less of a correlation. And they don't make that correlation any less useful for predictions under the assumption that the sampled system is the same one we're making predictions about.
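
For what it's worth, the clay pigeon setup is easy to simulate, and a toy simulation makes both points at once: the marginal correlation between sound and shattering is real, the conditional association given the shot essentially vanishes, and the sound is still a perfectly serviceable predictor as long as the system doesn't change. A sketch in Python (the probabilities here are invented, not taken from the video):

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

shot = rng.random(n) < 0.5                  # the shooter fires on about half the launches
sound = shot & (rng.random(n) < 0.95)       # a shot almost always produces an audible bang
shatter = shot & (rng.random(n) < 0.8)      # a shot usually shatters the pigeon

# Marginal correlation between sound and shatter: clearly nonzero,
# even though neither causes the other.
print(np.corrcoef(sound.astype(float), shatter.astype(float))[0, 1])

# Conditional on the shot, the association (essentially) disappears:
# Pr(shatter | sound, shot) is about the same as Pr(shatter | shot).
print(shatter[shot & sound].mean(), shatter[shot].mean())

# And for prediction under a stable system, the sound works fine, causal or not.
print(shatter[sound].mean(), shatter[~sound].mean())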
posted by Jonathan Livengood at 9:53 AM on September 18, 2018 [2 favorites]


Not everyone thinks it’s a huge crisis, or even a crisis at all. I spoke with several researchers who complain that the real problem is the replication movement itself. They say a field that once seemed ascendant, its latest findings summarized in magazines and turned into best sellers, is now dominated by backbiting and stifled by fear. Psychologists used to talk about their next clever study; now they fret about whether their findings can withstand withering scrutiny. "You have no idea how many people are debating leaving the field because of these thugs," a tenured psychologist, the same one who calls them "human scum," told me.
I'm occasionally a bit sympathetic to the feeling some people must have when they've really followed "standard practice" and then get hammered because their study happened to catch the attention of a few colleagues. And because of the world we live in, I'm sure part of the reason Amy Cuddy is a prominent poster child for this sort of thing is misogyny (as in, silly woman doing touchy-feely stuff 'of course' can't be scientifically rigorous).

But the attitude in the quote above is still really entrenched. They define "ascendant" as a time not when their findings were correct and reproducible but when they got stuff in magazines and best sellers. People are going to be bitter.

Was going to posit this very same observation. Most of this sort of ugliness would magically disappear if the profit motive was exorcised from the sciences. And, pretty much any other endeavor, for that matter.

When you are talking about academics studying experimental psychology and deciding the issue is obviously "capitalism" you might want to re-examine the chain of reasoning that got you there. I would say almost nothing would disappear, and definitely not magically, if you took money out of the picture.

People want recognition and respect from others, but more importantly they want to feel smart about themselves; relatively few want the sum total of their professional contributions to be "I had all these hunches and they all turned out to be wrong, but I'm good enough at statistics that I *proved* I was wrong."

Feynman's old saw is that the scientific method exists to help keep you from fooling yourself. I'd bet money Wansink is a hustler indifferent to the truth but I'm pretty certain Cuddy still pushes her stuff because she thinks she found something true.
posted by mark k at 10:08 AM on September 18, 2018 [2 favorites]


This may be slightly off the topic of reproducibility, but my psychology degree gave me a much better education on experimental design, statistics, and writing scientifically than my degree in molecular and cellular biology. I think bio students, as well as other STEM-y majors, would have a better time reading and comprehending papers if they didn't have to bootstrap their knowledge of experimental design.

Yeah, "traditionally" stats (including experimental design) is really undertaught in molecular biology, I think because there was an assumption that well-designed experiments should produce essentially binary results. Like, is there a band on this gel or isn't there, or does the embryo have a stripe in a certain place or not, or whatever. I'm inclined to blame Platt's "strong inference" dogma for the widespread derision towards quantitative methods in molecular biology, since it really emphasized a kind of proof-by-contradiction approach and insisted that the strongest evidence was qualitative, not quantitative.

I think that's changing now because people use more high throughput tools, but even pretty recently my grad department was pretty split between people who "believed in math" and people who didn't. I vividly remember one of my professors in grad school, a dev/genetics guy on team no-math-ever who shall remain nameless, not being able to grok the concept of a null distribution or a base rate and treating us like idiot children for even asking about it.
posted by en forme de poire at 10:32 AM on September 18, 2018 [4 favorites]


Social sciences, particularly psychology, have a much more serious problem than reliance on p-values. When some of the most relied-upon studies, such as the Stanford Prison experiment or the Milgram obedience-to-authority experiment, are shown to have withheld critical information from the published findings, that is a science in crisis. It turns out the Stanford Prison experiment was more deliberately structured to create the prisoner/guard atmosphere than the published account let on, and the Milgram experiments had significant numbers of resisters who were not reported. So it is not simply human nature to do as one is told, nor to become a sadistic guard just because you are put in that position; one can rely on one's own humanity.

These are important anti-findings because they give us a different idea of what it means to be an ethical human in the modern age. Also, because the Stanford Prison experiment was regarded as good science, guards and jailers who took those jobs with the goal of being sadistic and cruel could be passed off as "just behaving normally." How much cruelty in our prisons and jails has been allowed to continue because we were taught to think, "Oh, that's just how people are socialized to behave in those situations"?

The underreporting of findings has real-world costs and consequences in many cases. Full release of data sets, with appropriate safeguards for personally identifying information, would go a long way toward limiting the bad data out there.
posted by drossdragon at 12:32 PM on September 19, 2018 [2 favorites]

