The Speed of Science
January 3, 2022 7:32 AM

Saloni Dattani and Nathaniel Bechhofer on replacing the traditional research paper with something that's living.
Once research is freed from the need to exist in static form, it can be treated as if it were a live software product. Analysis code would be standardised in format and regularly tested by code-checkers, and data would be stored in formats that were machine-readable, which would enable others to quickly replicate research or apply its methods to other contexts. It would also make it easier to apply new methods to old data.
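For the flavor of it, here is a minimal sketch (mine, not the authors') of what "standardised, tested analysis code over machine-readable data" might look like in Python; the dataset, column names, and function are all made up:

    # Toy illustration only: a tiny analysis whose data lives in a
    # machine-readable format (CSV) and whose code can be checked automatically.
    import pandas as pd

    def effect_estimate(df: pd.DataFrame) -> float:
        """One standardised analysis step: mean outcome difference between groups."""
        treated = df.loc[df["group"] == "treatment", "outcome"].mean()
        control = df.loc[df["group"] == "control", "outcome"].mean()
        return treated - control

    def test_effect_estimate():
        """The kind of check an automated code-checker could re-run on every update."""
        toy = pd.DataFrame({
            "group": ["treatment", "treatment", "control", "control"],
            "outcome": [3.0, 5.0, 1.0, 3.0],
        })
        assert effect_estimate(toy) == 2.0

    if __name__ == "__main__":
        # Anyone can re-run the analysis against the published dataset, or point
        # the same function at new data ("apply methods to other contexts").
        df = pd.read_csv("study_data.csv")  # hypothetical published dataset
        print(effect_estimate(df))

Because the analysis is an ordinary function over an ordinary file, replication becomes running a script rather than reverse-engineering a PDF.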
posted by metaquarry (12 comments total) 16 users marked this as a favorite
 
i've been working as a data analyst/engineer supporting academic medical research for about a year now. one of the first things that surprised me is that while journals will often scrutinize the statistical techniques employed in a paper to an extreme degree, the construction of the datasets themselves (which is what I do) is totally ignored and treated like a black box. in my area, translating raw data into research-ready datasets involves long conversations between technicians like myself and clinical researchers, and much of the work is highly subjective.
posted by AlbertCalavicci at 7:59 AM on January 3, 2022 [11 favorites]


While research publication is slow, inefficient, and full of structural issues, the fact that you have to put in some fucking effort to bang out a paper is a GOOD thing, and gatekeeping serves a purpose too.

A Sci-GitHub/pastebin for low-effort junk-science data barf swaps one set of problems for another.

What was the title of this website again? Meta-what?
posted by lalochezia at 8:20 AM on January 3, 2022 [3 favorites]


It seems like this isn't an effort to eliminate the paper-writing part of things, but to ensure that when you write a paper, the underlying data itself is available for others to examine, not just the paper. But I'll admit I didn't read the whole thing, either.
posted by jacquilynne at 8:22 AM on January 3, 2022 [1 favorite]


The authors point out serious, well-known problems in scientific research. I wish they provided some answers, rather than this pie-in-the-sky crap.

They correctly note that addressing these issues would require each research group to be "a team of theorists, epidemiologists, statisticians, programmers, Red Team reviewers, code-checkers, managers, writers, copy-editors and communicators."

Yeah. It would. You know who does most of the work of research and paper-writing, now? Grad students and postdocs. Overworked, underpaid grad students and postdocs who need to publish and move on.

You want to add a bunch more labor to that process? The money and career incentives have to come from somewhere. You want to require that labor to be even more specialized and high-skill? Then you can't require students and postdocs to do it.

Incremental improvements like putting code in GitHub are great. Journals can, bit by bit, increase reporting requirements, and they are. But don't talk about an entirely different scientific labor structure as though we can simply walk there from here.
posted by gurple at 8:26 AM on January 3, 2022 [4 favorites]


I will say that while I am very supportive of the 'software style' of research execution and documentation for the benefit of researchers themselves, I'm very skeptical of the extension of this methodology into the domain of so-called "open" science, which ostensibly invites "average people" to participate meaningfully in science. The former is just good practice. The latter is a highly corporatized initiative which in my opinion threatens the independence of scientific research while masquerading as some kind of democratizing force.
posted by AlbertCalavicci at 8:28 AM on January 3, 2022


I’m a bit conflicted when I see articles like this, because on the one hand, I definitely like a lot of the suggestions as general research practices. Open datasets, yes! Please, more robustness checks! Share more of your code and detailed methods, and make it easy to reproduce your results!

On the other hand… I’m skeptical that interactive, live-updated web publications are going to replace journal publications any time soon, because it’s not clear to me that they accomplish the same goals.

With no claim to completeness, afaict journal publications are used to:
  • Build a resume for academic hiring and promotion
  • Establish priority in a research field, to get more citations
  • Communicate results to other researchers within your field
  • Maaaaaybe communicate results to the public, but not at the expense of the goals above
Mostly, publications are about advancing a researcher's career, and secondarily about communicating within their professional community (at a slower pace than conferences or out-of-band emails). They're a lot of work to write, so they ideally shouldn't have major problems… but if they do, that happens, and my experience is that within research communities they're generally regarded as part of a conversation, not as a repository of established fact.

Articles like this one often seem like they're trying to fix the "publications often have problems" issue and make them friendlier to the public… while not suggesting replacements for the career-advancing functions that these publications serve for the people writing them.
posted by learning from frequent failure at 8:29 AM on January 3, 2022 [4 favorites]


Hmmm. Having tried to follow links to data archives from 10-year-old research papers... treating published results as a live software product sounds like a nightmare. Even if the journal is hosting them, the hosting is unlikely to outlast one or two editors. The peer-reviewed journal model isn't great, but it's better than links to things that are meant to change.

I'm in a weird field where preprints are standard and public "data" is the norm, but turning the hundreds of terabytes of raw timestream data into data products that anybody who hasn't spent three years learning the details of an experiment would actually want to use is pretty much impossible. Publishing the code used in a paper would be a very good thing, if sometimes personally embarrassing. (Some people do it.) But, realistically, nobody who isn't joining a collaboration as a new postdoc is going to be able to do anything with most of that data or code. Even with NASA instrument data, with many full-time people devoted to maintaining and documenting archives, actually using raw data is a pain in the neck that takes weeks of work to do meaningfully. Nobody is going to spend time verifying your results unless they have good reason to be skeptical or are using them for something else. Presumably it's different in other fields.
posted by eotvos at 8:30 AM on January 3, 2022


I love MeFi. I really do. Y'all are awesome.

Because hi hi hello, I've been a research data curator/data steward and I teach others to do that work, and it frosts my britches ever so that the work and people involved didn't even get a casual mention in this piece. Data and code don't just become and stay available and useful by magic. That takes actual freakin' work.

And I came here to complain bitterly about that, and what do you know, several MeFites beat me to it.

Adding a bit of nuance to the open science question: In my world, open science isn't the same thing as citizen science, though there's some overlap. Open science is more or less what the paper is advocating for: considerably more transparency into scientific processes, including (where appropriate; it isn't always) project design (e.g. pre-registering studies), data, and code. Citizen science is the Zooniverse- or BOINC-style crowdsourcing of various bits of scientific labor.
posted by humbug at 10:09 AM on January 3, 2022 [8 favorites]


I tried this a few times. It was a lot more work than just pushing out a paper, and pushing out a paper is hard enough. Also, there was no more reward for doing it that way than for doing it the normal way. Also, a lot of the code repos have rotted. I learned a lot about what is and isn't durable in terms of research artifacts, but mostly I learned that when you ask overworked people to do extra work for no benefit, they tend not to do it.
posted by pmb at 10:25 AM on January 3, 2022 [5 favorites]


Wouldn't this open up research papers to the same sorts of partisan edit wars that often plague Wikipedia articles?
posted by Thorzdad at 10:51 AM on January 3, 2022 [1 favorite]


Some comment-enabled preprint platforms (e.g. PubPeer) have had to create some moderation structure, yes. Typically it's on the level of "who's allowed to comment" (for example, to post on arXiv you have to be validated by an existing poster) rather than examining each individual comment. I guess I'd characterize the approach as "crackpot avoidance."

What I don't know is how well it controls for *isms/*misias and academic feuding. The public nature of commenting may well cut down on the worst excesses -- excesses that peer review is unfortunately prone to because it's not public -- but if discourse studies exist on this, I haven't yet run across them.
posted by humbug at 12:35 PM on January 3, 2022


(background - before I sold out to the dark side and went into reinsurance, I worked in bioinformatics - specifically biophysics, multiple sequence alignment, and phylogenomics)

IMO open science is unequivocally a good thing.
  • There's nothing more annoying than following up a "data available on request" notice only to find that the author has changed jobs and you can't track them down, that the server isn't there any more, or that they'll hold out on the data unless you give them a co-authorship for doing nothing.
  • The lead time on traditional publications is crazy. My first paper went to Nucleic Acids Research, which we picked because, at 2 months from submission to publication, it was the fastest in the field. The worst are the ones that hold you up for a year or so and then reject your paper as "not novel" because it's now a year old.
  • Traditional journals came out of a world where only a small number of papers could be physically printed and posted out once a month or so. This constraint no longer exists.
  • Impact factor is a metric that became a target. Being in Nature instead of PLOS One says nothing about the quality of the work.
  • The code, the code, the code. The majority of science now involves a certain amount of coding, anything from Python notebooks to scalable, fault-tolerant web services. Work is not reproducible without it. The knowledge that a stranger might look at your code may (fanciful thinking) help you improve your practices. (A toy sketch of the bare minimum follows this list.)
  • (more fanciful thinking) In time, the idea of a publication as a living thing might help persuade the people with the purse strings that the work doesn't end when the paper is published. I worked on an incredibly popular piece of software (3,000 citations on Google Scholar on SARS‑CoV‑2 alone since COVID kicked off) and my old boss had his last grant rejected because it was largely for enhancements, support, and bugfixes for it rather than anything shiny and new.
  • Paywalled papers disproportionately hurt scientists in smaller institutions and developing countries, who might not have the resources to have a ton of journal subscriptions.
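As promised in the code bullet above, a toy sketch (entirely made up, no relation to any real pipeline) of the bare minimum that makes a stochastic result re-runnable by a stranger: pin the seed, and record exactly what produced the number alongside the number itself:

    # Toy example only: fix the randomness and write a manifest of the run.
    import json
    import platform
    import random
    import sys

    SEED = 42  # fixed seed so the stochastic step gives the same answer every run

    def run_analysis() -> float:
        """Stand-in for the real analysis; anything seeded is reproducible."""
        random.seed(SEED)
        return sum(random.random() for _ in range(1000)) / 1000

    if __name__ == "__main__":
        manifest = {
            "result": run_analysis(),
            "seed": SEED,
            "python": sys.version,
            "platform": platform.platform(),
        }
        print(json.dumps(manifest, indent=2))

None of this is hard, which is rather the point: the barrier is incentives, not tooling.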
arXiv/bioRxiv/etc have shown that it's doable - let's do it.
posted by kersplunk at 1:13 PM on January 3, 2022 [2 favorites]



