I bet she read Ranger Rick as a kid
May 14, 2020 6:24 AM

 
This was fascinating, thank you. Years of using the clone tool in Photoshop made spotting the duplicates in the examples pretty easy - but that's after being told they were there. I can't imagine trying to spot them, cold, out of hundreds of others.
posted by stillnocturnal at 8:03 AM on May 14, 2020 [5 favorites]


There are services out there that do clone spotting for copyright protection, and they can do a decent job even on heavily photoshopped images. It’s surprising to me that folks aren’t automating this task against scientific literature.

My personal experience with this came when I was on the program committee for a conference and received a submission using images and quite a bit of text from one of my own papers. The student had done a bang-up job copy-pasting from three or four unrelated papers, with no attempt to change text or data.
posted by simra at 8:19 AM on May 14, 2020 [4 favorites]


> There are services out there that do clone spotting for copyright protection, and they can do a decent job even on heavily photoshopped images. It’s surprising to me that folks aren’t automating this task against scientific literature.

From The Fine Article:
Current technology is good at detecting outright duplications, and flipped or rotated copies, says Bucci. His company, Resis, uses proprietary software to scan scientific manuscripts for its clients, which include journals and research institutions. But complex problems are tougher, such as two images that share a small overlapping area, but are otherwise completely different. Advances in machine learning could be the key to detecting these and other subtle patterns automatically, he says.

But better software will need more data. [...] Until recently, Bik was unimpressed by the software available. Now, she says, “I have full confidence that in the next two years, computers will be usable as a mass way of screening manuscripts.” But both Bik and Acuna say that people will always need to check the results of such programs, especially to weed out instances where images can and should look similar in certain parts.
So, they are automating, but the tech isn't at the point where it can replace an expert of her caliber. Even if it does get to the point where it can replace the initial triage, people like her will be needed to train those systems and confirm their findings manually.
posted by tonycpsu at 8:33 AM on May 14, 2020 [14 favorites]


The scientific-culture side of this is really interesting. When is it appropriate to flip from "the reputation of a scientist is crucially important, so we must be extremely careful to even suggest a hint of misconduct without firm evidence lest we destroy an innocent person's career" to "a lot of misconduct is going undetected, to the detriment of science as a whole, so we must get all the hints out there so we can track down as much misconduct as possible"?
posted by clawsoon at 8:38 AM on May 14, 2020 [5 favorites]


Just here to stan for Ranger Rick.
But Bik posts her finds almost every day on Twitter and other online forums, in the process teaching others how to spot duplications and pressuring journals to investigate papers.
For anyone else who wants to follow along with her on Twitter or her blog. This was so fascinating, thank you. As with many people who quit a high-status job to go into unpaid "integrity work" (or similar), I am super curious how they make ends meet. It may just be that having a real job for a long time lets them take time off.
posted by jessamyn at 9:39 AM on May 14, 2020 [10 favorites]


> "the reputation of a scientist is crucially important, so we must be extremely careful to even suggest a hint of misconduct without firm evidence lest we destroy an innocent person's career" to "a lot of misconduct is going undetected, to the detriment of science as a whole, so we must get all the hints out there so we can track down as much misconduct as possible"?

It's worth noting that a significant percentage of what she's catching are sloppy mistakes, not deliberate fraud. Not all corrections to the scientific literature should be career destroying.

From the article: Some authors replied swiftly on the site to point to honest errors. In one case, apparent duplicate images were in fact supposed to represent the same experiment but were not clearly labelled as such, an explanation that Bik accepts. In another, authors posted raw data and said the data seemed similar only after being processed for a paper. In still others, authors said there had been accidental mistakes, and by May this year, 13 of the flagged papers had received corrections.
posted by justsomebodythatyouusedtoknow at 9:45 AM on May 14, 2020 [6 favorites]


Diminishing the credibility of scientific research by using doctored images is just the sort of thing that leads to conspiracy theories. May as well fudge the results data while you're at it.

This fall down the rabbit hole seems to be accelerating...
posted by Chuffy at 9:56 AM on May 14, 2020


If people stop getting so defensive about mistakes and start copping to them, then they're not career-ending unless they're obviously not-mistakes or certain scientists are making a suspicious number of "mistakes".

We as a society need to stop being so hesitant to admit fault.
posted by explosion at 9:57 AM on May 14, 2020 [10 favorites]


Yeah, I'm 100% for punishing fraud, but "paper blindness" is a thing. It can take months or in some cases even years to prepare a paper, and it's a massive pain in the ass. You read the same manuscript every day for so many days in a row while editing and updating and sharing versions with collaborators that you start getting "blind" to mistakes that would otherwise be obvious. Maybe 7 drafts ago, you labeled something as figure 1, but then you moved it to figure 3, and you stop actually parsing the figure because you've seen all the figures so many times, and whoops... It's definitely not "okay" - we all try very hard not to do this, but it's not always fraud.

Anyone who has been up at 3 AM making sure that the 600 dpi .eps version of that 2#$!#$@#$ figure 4 that you originally made in PowerPoint actually embeds and is readable in the journal's goddamn low-res PDF translation of your submission which you really want to get rid of because it's been haunting you for the last 8 months and you promised your kids that you definitely would 100% be done tomorrow so that you could spend more time with them probably has some empathy for the authors that do slip up.

(Scientists! Often reasonably smart people who are nonetheless learning graphic design and publishing on the fly without training!)

Anyway, Elizabeth Bik seems pretty awesome and the article is great - I'm glad they're featuring her.
posted by BlueBlueElectricBlue at 10:00 AM on May 14, 2020 [20 favorites]


1. I spelled her name wrong, unintentionally demonstrating my point. Sorry, Elisabeth Bik.

2. It gets easier to admit mistakes if you don't have to worry that people will automatically assume you were aiming for fraud.
posted by BlueBlueElectricBlue at 10:11 AM on May 14, 2020 [2 favorites]


I love following her on Twitter. I am much better at spotting the similarities in blots than in smears, even though I have a lot more work experience with smears.
posted by hydropsyche at 10:35 AM on May 14, 2020 [1 favorite]


On the one hand, I'm really glad that results in my field tend not to rely much on images of the sort that are being discussed in this article.

On the other hand, results in my field rely heavily on bespoke data analysis code, mostly written in Matlab by graduate students and postdocs who had little or no background in programming before they started grad school. On the few occasions when I've seen others' analysis code I often find bugs, including in code that has been used to produce results for papers. So far I haven't seen anything that would actually alter the findings and require a correction or retraction, but that's been a matter of luck more than anything else. Sadly I don't think there can be an Elisabeth Bik for this problem; the nature of the analysis pipeline means that there isn't a final image containing some subtle fingerprints of errors or manipulation that a skilled eye could detect. The best solution is probably to ask that people publish their original, unprocessed data along with their results, which is a movement that fortunately does seem to be picking up some momentum. But even with that, checking the integrity of published results probably means more or less re-running all of the analyses, which is something that's never going to happen at scale.
posted by biogeo at 11:17 AM on May 14, 2020 [7 favorites]


Double.

(just kidding)
posted by srboisvert at 11:33 AM on May 14, 2020


This is awesome, I am now following her on Twitter.

I am also having loads of fun playing "spot the duplicate." Thank you for posting!
posted by blurker at 3:18 PM on May 14, 2020 [1 favorite]


She's amazing, I followed her for a while and just by following along as she pointed out duplications I learned a lot.
posted by dhruva at 5:54 PM on May 14, 2020



> (Scientists! Often reasonably smart people who are nonetheless learning graphic design and publishing on the fly without training!)


Ha
posted by eustatic at 6:56 PM on May 14, 2020


She can tell from the pixels, and from having seen quite a few 'shops in her time.
posted by Pruitt-Igoe at 8:20 PM on May 14, 2020 [6 favorites]


> It’s surprising to me that folks aren’t automating this task against scientific literature.

You're welcome to try (seriously! I'm a scientist and I would love for this to exist), but this is a hard problem. I also have to laugh because a different person on Twitter recommends this to Bik pretty much every day.
posted by MengerSponge at 9:11 AM on May 15, 2020 [3 favorites]


It sounds like there are way too few examples to properly train an AI to do this.

...which sounds like a job opportunity for a scientist who has been convicted of misconduct. "As your punishment, we need you to generate a few thousand doctored images. Please be as creative as possible."
posted by clawsoon at 9:44 AM on May 15, 2020 [1 favorite]



