Honest as the data is wrong
July 15, 2023 2:32 PM

Professor Francesca Gino, "who focuses on why people make the decisions they do at work," is on administrative leave from Harvard Business School and has had some papers retracted because it looks like data were faked. Guardian. Or with more detail: Vox. Irony alert: the seemingly dishonest research was about how honest regular people are.

This came at me through a 20 minute YT Exec Summary. The details of the take-down are in four consecutive posts on DataColada:
[112] Data Falsificada (Part 4): "Forgetting The Words"
[111] Data Falsificada (Part 3): "The Cheaters Are Out of Order"
[110] Data Falsificada (Part 2): "My Class Year Is Harvard"
[109] Data Falsificada (Part 1): "Clusterfake"
That's a lot of editors and peer-reviewers who fluffed their job. Internally interesting is that #109 concerns the 2012 Gino paper co-authored with Dan Ariely which was flagged and discussed on MeFi in 2021.
posted by BobTheScientist (48 comments total) 22 users marked this as a favorite
 
Why do people always complain when true subject matter experts do the research?
posted by Tell Me No Lies at 2:55 PM on July 15, 2023 [10 favorites]


Why do people always complain when true subject matter experts do the research?
posted by Tell Me No Lies


Eponysterical?

Also, can we propose the "Gino Coefficient" as the statistical difference between the actual and falsified data in any published paper?
posted by snofoam at 3:05 PM on July 15, 2023 [21 favorites]


I didn't realize that "every accusation is a confession" was going to extend into academic research, but here we are.
posted by hippybear at 3:09 PM on July 15, 2023 [12 favorites]


Most recently, she authored “Rebel Talent: Why it Pays to Break the Rules in Work and Life.”

To be fair, I don't think there's any way anyone could have predicted this.

Also, can Seth Meyers please take A Closer Look at her data and also play her in the movie?
posted by snofoam at 3:45 PM on July 15, 2023 [6 favorites]


The most amazing part is in the first post (109): (bold and italics theirs)

Two different people independently faked data for two different studies in a paper about dishonesty.

It's good work exposing the fakery, in each post. I never knew that an xlsx file is a zip file.
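
For anyone who wants to see the "xlsx is a zip file" point for themselves, here is a minimal sketch using Python's standard zipfile module. The archive it builds is only a stand-in that mimics the folder layout of a workbook (it is not a valid spreadsheet), and the helper names are made up for illustration. As far as I understand it, calcChain.xml is the part the Data Colada posts lean on, since it can preserve traces of the order in which formula cells were calculated.

```python
import io
import zipfile


def build_stand_in_xlsx():
    """Build a tiny zip that only mimics an .xlsx layout (NOT a valid workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in (
            "[Content_Types].xml",
            "xl/workbook.xml",
            "xl/worksheets/sheet1.xml",
            "xl/calcChain.xml",
        ):
            # Placeholder content; a real workbook holds real XML here.
            archive.writestr(name, "<placeholder/>")
    return buffer.getvalue()


def list_xlsx_parts(xlsx_bytes):
    """Return the names of the files packed inside an .xlsx archive."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return sorted(archive.namelist())


parts = list_xlsx_parts(build_stand_in_xlsx())
for part in parts:
    print(part)
```

With a real file you would open it directly, e.g. zipfile.ZipFile("study.xlsx"), where "study.xlsx" is a hypothetical filename.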
posted by Superilla at 3:49 PM on July 15, 2023 [8 favorites]


I had a really strong sense of déjà vu like, wasn't there already a professor with an Italian name who has been accused of research shenanigans? I think I may have been thinking of Lorenza Colzato, but I feel like there was another that made it into a smalltime viral Twitter thread, involving something made up about some old manuscripts? And that one's husband was super aggro in defending their spouse, IIRC
posted by coolname at 4:08 PM on July 15, 2023 [1 favorite]


The last big one I remember was Brian Wansink, who was encouraging his grad students to squeeze as many papers as possible out of watching a few people eat pizza. (Or something like that. Food, anyway.)
posted by clawsoon at 4:39 PM on July 15, 2023 [4 favorites]


Note to self: After faking data, re-sort and export to csv.
posted by Flunkie at 4:39 PM on July 15, 2023 [9 favorites]


... then import back to xlsx.
posted by Flunkie at 4:42 PM on July 15, 2023 [2 favorites]


Note to self: When looking for data-faking, watch for suspicious lack of history in xlsx file.
posted by clawsoon at 4:45 PM on July 15, 2023 [19 favorites]


I had a draft of this queued up, but this one is better done. I was also glad to see the forensic work laid out like that, it was really informative.

To be fair, I don't think there's any way anyone could have predicted this.

Yeah, there’s a tremendous amount of hot-dog-guy, we’re-all-trying-to-figure-out-who-did-this energy to this whole situation. You have to believe that a lot of people at the periphery of this knew something sketchy was going on…
posted by mhoye at 5:00 PM on July 15, 2023 [4 favorites]


I really want to know if honesty researchers feel unclean after faking their data. Maybe Ariely and Gino could volunteer for a focus group?
posted by hydropsyche at 5:28 PM on July 15, 2023 [7 favorites]


coolname, I remember that too. Something about manuscripts? Incunabula? The faker was big into making up institutes to be lauded by, and went after someone who “just had a website” but was a widely known independent scholar.

Still can’t remember the subject!
posted by clew at 5:32 PM on July 15, 2023


There was also Michael LaCour - not sure if that's who people are trying to remember. But that's the recent scandal I thought of when I saw this news.
posted by coffeecat at 5:39 PM on July 15, 2023


clew, coolname: I remember this story too, and it's like I have all the big brush strokes but can't remember a single detail. It was all very international, and I think the faker might have been Swiss?
posted by hippybear at 5:49 PM on July 15, 2023


Or if she wasn't Swiss, she was getting Swiss money to do her thing that was fraudulent. So strange I can't remember it. Surely someone does.
posted by hippybear at 5:50 PM on July 15, 2023


Am I the only one who thinks the weirdest part about this is that they do their data analysis in excel?
posted by If only I had a penguin... at 6:02 PM on July 15, 2023 [1 favorite]


coolname, clew, and hippybear: you folks are thinking of Carla Rossi and her "institute" RECEPTIO.
posted by RichardP at 6:10 PM on July 15, 2023 [22 favorites]


Bless you RichardP. I now have hope of maybe sleeping tonight.
posted by hippybear at 6:22 PM on July 15, 2023


Based on what I know of folks at my university, it seems that business school folks use Excel, social sciences folks (and the occasional other science people who are really uncomfortable with computers) are still using, like, MiniTab or something (i.e. a real stats analysis program, but older and less powerful), and folks in the sciences plus the occasional social scientist who also knows programming or is otherwise unusually adept with computers use R? But yeah, our business school makes their students buy and use Excel in their business classes.
posted by eviemath at 6:49 PM on July 15, 2023 [8 favorites]


Though not nearly enough of any of those people use version control — much less on their data.
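
Short of full version control, even a recorded checksum makes later edits to a data file detectable. A minimal sketch with Python's hashlib, assuming the data can be read as bytes; the CSV contents and the fingerprint helper are invented for illustration:

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a blob of data."""
    return hashlib.sha256(data).hexdigest()


# Invented example data: the second copy has one score changed.
original = b"id,condition,score\n1,control,4\n2,treatment,7\n"
edited = b"id,condition,score\n1,control,4\n2,treatment,9\n"

print(fingerprint(original))
print(fingerprint(original) == fingerprint(edited))  # False: the edit is detectable
```

A checksum only tells you that something changed, not what or who; that is the part actual version control (or a data repository) would add.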

J&J is suing scientists for published research, AIUI, it’s all going to get even more adversarial.
posted by clew at 7:34 PM on July 15, 2023 [1 favorite]


I (social scientist) tend to clean up original data sets in excel, as .csv files, then import/export as needed. Analyses are in other programs though.
posted by bizzyb at 8:42 PM on July 15, 2023 [2 favorites]


I've talked to friends in academia about 'open science' approaches to data fidelity with stuff like R and the answer is always that it's 'hard' or 'not practical'. And I totally believe them. These aren't total 10s, but these are well intentioned folks.

(1) If people are going to work with data they need some sort of professional standard. We're in XXI, get your shit together. (This is really a plea to universities to fund support for scholars, not a further squeeze on tenure seeking stressed out faculty)

(2) Please fix the academic funding/publishing model. It may even be more poisonous in the long term than the horror show that is the US healthcare system. And that should scare all of us.
posted by Reasonably Everything Happens at 9:28 PM on July 15, 2023 [5 favorites]


A combination of the following will matter for how an academic analyzes their data:

1. What you were trained in (and sometimes what your students are being trained in)
2. What your institution supports
3. What your field uses, recognizes, and is capable of reviewing.
4. How easy and transparent it is to do your most common analyses in that software.

The thing about R code (or Stata .do files, or matlab .m files or SPSS .sps files, etc.) is that not everyone can parse it meaningfully or run it themselves.

In contrast, one of the massive benefits of excel is its network effects, and the nearly complete absence of training needed to get up and running immediately. Almost all institutions support excel (including companies, which is very useful if you are getting real-world data). Nearly every stats program has excel read/write functions, and if you're in the tiny minority who doesn't have a license, google sheets has your back. Missing data in excel is represented by blank cells which are ignored, rather than placeholder values which can throw up errors when analyses are applied. Excel graphs and tables can be added into sheets in the same workbook and pasted as nice professional scalable vector graphics into powerpoint and Adobe Illustrator when you want to do a back of the envelope visualization or create figures and slides. So an excel file is highly likely to be readable *in an interpretable way* without any other code or software by a wide audience. To bizzyb's point, it's also very easy to clean data sets from tools like qualtrics in excel because you can see right away whether your actions did the thing you wanted them to. This compares favorably to running code blindly and causing unforced errors in a data file you never actually looked at, which let me assure you, happens an unpleasant amount of the time.

Is excel a good stats program? No - and plenty of stupid stuff happens. (I'm looking at you, auto-date formatting.)

Is excel a de facto lingua franca? Kinda, yeah. Having it around also helps with open science and communication between different fields that benefit from knowing each other's findings.
posted by BlueBlueElectricBlue at 11:07 PM on July 15, 2023 [18 favorites]


The forced conclusions in these papers are so lame and reflect such a simplistic psychological model too. Sooo lame. Fire her. Also, fascinating info about how Excel files are organized.
posted by DeepSeaHaggis at 1:19 AM on July 16, 2023 [1 favorite]


A++ thread title
posted by straight at 3:01 AM on July 16, 2023 [6 favorites]


On the other hand, LibreOffice has most of the same advantages as Excel and won't fuck with your dates. (Which is to say: oh yeah, I check things in both programs all the time, because they and Google Sheets are much easier to navigate with a quick visual check of my data than anything else I'm aware of.) I'm constantly writing little conditional formatting macros to let me see pieces of the data and understand which bits need more tweaking and which don't. I honestly think the dataset visualization tools of these spreadsheet programs are almost unparalleled, and for people who are tempted to cheat and manually alter their data it's certainly the easiest and quickest way to do that, especially systematically.

I transitioned into a Psych dept a few years ago for my postdoc and the thing everyone defaulted to when I came in was Prism. I can flat say, I would rather work with Excel one thousand times more than Prism or SPSS. At least I don't have to hand copy and paste data into narrow tables in order to quickly calculate things or move values around in Excel, you know?
posted by sciatrix at 3:53 AM on July 16, 2023 [3 favorites]


In contrast, one of the massive benefits of excel is its network effects, and the nearly complete absence of training needed to get up and running immediately.

I’ve said this plenty of times, but: Excel is not a spreadsheet. Excel is a full-featured virtual machine running a Smalltalk-inspired REPL whose display layer happens to resemble a spreadsheet. It comes at a cost, sure, but in terms of accessibility, ease of use and community size, nothing else in computing comes close.

Something like a third of the world’s money goes through Excel every single day, and the reason you don’t think Excel is a Real Programming Language is because if we admitted that, we’d have to admit that most of the most important software in the world is written by underpaid women in pink collar jobs who don’t even think of themselves as programmers, and we can’t have that.
posted by mhoye at 4:57 AM on July 16, 2023 [42 favorites]


mhoye On the one hand, I think you have a solid point.

On the other hand I think a lot of the Excel isn't a real programming language talk is the result of how utterly, unspeakably, shitty it is as a programming language. And that's before you get into introducing VB scripts to fuck things up even more.

Also, so damn many people who "use" Excel have no clue what they're doing and in my experience working tech support for the past 20 years often don't even know that stuff as basic as the SUM function exists. I have had more than one person call me for help with their spreadsheet because they were tired of highlighting the cells they wanted to sum, making a note of the number displayed in the autosum in the bottom right corner of the screen, and then manually entering that into a cell.

Usually these people also complained that they had a really tough time highlighting all the cells they needed to work with, because pulling the mouse down through many screens went so fast that they kept overshooting what they wanted to highlight and then had to painstakingly move the mouse up in tiny, tiny increments to get to where they really did want to end.

And often, when I tried to show them that they could shift+click to select a range more easily they told me that was too complicated to remember so they'd keep doing it their way thanks all the same.

I will note that the worst offenders of that nature were mostly middle aged cis het white dudes like me, not women of any age.
posted by sotonohito at 6:08 AM on July 16, 2023 [5 favorites]


Thank you, metadata!
posted by rmd1023 at 6:41 AM on July 16, 2023 [1 favorite]


All data passes through Excel. Its dominance is so absolute that we have to consider Excel when naming genes.
posted by betaray at 6:43 AM on July 16, 2023 [5 favorites]


Also, yes, large portions of many companies - large and small alike - are like the XKCD strip about dependencies, except the little supporting piece is "terrifyingly janky excel script that has been passed down and ever-so-terrifyingly-modified for decades and nobody dares touch it too much"
posted by rmd1023 at 6:44 AM on July 16, 2023 [2 favorites]


You can use excel for data analysis if your statistical models aren't too fancy, and most of the time in social science research they aren't unless you have like, some huge dataset for some reason. If you collected the data yourself, it'll probably be a small enough sample size that you would have to really think hard about whether it can support anything fancier than summary statistics and a little regression on it. IMO.
posted by subdee at 7:00 AM on July 16, 2023 [1 favorite]


Reasonably Everything Happens, I think this is a question of reproducible science before being one of open science. The benefits of reproducibility under the current incentive system should accrue more to the original researchers, without making the system worse off, yesno? Incremental steps.

—-
Excel is so bad at statistics that NIST won’t review it as a stats package. Excel is so bad at parsing that William Kahan found a class of equivalent *integer calculations* for which Excel returns different results. The money runs through Excel — the money is not being looked after by scrupulous caretakers, we know that! The vast array of low-status workers keeping everything running with elderly scripts deserve a better tool.
posted by clew at 7:16 AM on July 16, 2023 [2 favorites]


You *can*, but it's painful. Even doing a little regression in excel is like bathing in broken glass compared to "reg y x1 x2 x3" or "summary(lm(y~x1+x2+x3))". I occasionally teach undergrad intro statistics and I'd rather onboard everyone kicking and screaming onto R, for most of them the first time using command-line software, than deal with supervising regression in excel.
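
For readers who don't have Stata or R handy, here is a rough sketch of what those one-liners compute: ordinary least squares for y on x1, x2, x3 with an intercept. It is written in plain Python via the normal equations, which is fine for a toy example but not how a real stats package does it numerically. The ols helper and the data are invented for illustration, and it reports only coefficients, none of the standard errors or p-values that summary() prints.

```python
def ols(predictor_rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in r] for r in predictor_rows]
    p = len(rows[0])
    # Augmented matrix for the normal equations (X'X) b = X'y.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot_row][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][p] for i in range(p)]


# Invented data generated exactly from y = 1 + 2*x1 - 1*x2 + 0.5*x3.
x = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 2), (2, 1, 1)]
y = [3.0, 0.0, 1.5, 2.0, 4.0, 4.5]

coefficients = ols(x, y)
print([round(c, 6) for c in coefficients])
```

Because the toy data has no noise, the fit should recover the generating coefficients up to floating-point error.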
posted by GCU Sweet and Full of Grace at 7:18 AM on July 16, 2023 [4 favorites]


The exec summary video is great, but do folks really think "Academia is BROKEN?" I ask because I have a non-science lawyer friend who is extremely skeptical of published research and leans towards the opinion that the system is broken. I've argued that the system works imperfectly but trends toward good - and I'd argue in this case it eventually worked, but not without some damage. Is peer reviewing the thing that is broken? Digging into the YT comments I saw "I do lots of peer review. I have never been given a dataset to accompany an article." And other comments about how flawed and superficial peer review is. Curious if others have experience with that process.
posted by blendor at 7:26 AM on July 16, 2023


do folks really think "Academia is BROKEN?"

I don't think this is a binary thing, but Gino is an HBS professor who has published books and many papers and was involved in a falsified paper a decade ago and is just now getting some consequences. This alone would seem to indicate at least partially broken in my opinion.
posted by snofoam at 8:37 AM on July 16, 2023 [3 favorites]


I think this varies a lot by field. Requiring open data sets and R code for publication has become pretty standard in the ecology literature over the past few years. The psychology literature especially has taken a lot of hits over the past decade and committed to a number of reforms. Although Ariely and Gino do social science research and have published some in psychology journals, they are business school faculty and seem to publish primarily in business journals. I would imagine that field could have a reckoning coming as well, but I also have worked in academia long enough to know that the business schools get passes on things that the rest of us could never get away with.
posted by hydropsyche at 8:41 AM on July 16, 2023 [9 favorites]


blendor: . . . how flawed and superficial peer review is. Curious if others have experience with that process.
IAAPR, IANYPR. I made a comment on the responsibilities of peer-reviewers in the 2021 thread about the 2012 Ariely+Gino paper. tl;dr: to do this unpaid task (which counts nothing towards your promotion prospects) properly requires a couple of days' work. "Properly" does not, in my field, include a total re-analysis of the data, if available, because that would be a couple of weeks of pro bono work.
posted by BobTheScientist at 10:31 AM on July 16, 2023 [7 favorites]


hydropsyche: but I also have worked in academia long enough to know that the business schools get passes on things that the rest of us could never get away with.

Academia as a microcosm of society.
posted by clawsoon at 2:55 PM on July 16, 2023 [3 favorites]


Hydropsyche, the most egregious paper identified in this mess (i.e. double fraud, Gino+Ariely) was published in PNAS, so I'm not convinced that this is a business journal problem.

Also, Data Colada was started by, and is currently run by, Leif Nelson (Marketing professor, Haas School of Business), Joe Simmons (OID professor, Wharton School of Business) and Uri Simonsohn (Behavioral Science professor, ESADE Business School).

The Wharton School's Credibility Lab also created AsPredicted and Research Box as simple accessible tools for pre-registration of social science experiments and surveys, and structured repositories for code, data, and experimental instruments respectively. Both tools are widely in use in behavioral science.

None of this excuses all the bad practices that business schools had and have. But it raises the question of whether part of the reason we're hearing about a lot of problems related to business academics is because that field started cleaning their own house.
posted by BlueBlueElectricBlue at 3:56 PM on July 16, 2023 [2 favorites]


I had forgotten that one was in PNAS. That journal has its own unique badness where National Academy members' papers are not subject to the same scrutiny as others'. They have worked to reform some of that, eliminating the thing where Academy members could also submit their friends' and mentees' papers for back door review, but it is still an issue.

In any case, I hope that every journal (and university and school within a university) in the world is having a reckoning.
posted by hydropsyche at 5:08 PM on July 16, 2023 [1 favorite]


You can use excel for data analysis if your statistical models aren't too fancy, and most of the time in social science research they aren't unless you have like, some huge dataset for some reason. If you collected the data yourself, it'll probably be a small enough sample size that you would have to really think hard about whether it can support anything fancier than summary statistics and a little regression on it. IMO.

so technically my data set that I'm currently working on was collected by... well, mostly by a first year grad student and a couple of techs over the course of a year or so, but I could totally do the same thing tomorrow if I wanted. we are in a psych dept and the dataset is simple video of mice in an open field after varying injections of amphetamine. extremely, extremely low tech. each video is about 90 minutes, five or six weeks of video, 33 mice total.

I have twenty terabytes of raw video, easily, and that translates into additional terabytes of models used to generate pose estimation data on exactly where one of ten points of the mouse are in every. single. frame.

now, I'm an outlier in that I brought our lab to "uhhh two of the top five power users of google drive in the university are sciatrix, it turns out" (comment via contact in IT). but even before I came in, when it was "just" traditional systems-neuroscience/animal-cognition type work with touch screens in operant boxes for mice, there was easily enough raw data to make it extremely worth running linear mixed models. anyone doing game theory or mechanical-turk style research or probability or decisionmaking or anything of that nature can extremely easily rack up enough behavioral data to make that stuff worth it.

now like, there is plenty of excellent social sciences work that doesn't dig into that level of sheer Giant Data! But I can think of half a dozen social sciences approaches that can generate that level of data, especially working with something like qualtrics. and regrettably my experience is that granularity of assay and level of "mathiness" and sheer scale of numbers all play a lot harder towards whether a given assay is taken seriously than, like, the strength of the methodological design and the specificity of the hypothesis assessment.

all that aside I totally did calculate baby's first ever paper analysis, a plain jane anova with my undergraduate thesis data, manually using Excel with no formulas more complex than fractions or multiplication. absolutely possible. my supervisor was floored and then she laughed for like ten minutes and showed me how to use JMP. (disgusting. never again. would honestly rather code in excel any day.)
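
For the curious, a one-way ANOVA really does reduce to sums, squares and a couple of divisions. A minimal sketch of the by-hand arithmetic, with an invented helper name and made-up numbers (not the thesis data):

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over lists of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


made_up_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(made_up_groups))  # 3.0
```

Turning F into a p-value needs the F distribution, which is where a table or a stats package comes in.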
posted by sciatrix at 10:11 AM on July 17, 2023 [4 favorites]


Requiring open data sets and R code for publication has become pretty standard in the ecology literature over the past few years.

also noting that this has happened as a consequence of some extremely public and embarrassing recent incidents in ecology (Pruitt, Danielle Dixon, I was going to mention the BethAnn McLaughlin incident but technically that was neuroscience with an anthropologist sock...) that I think have spurred a lot of field-cleaning. Maybe I'm just being twitchy because I'm within one degree of people directly involved with both the Pruitt and McLaughlin cases, but I feel like this is only becoming standard in this way because data forensics keeps revealing fraudsters and fields keep publicly grappling with the consequences.
posted by sciatrix at 10:21 AM on July 17, 2023 [2 favorites]


Let’s all take a step back and remember that business schools are part of academia in the sense of an engorged tick feeding on the host while also spreading MBA to the rest of the world. And before the inevitable #notallmbas - yes. Yours too.

Am I the only one who thinks the weirdest part about this is that they do their data analysis in excel?
As mentioned, all data passes through excel. When all you have is a hammer. . .
posted by aspersioncast at 4:38 AM on July 19, 2023 [1 favorite]


The latest on this is that Gino has filed a $25 million defamation suit against Harvard and Data Colada, alleging that they've "destroy[ed her] career and reputation despite admitting they have no evidence proving their allegations," as Gino announced in a brief LinkedIn post, which also has a lot of funny supporting comments (although quite a few sterner ones have also arrived since I last looked). The Chronicle has a long story by a reporter who has been covering Gino for months, but I don't know how to bypass the paywall.

It looks very bad to me. I can't rule out that Harvard did something wrong in their closed proceedings, and if they did then they should be held to account. But the Data Colada authors are getting dragged into court for publishing analyses that are completely transparent about their evidential support. Including them in the suit seems indefensible to me. Using the law to attack scientific critics makes the world worse whether or not the critics are right, because it will tend to shield powerful lies from the public examination needed to expose them. It also looks like a confession of guilt. If the critics are wrong, then show us the evidence. If you could, you seemingly wouldn't need to resort to legal warfare.

Obviously, whether her work is fraudulent or not, Gino will want to defend her exalted position against these kinds of public challenges. She's been drawing >$1mm salary from Harvard, charging five-figure corporate speaking fees, getting celebrated in the media, and generally getting to present herself as a wise sage and even a good person.

I think it's wrong to peddle bullshit in order to get rich, but sometimes it can kinda feel like a bit of a victimless crime. Idk, maybe nobody takes it all that serious, or maybe the most direct victims of the grift kind of deserve it, or whatever. But it's evil when you come out in defense of your bullshit by trying to destroy people who would tell the truth.
posted by grobstein at 3:45 PM on August 4, 2023 [2 favorites]


It looks like her main maneuver is going to be blaming any misconduct on her subordinates, a gambit that has previously worked well for other prominent academics including Harvard's own Doris Kearns Goodwin.
posted by grobstein at 3:52 PM on August 4, 2023


The Crimson pulled out one more interesting thing from the 100-page lawsuit:
[HBS Dean] Datar told Gino that ... he requested the University president ... that Harvard began proceedings to revoke Gino’s tenure.

No known instance exists of a Harvard professor’s tenure being revoked.
Purely on the merits Gino looks guilty as sin, and an embarrassing fraud of this scale seems big enough to justify all of the penalties Harvard has considered. But her lawyers are probably right that other professors have done similarly bad stuff and been dealt with more leniently, so maybe her discrimination claims could have legs (idk if that's how it works but it does seem to be one of her main legal theories).

My initial reaction to the suit was, like, (to Gino) Are you sure you want the fraud allegations against you exhaustively examined in court? But on reflection, she may be calculating that Harvard won't want to have other professors' misconduct dragged out into open court, putting her in an advantageous position for a settlement.
posted by grobstein at 5:27 PM on August 7, 2023 [1 favorite]



