Sometimes it’s easier to rewrite genetics than update Excel
August 6, 2020 2:27 PM   Subscribe

Despite years (previously) of warnings (previously) to not use Excel for bioinformatics, it seems that it is a habit which cannot be broken. So, instead, over the past year or so, some 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates.
posted by clawsoon (132 comments total) 50 users marked this as a favorite
 
The way Excel handles dates is a goddamn travesty and cannot believe they still haven't changed it. I don't care what excuses they come up with, your program should not arbitrarily convert my data to another type of totally different data!
posted by showbiz_liz at 2:32 PM on August 6 [78 favorites]


I'm on the side of the great renaming, actually. Excel is a piece of garbage but, like democracy, it's better than the alternatives due to its ubiquity. The date thing is an affordance that is invaluable in many, many other situations. If the translation was a switch that could be turned off, inevitably someone would forget to turn it off at some point while the errors would overall be less frequent they'd also be harder to spot just because they were less frequent.

I guess if I was really invested in the problem I'd rail about why some gene names have to look like dates when they're purely arbitrary strings. Just pick some other damn characters and get over it.
posted by seanmpuckett at 2:37 PM on August 6 [9 favorites]


A colleague and I were JUST BITCHING ABOUT EXCEL DATE NOTATION not 3 hours ago, and now this. I sent her the link:
That is insane. Excel wins over years of genetics!
*cries in VBA*
posted by phunniemee at 2:37 PM on August 6 [11 favorites]


I suspect some of our resident bioinformaticians are going to show up soon and this is probably going to be a hell of a popcorn fest.
posted by seanmpuckett at 2:39 PM on August 6 [4 favorites]


your program should not arbitrarily convert my data to another type of totally different data!

Arguably it's not, you entered a date, so it treats it like a date. Forcing people to jump through hoops to enter dates is arguably a much worse product decision.

The victim-blaming-excel-nerd in me points out that putting a simple double quote (") at the beginning of a cell fixes this problem, as does being very careful in doing CSV imports, but no one is that careful and it'll still happen all the time so I suspect this is the only solution.

Also, yes, no one should use Excel for anything, except that it's 90% of the way to being the best tool for nearly everything, so yeah, people will keep using it for advanced accounting, bioinformatics, a database, a 2D video game engine, a 3D game engine, a ray tracer and probably everything else ever.
posted by GuyZero at 2:39 PM on August 6 [17 favorites]


Finally a chance to post this meme.
posted by Superilla at 2:39 PM on August 6 [129 favorites]


Excel misapplied parentheses in simple algebra for years - still does, possibly. Highlight of a class with William Kahan.

If Excel were free in any sense its ubiquity might defray its treachery, but as it is, it’s cargo cult scientism.
posted by clew at 2:42 PM on August 6 [8 favorites]


Just format all text columns as "Text" and this problem goes away. Excel is great for lots of things but is certainly not purpose-built for bioinformatics, and why isn't there dedicated bioinformatics software that people are using instead?
posted by grumpybear69 at 2:42 PM on August 6 [8 favorites]


Is setting the data types of columns not among the first few things people learn when they learn Excel?

I suppose you could argue that it shouldn't auto-detect data types based on what you put in the cells, but then again, I'd bet that for 95% of usage cases that's something that saves people time.
posted by pipeski at 2:43 PM on August 6 [1 favorite]


Excel 1, Science 0
posted by They sucked his brains out! at 2:44 PM on August 6 [4 favorites]


I worked with a PhD physicist on a gov consulting gig. He was *smart*. He wrote Excel customizations for modelling ballistic missile reentry.

At my current gig, we have Trello, jira, and TFS on different teams. 90% of PMs make us report progress in Excel. Ugh.
posted by j_curiouser at 2:46 PM on August 6


why isn't there dedicated bioinformatics software that people are using instead?

There is! But it's a pain in the ass for lab folks to use, though, so Excel keeps going on, like a zombie that leaves dead bodies and Microsoft line endings everywhere.
posted by They sucked his brains out! at 2:46 PM on August 6 [9 favorites]


Also, yes, no one should use Excel for anything, except that it's 90% of the way to being the best tool for nearly everything, so yeah, people will keep using it for advanced accounting, bioinformatics, a database, a 2D video game engine, a 3D game engine, a ray tracer and probably everything else ever.
Don't forget texting Nelly
posted by eckeric at 2:48 PM on August 6 [17 favorites]


The LEGO UI design thread from the other day might be useful for this discussion. You always have to take into account humans who are tired, rushed, or being paid less than minimum wage and can't be bothered. If you don't, you get mistakes.
posted by clawsoon at 2:49 PM on August 6 [2 favorites]


Also in the frustrating even if not data ruining context, an Excel Pivot Table can interpret dates where Excel does not. I have a dataset of US airports; if I sort the data manually, the three letter airport codes are in the correct order. If I create a Pivot Table, the first airport codes in order will be:
SUN - Sun Valley, ID
SAT - San Antonio, TX
JAN - Jackson/Vicksburg, MS
AUG - Augusta, ME
DEC - Decatur, IL -- before:
ABE - Allentown/Bethlehem/Easton, PA
ABI - Abilene, TX
and so on. If I then cut and paste values from this table, the codes sort correctly again. It's maddening.
posted by Superilla at 2:51 PM on August 6 [35 favorites]


Don't forget texting Nelly

Technically the PEN/GEOS 3.0 Spreadsheet app.
posted by GuyZero at 2:52 PM on August 6 [4 favorites]


*Sighs and nods exhaustedly in datawranglerese*

It doesn't matter if you set the field to text, it doesn't stick in many cases, and if it changes out of your line of sight and you save you may be fucked forever. If the end result of your manipulation requires dropping the file out to csv, it's hard to even see the kinds of ways Excel may gleefully fuck you - unless you look at it in a text editor - until the subsequent csv import creates all kinds of frankenshit data in your destination.

Excel has made me cry more in my career than sexism, mean clients, or server patches combined.
posted by Lyn Never at 2:53 PM on August 6 [67 favorites]


people will keep using [Excel] for advanced accounting, bioinformatics, a database, a 2D video game engine, a 3D game engine, a ray tracer yt and probably everything else ever.

Sending a screen shot to support... 🙄
posted by Greg_Ace at 3:01 PM on August 6 [5 favorites]


Arguably it's not, you entered a date, so it treats it like a date. Forcing people to jump through hoops to enter dates is arguably a much worse product decision.

When I am God-Emperor,* everyone, everywhere should absolutely be required to enter all dates as YYYY-MM-DD to have them recognised by spreadsheets as dates. Because otherwise, you get people typing March 15 when they mean 2020-03-15, and it converts the date to 2015-03-01. You can't misenter a date if you're using full ISO.

You can even enter Old Style dates in a modified ISO: 1677/8-01-08 is January 8, 1677 (old style, which comes AFTER April 1, 1677). Excel or any other program gets confused, of course, but I forced all the dates to be plain text anyways and they would still sort alphabetically with everything in the right order.

*This post is part of the evidence for why no one will ever let me be God-Emperor
posted by jb at 3:01 PM on August 6 [45 favorites]


The real absurdity is that there is not a simple check box to turn off all automatic data formatting when importing CSV/TSV/TXT files. It wouldn't break backwards compatibility, and it would surely be easy to implement. And yet!
posted by jedicus at 3:02 PM on August 6 [30 favorites]


Excel

a database


Okay, that's it. You have to leave the building, turn around, curse, and spit. Only then can you come back in.
posted by Mr. Bad Example at 3:04 PM on August 6 [22 favorites]


is regularly used by scientists to track their work and even conduct clinical trials.

ABJURE
posted by Going To Maine at 3:04 PM on August 6 [1 favorite]


*This post is part of the evidence for why no one will ever let me be God-Emperor

DATES FOR THE DATE GOD
posted by GuyZero at 3:05 PM on August 6 [24 favorites]


EXCEL IS NOT A DATABASE
posted by GallonOfAlan at 3:07 PM on August 6 [25 favorites]


What, you people want me to use Access? Why on earth would I want to do that?
posted by Huffy Puffy at 3:12 PM on August 6 [17 favorites]


I'm going to blow your minds by saying that yes, Excel is a perfectly good small non-SQL database.

oh wait, my bad, you can totally run SQL queries against an Excel table.
posted by GuyZero at 3:14 PM on August 6 [14 favorites]


I used to laugh at these stories, up until the point I wanted to paste into excel a big list of hex numbers that happened to include several entries where all the digits were 1-9 with exactly 1 E tossed in.
posted by ckape at 3:16 PM on August 6 [14 favorites]


When you're fed up with using Excel as a database, you can use Jira as a general database. Avoids all that getting permissions from your DBAs.
posted by mdoar at 3:26 PM on August 6 [7 favorites]


Hey, jb. I'd vote for you. Mandate all times expressed as 24 hour format as well, and I'll bear your children.
posted by angiep at 3:30 PM on August 6 [14 favorites]


Speaking as someone working in science in their PhD: It gets used because you already know it. Everyone knows it. It does basic math with small numbers just fine. It can make graphs REALLY quickly, and everyone already has a copy.

So, for the calculations that I do......twice a year, I could either: Write a python program to run on my data, then graph it in a second program (would take hours, I'm not very good at python), POSSIBLY use Origin or something like it (Would take a lot of time to learn, expensive and even if the lab has a copy my laptop doesn't so I'd have to do it all on the lab computer), or I can just use excel to normalize my fluorescence data to the biggest peak and be done the entire analysis in 20 minutes, export to CSV and graph it in LaTeX.
posted by Canageek at 3:32 PM on August 6 [14 favorites]


Excel 1, Science 0

Excel | 1970-01-02 | Science | 1970-01-01
posted by Joey Buttafoucault at 3:47 PM on August 6 [92 favorites]


There ought to be a scientific/technical version of Excel with more checking and less magic. Still mostly the same interface, but more emphasis on correctness of results, types and so on.
posted by Monday, stony Monday at 3:48 PM on August 6 [5 favorites]


That excel is ubiquitous when there are so many free and better alternatives probably shows that people do, in fact, like excel. A lot of people do data manipulation visually, and tho it causes a lot of problems I think excel serves that purpose for most casual applications. I use it for its conditional formatting features as a spot check.

If most users are frustrated by excel date/time parsing, I don't think R or whatever is going to be the simpler option.
posted by Think_Long at 3:52 PM on August 6 [6 favorites]


I didn't mean to imply that I don't find date/time in excel to be non-infuriating, just that I think date/time formats to be a generally frustrating problem in data work
posted by Think_Long at 3:56 PM on August 6 [4 favorites]


The way Excel handles dates is a goddamn travesty and cannot believe they still haven't changed it. I don't care what excuses they come up with, your program should not arbitrarily convert my data to another type of totally different data!

The trick is to input the data correctly. Snark aside, though...

That excel is ubiquitous when there are so many free and better alternatives probably shows that people do, in fact, like excel

Let’s not go bananas. It sucks. Excel is still around because all the data’s already been in Excel for forty dang years and switching over to some other program with its own quirks and compatibility issues, and which may very well cease to exist in a few months (cough google cough), would be about the dumbest thing you could do.
posted by Sys Rq at 4:01 PM on August 6 [3 favorites]


you can totally run SQL queries against an Excel table.

Don't blame me, COVID made me do it.

Because when we needed to report the COVID status of every patient in all 40 of our wards daily for national statistics, we gave them a spreadsheet each, and I wrote the query to union them all across the [full path to filename]Sheet1!A:V tables and collate them daily until we could get a better data collection system.
posted by ambrosen at 4:04 PM on August 6 [16 favorites]


Date/time formats are indeed a generally frustrating problem in data work. And yet the vast majority of programming environments manage to avoid introducing that specific frustration without being asked to.

And yet the vast majority of programming done on this planet still happens in Excel. I suspect it would be the dominant paradigm even if it did not try to train users to accept "magical" behavior.
posted by Phssthpok at 4:05 PM on August 6


you entered a date, so it treats it like a date

No, I fucking did not enter a date. I entered a string that could, if you stand on your head and squint, conceivably be parsed as something vaguely date-like. Actually, most of the time, I didn’t enter shit. I opened a CSV file that someone else made.

Also, formatting columns as text doesn’t fix the date conversion once it’s already happened. That information is just gone.

And adding quotation marks creates a pain in my butt when I need to import that dataset into R or Python — I have to remember to remove the quotation marks again if I want to do anything else with those text values. Yes, it’s a very simple regexp, but it’s still tedious. For example, I often work with data where something is reported as a range , like “5-10.” Excel sees that as a date, May 10. I may want to parse it myself to get the min and max of the range, splitting the string on “-“ and converting the resulting two characters to numeric. Adding a leading quotation mark to work around Excel makes more work for the parsing I actually want to do.

Excel already has a mechanism where it flags things it sees as “numbers entered as text,” giving you the option to “convert to numeric,” instead of just hlepfully auto-converting for you. 99% of the time I want that conversion to happen, but I have to ask for it, which is fine. Why could Excel not do the same thing with things it sees as possibly “dates entered as text”? There are all kinds of context clues — most other items in the same column or row can’t be parsed as dates; the “date” formatting is unusual (“MARCH1” rather than “March 1”), etc.
posted by snowmentality at 4:05 PM on August 6 [36 favorites]


Excel has tons of bugs that are never fixed. The one where if you copy a cell manually and you paste it quickly a few times, as in CTRL+V then the down arrow to go the next row down, somehow magically takes you to line 1048576 instead of one row down- that has been around for 20 years.
posted by The_Vegetables at 4:19 PM on August 6 [13 favorites]


Yeah. I think some respondents here misunderstand a bit how bonkers it is that if I do nothing but open a .csv file in Excel and save it again as .csv without hitting any other keys, if there are autoparsed dates, the raw data may actually change from whatever text I had ("MARCH1" say) to "days since the date epoch." I don't think I know another program that does this.

Does Excel still do that thing where the date epoch for the Mac version is 4 years behind the date epoch for the Windows version? Charming.

I think in many science fields, teaching basic data skills is someone else's job -- there's no cred for it in the department -- and so it's possible to graduate with a surprisingly slight understanding of tabular data. Some of these flaws don't become apparent until you have a good intuition for what shape you wish the data were in for the purpose of simplifying some other chore. So it saddens but does not surprise me that no other tool has become dominant in this space.
posted by eirias at 4:20 PM on August 6 [27 favorites]


Monday, stony Monday, yes, yes, yes.

It terrifies me that the data science world has moved consistently towards technology that makes it difficult, and game-able, to reproduce results.
posted by runehog at 4:24 PM on August 6 [2 favorites]


Excel will also parse numeric strings and strip leading zeros which makes a mess of things like zip codes.
posted by kokaku at 4:25 PM on August 6 [19 favorites]


You think that's bad, you should see what it did to Symantec Endpoint Protection 11.
posted by darksasami at 4:25 PM on August 6 [3 favorites]


clearly someone needs to do an opensource excel clone with every feature except any date support whatsoever.
posted by GuyZero at 4:25 PM on August 6 [3 favorites]


Excel killed a laptop battery, just to watch it die.
posted by They sucked his brains out! at 4:36 PM on August 6 [10 favorites]


oh wait, my bad, you can totally run SQL queries against an Excel table.

You can freeze a banana and hammer nails in with it, too. It doesn't mean you should.
posted by Mr. Bad Example at 4:45 PM on August 6 [21 favorites]


free and better alternatives

Admittedly, this was many years ago, but OpenOffice always made my machine feel like it was on the verge of crashing. Probably because it was drawing a GUI with some nightmarish stack of Java classes, not with native libraries. I guess some people prefer Google sheets, or whatever they call it now, and it does have some nice features, but I don't know if I'd really agree that it is "better" than Excel, which is really a very good product imho. The date conversion thing is inexcusable though, and the type inference in general is not it's finest hour.
posted by thelonius at 4:51 PM on August 6 [2 favorites]


A health billing system I worked with would only export to xls. Whenever a set of icd9 codes comes out to all numerics, excel helpfully strips the (meaningful) trailing and leading zeros. The handling to avoid this took weeks of my life.
posted by a robot made out of meat at 4:54 PM on August 6 [15 favorites]


Occasionally I will feel compelled to relive my Lotus 1-2-3 spread-head days in DOS and will fiddle around with the venerable and amusingly quirky sc...
posted by jim in austin at 4:58 PM on August 6 [3 favorites]


can they just rename the genes to the date they resolve to in excel?
posted by logicpunk at 4:58 PM on August 6 [14 favorites]


people will keep using it for advanced accounting, bioinformatics, a database, a 2D video game engine, a 3D game engine, a ray tracer and probably everything else ever.

Last year I needed to do a quick time series analysis of a low pass filter circuit, and I didn't have access to a SPICE seat.

So I did a quick brain-refresh of my differential equations class from some 30 years ago, and made a numerical integration model of the circuit using Excel.

It was actually a fun little project. :)
posted by ZenMasterThis at 4:59 PM on August 6 [5 favorites]


@thelonius In my experience, Google Sheets is not useable for any data sets that are bigger than minuscule. If you set aside the potential privacy and ethical issues of using a Google product, it's a nice tool but it just doesn't scale at all.
posted by ElKevbo at 5:00 PM on August 6 [4 favorites]


A health billing system I worked with would only export to xls. Whenever a set of icd9 codes comes out to all numerics, excel helpfully strips the (meaningful) trailing and leading zeros. The handling to avoid this took weeks of my life.

That's the kind of thing I'm talking about. I also found out that it looks at only the first 25, or maybe 32, rows to decide what type each column is. So, if you imported a .csv file, and the first 25 of your codes were all numeric, but later there were ones with text characters, it wouldn't infer that the column was text; I think it would just format cells that it couldn't treat as numeric that way. And, as you say, it wouldn't retain leading zeros.

There was a "Joel On Software" blog post, long long ago, about when they did a study of Excel users at Microsoft. They thought that everyone used Excel to do projections and conditional scenarios with financial data, but it turned out that almost all users mostly used it to format reports. Because there wasn't any other digital graph paper with varying grids out there, really, and that is a thing that people like. And despite that study, they evidently retained its paradigm that you are working with numbers unless you specify otherwise.
posted by thelonius at 5:01 PM on August 6 [7 favorites]


My first ever AskMe post, 14 years ago, was "how do I stop Excel making everything into a fucking date?!", so I can sympathise.
posted by EndsOfInvention at 5:03 PM on August 6 [54 favorites]


One of the three difficult tasks in computing: Naming things, and off-by-one errors.
posted by RobotVoodooPower at 5:21 PM on August 6 [12 favorites]


Just in case anyone's not aware of this...

1. Open Excel
2. File > New > Blank
3. In the ribbon click Data > From text file
4. Select the CSV file
5. Step through the import wizard...
6. On the 2nd or 3rd screen click Ctrl+A to select all columns, and choose Text formatting (or select the columns to format as text)
posted by Greg_Ace at 5:23 PM on August 6 [11 favorites]


Excel misapplied parentheses in simple algebra for years - still does, possibly. Highlight of a class with William Kahan.

What did it do?
posted by jeather at 5:23 PM on August 6 [1 favorite]


Pandas.
That is all.
posted by signal at 5:26 PM on August 6 [6 favorites]


Sending a screen shot to support...

Yesterday I got an Excel spreadsheet which contained only a screenshot of another Excel spreadsheet, admirably matched for column and row size...I work in support.
posted by not_that_epiphanius at 5:41 PM on August 6 [33 favorites]


1. Open Excel
2. File > New > Blank
3. In the ribbon click Data > From text file


This the right answer, but when Excel is installed, MS switches the file icons for .csv files to little Excel icons, making the user think it's fine to open them by double clicking.
posted by paper chromatographologist at 5:41 PM on August 6 [11 favorites]


Oh my god the zip code thing was the bane of my existence for a few weeks. My company was switching to a new ticketing system that only accepted CSVs for importing data and because csv's don't store metadata like column formatting I'd have to go through the same hoops every time I opened the file to put the leading 0's back
posted by JDHarper at 5:47 PM on August 6 [3 favorites]


I wonder if there's someone at Microsoft who is going home tonight, lying in their bed, staring at the ceiling, being washed over with waves of shame.
posted by clawsoon at 5:47 PM on August 6 [10 favorites]


The real absurdity is that there is not a simple check box to turn off all automatic data formatting when importing CSV/TSV/TXT files. It wouldn't break backwards compatibility, and it would surely be easy to implement. And yet!

As somebody who recently started needing to edit tab-delimited files several times a day at my job.... YES.

Excel is one of those programs where I'm familiar enough with it to assume that there's a pretty simple way to do pretty much anything I can think of needing to do in it, I just need to look up how. But to not be able to just set a default for turning off all automatic formatting when importing csv/txt files adds so many tedious clicks to my already tedious clicky job.
I'm sure there's a macro for it.
posted by bananana at 6:00 PM on August 6 [2 favorites]


'Everybody knows it' is an optimistic way of looking at this problem. 'Everybody knows' is Leonard Cohen's most depressing cynical song for a reason. 'Everybody' 'knows' certain uses of Excel (and Word) to accomplish workplace tasks poorly, but very few people are actually skilled in the use of the spreadsheet/word processors, or even know the functional difference. Which is why I deal in my day job with relentless Excel spreadsheets made up of purely text matter, which are matrices of check-list update matter between people on a project, and Word documents that consist of gruesome irregular tables of numbered boxes, bandaged together with evil font size fixes and the kind of margin adjustments that should carry health warnings. I don't think I've ever seen a spreadsheet in my working life that's ever had an actual calculation in it. Sort? What kind of weird voodoo is that? It's only because the capacity exists to make the information complex that the list information is complex. 'Oh, I'm an engineer, I use Excel, not Word like some kind of primitive with an Arts degree! Colour the boxes to indicate the update status! This spreadsheet of project items that needs closing out has to be sent to twenty different people and saved as matrix_draft_FINAL-v3.xlsx~update.xlsx. Haven't you seen that email chain?' Fuck you! They should all be lists.
posted by Fiasco da Gama at 6:07 PM on August 6 [16 favorites]


MetaFilter: a goddamn travesty and cannot believe they still haven't changed it.
posted by GenjiandProust at 6:07 PM on August 6 [13 favorites]


I had to write a tool for work to fix data exported from one of our actual databases (legacy software written in Visual FoxPro; don't get me started on that part of this shitshow) which is ostensibly in "excel" format. It's a museum collection database and there are a whole bunch of object IDs which look like dates or scientific notation to Excel. And also it's not actually exporting XLS or CSV data; it's writing it as an HTML table, which is apparently something Excel can import. But not when it's producing invalid HTML if any of the text fields in your database contain a "<" character.

I don't know which I hate more: that database's (still broken, despite a verbose bug report to the developers and several releases since then) "export to Excel" feature, or Excel itself. And I don't even use that database. I can only imagine what a pain in the ass it must have been for its users before.
posted by hades at 6:11 PM on August 6 [1 favorite]


Colour the boxes to indicate the update status! This spreadsheet of project items that needs closing out has to be sent to twenty different people and saved as matrix_draft_FINAL-v3.xlsx~update.xlsx.

Ah, I see you've met the variant of the office work species known as "Project Manager", sub-variant "Alphabet Soup".
posted by mrgoat at 6:20 PM on August 6 [4 favorites]


MetaFilter: a goddamn travesty and cannot believe they still haven't changed it.

Clippy says, “Looks like you want to start a MetaTalk thread!”
posted by curious nu at 6:21 PM on August 6 [34 favorites]


Also as someone that has to work with an unmaintained Access setup let me say there are a lot of days I’d prefer Excel.
posted by curious nu at 6:22 PM on August 6 [4 favorites]


What is impressive about Excel's formatting of dates, to me, is that it generally screws up actual dates on import. It turns all sorts of other stuff into dates, but dates into nothing at all.
posted by jeather at 6:26 PM on August 6 [20 favorites]


If most users are frustrated by excel date/time parsing, I don't think R or whatever is going to be the simpler option.

R with lubridate makes working with datetimes an absolute dream. Simply applying ymd_hms() to a column, no farting about with lt or ct or POSIX...I am sighing happily just thinking of it.

I realise this doesn't necessarily contradict your point but RStudio is such a good IDE and basic R is really so easy to learn, I swear.

I hate excel so much. I don't think I can even make a sensible chart anymore.

posted by kalimac at 6:27 PM on August 6 [8 favorites]


I don't have much to add except that I've been disappointed for my entire professional life at how otherwise smart scientists have been swindled into thinking Microsoft office products are reasonable tools to use for scientific research or communications.

For years, I thought the problem might get better as more digital natives showed up to play. People who know alternatives and have less invested in the monopoly. But the bullshit seems more accepted now than ever, and I say: you all deserve what you get, you should have known better. In fact I think you did and do know better, but you just collectively don't want to rock the shitty software boat you're in.
posted by SaltySalticid at 6:30 PM on August 6 [8 favorites]


Other software costs extra, and nobody knows how to use it.

Also as someone that has to work with an unmaintained Access setup let me say there are a lot of days I’d prefer Excel.

“The database was set up 15 years ago, and the guy who set it up quit 12 years ago. IT won’t support it, so that’s why we have to run Access 2003.”
posted by Huffy Puffy at 6:44 PM on August 6 [17 favorites]


SaltySalticid: For years, I thought the problem might get better as more digital natives showed up to play. People who know alternatives and have less invested in the monopoly.

The company I work for hired a couple of digital natives recently. Apparently they had only ever used phones, so they had to be taught how to cut-and-paste on a computer. Ctrl-C Ctrl-V was new to them.

Not a big deal - easy enough to teach, and companies shouldn't expect to have all their vocational training done for them by somebody else - but I found it fascinating. A lot of things that we were expecting 20 years ago from the next generation of digital natives have turned out much differently than we expected.
posted by clawsoon at 6:54 PM on August 6 [35 favorites]


Excel is not a database, but if you insist on using it in that capacity, PowerQuery is a slight improvement over excel copy and paste, affording you the ability to specify the datatype you are importing.
posted by Nanukthedog at 6:57 PM on August 6 [1 favorite]


A lot of things that we were expecting 20 years ago from the next generation of digital natives have turned out much differently than we expected.

"Digital native" does not automatically mean "power user" and arguably never has, but even more strongly so in recent years as devices become easier for novices to do simple things with.
posted by Greg_Ace at 7:21 PM on August 6 [5 favorites]


Digital Native = "knows how to post things on Tik Tok", basically.
posted by signal at 7:25 PM on August 6 [4 favorites]


This very topic is also being extensively discussed on the ycombinator.com Hacker News site...
posted by jim in austin at 7:39 PM on August 6 [1 favorite]


Yesterday I got an Excel spreadsheet which contained only a screenshot of another Excel spreadsheet, admirably matched for column and row size...I work in support.

I know how I'm going to drive my boss nuts next week.
posted by Gorgik at 8:14 PM on August 6 [23 favorites]


I work at Microsoft on Power Query and my personal opinion (which doesn’t drive the product, at least in this case) is that if you’re opening data files directly with Excel then you‘re doing it wrong (TM). The better thing to do is to treat your source data as immutable and to load it into a spreadsheet via some repeatable process and to do any cleanup as part of that process rather than manually. Not coincidentally, this is what Power Query does.

The fact that Excel installs a handler for .csv files is unfortunate but obviously very convenient for some scenarios . I wish they would change this behavior to at least ask what you intend to do with the data you’re looking at and perhaps give some explicit choices based on the answer.
posted by Slothrup at 8:43 PM on August 6 [7 favorites]


> I don't have much to add except that I've been disappointed for my entire professional life at how otherwise smart scientists have been swindled into thinking Microsoft office products are reasonable tools to use for scientific research or communications.

I know you can’t see this but I’m favoriting this as hard as I possibly can. Somehow Microsoft inherited the “Nobody ever got fired for buying IBM” crown. We’re well into the 21st century with CRISPR and shit but the people actually doing that wizardry are using Excel because the IT department won’t put R Studio (or similar) on their machine!
posted by Monochrome at 9:16 PM on August 6 [7 favorites]


As Your Friendly, Neighborhood Databaseperson™, I implore everyone: Please, for the love of God, don't use Excel as a database. Thank you.
posted by ob1quixote at 9:26 PM on August 6 [8 favorites]


It’s a poor engineer who blames their tools. These scientists just suck at excel because they were born without the =SUM(B1:B16) gene.
posted by panama joe at 9:27 PM on August 6 [15 favorites]


why isn't there dedicated bioinformatics software that people are using instead?

Casual usability isn't one of the strong points of academically-produced scientific computing platforms such as R or pandas. Those languages/libraries can do amazing things but the barrier to entry is extraordinarily high for someone who isn't a coder. There may some commercial software out there that makes things easier but it probably costs a lot of money and Excel comes almost for free.
posted by treepour at 10:07 PM on August 6 [5 favorites]


I hope this is fixed properly at least: I wonder what Excel will do with MARS1 when regional settings are set to French.
posted by Blorg at 10:46 PM on August 6 [3 favorites]


> Finally a chance to post this meme.

One of the world's best Venn Diagrams gets even better. :P
posted by kliuless at 11:02 PM on August 6 [12 favorites]


Dev 1: "God, I'm so sick of exchanging data in error-prone formats like CSV or Excel, these compatibility issues are giving me a migraine."
Dev 2: "Yeah, we should use something clear and straightforward like JSON, instead!"
Dev 1: "JSON? Tell me more! What types of data can you express with that?"
Dev 2: "Oh, anything you'd need! True/false..."
Dev 1: "Ugh, I bet we have to support different capitalization for that, don't we."
Dev 2: "Nope! The standard is clear that it has to be lowercase!"
Dev 1: "Well that's a relief. What else?"
Dev 2: "Numbers, obviously."
Dev 1: "Oh, cool, that's easy enough to parse. Positive and negative decimals, I assume?"
Dev 2: "Yeah... and scientific notation of course."
Dev 1: "Err, of course. Is that with the lowercase 'e' in there?"
Dev 2: "That one's actually not case sensitive."
Dev 1: "Um. Okay, what precision does it support?"
Dev 2: "Whatever you want! Or actually it's whatever is supported by the software reading it, I guess."
Dev 1: "Hm. Well all our software uses 64-bit IEEE 754 floating point representation. Can JSON use that?"
Dev 2: "Approximately! I mean, it doesn't support all the invalid NaN representations..."
Dev 1: "Yeah, that's fair..."
Dev 2: "...or any NaN representation..."
Dev 1: "...I guess that's fair too..."
Dev 2: "...or infinity..."
Dev 1: "...this deal is getting worse all the time."
Dev 2: "But you can have negative zero! Though it'll probably get ignored by the parser."
Dev 1: "Bleh, okay, what else can it do?"
Dev 2: "There's support for lists."
Dev 1: "Great, how do I represent a list of numbers?"
Dev 2: "Well, it's always a list of anything. You can't enforce that it's a list of just numbers. Anyway, there's also key/value mappings."
Dev 1: "Oh, good. I have this structure that maps integers to the list of their prime factors."
Dev 2: "Hm, well actually the keys can only be strings. The values can be anything though, just like with lists!"
Dev 1: <sighs> "Okay, well there's strings, at least."
Dev 2: "Double-quoted text strings, for sure!"
Dev 1: "And if I need to include a double-quote?"
Dev 2: "Just put a backslash before it, as usual. You can do that for all the necessary escapes. Double-quotes, backslash..."
Dev 1: "Excellent."
Dev 2: "...forward slash..."
Dev 1: "Huh? Why? Does a forward slash have special meaning in the string?"
Dev 2: "No, why would it? Anyway, there's also '\t' to mean a tab. You can include them verbatim but the escape helps so you won't confuse it with multiple spaces if you're looking at it visually."
Dev 1: "Oh, that's useful!"
Dev 2: "There's '\n' and '\r' for line feed/carriage return."
Dev 1: "Okay, but if I have multi-line text can I just have those verbatim too?"
Dev 2: "No, it'll just be one single-line string with those escapes in it."
Dev 1: "But I thought it was supposed to be useful to read visually?"
Dev 2: "...there's also all the other shorthand escapes you use all the time, like '\b' for backspace and '\f' for form feed..."
Dev 1: <rolls eyes> "Oh, all the time, for sure. Gotta make those convenient. What about escaping everything else?"
Dev 2: "You can use '\u' and a hex Unicode codepoint."
Dev 1: "Case insensitive again, like scientific notation?"
Dev 2: "Er, no, this one's case sensitive."
Dev 1: "Okay. So it's the..."
Dev 2: "...wait, the 'u' is case sensitive, the hex characters can be upper or lowercase."
Dev 1: "Um, okay. So it's '\u' followed by the Unicode codepoint? That's easy to look up, at least."
Dev 2: "Well strictly speaking it's only the codepoints in the Basic Multilingual Plane. If you need anything above FFFF you have to use a UTF-16 surrogate pair."
Dev 1: "Oh, JSON has to be sent in UTF-16? That's awkward, UTF-8 would be much more efficient."
Dev 2: "No, the JSON document can be UTF-8, but if you want to escape something beyond the BMP inside your strings, it has to be the two escaped surrogate pair codepoints as if it were UTF-16."
Dev 1: <rubs eyes tiredly> "Fine, can I at least trust that the surrogate pairs are going to match up in valid JSON?"
Dev 2: "You can trust anything you want, I guess."
Dev 1: "Criminy. Okay, how about dates, how do you represent those?"
Dev 2: "In a string!"
Dev 1: "Huh? It's not a separate notation? How do you represent it in the string?"
Dev 2: "I dunno, however you normally would in a CSV I suppose. That's how you'd represent anything other than what I mentioned."
Dev 1: "...but that's what I was complaining about in the first pla-"
Dev 2: "Oh, but there's also 'null'. For nulls, you see."
Dev 1: <screaming internally>

posted by Riki tiki at 11:05 PM on August 6 [27 favorites]


@Think_long

"That excel is ubiquitous when there are so many free and better alternatives "

There are? Where?

Don't say OpenOffice. If it were better everyone would be using it.
posted by GallonOfAlan at 12:50 AM on August 7 [2 favorites]


Yes, that has always happened with technologies - the most popular option is the one with the best features, best design choices and most technical merit.
posted by each day we work at 1:19 AM on August 7 [7 favorites]


I use excel extensively in my job, which is primarily mathematical. For actuaries, in addition to specific actuarial/programming software, Excel is the primary data analysis tool.

For my daily work, I'm pretty happy with excel, though the date formatting is maddening sometimes. Otherwise, it works very well as a spreadsheet application, so I can do just about everything I need to, assuming the dataset is not too big. I would definitely consider myself a power user, not to the extent that I could create a game, but I could answer just about any excel question that comes my way.

There's one clusterfuck of a project that is done twice a year, that could be slightly less of a clusterfuck (but still a clusterfuck) if it was done with something other than Excel. However, when I was reviewing and improving the process for this project, it was emphasized to me that it has to be excel because it cannot be guaranteed that the next person working on the project would be familiar enough with whatever other program to be able to use it. So I'm stuck with excel because that's what everyone knows.

Side note: if you need true randomness for whatever reason DO NOT USE EXCEL, it is not really random if you want true randomness (and why would you want anything less?)
posted by LizBoBiz at 1:56 AM on August 7


true pseudo-randomness is the best that computers can do, isn't it?
posted by thelonius at 3:22 AM on August 7 [1 favorite]


Apparently they can do true randomness, just not with algorithms. I'm a big fan of Random.org
posted by LizBoBiz at 4:55 AM on August 7


I don’t know if it’s changed again since, but in 2007 Excel was changed to use a Mersenne Twister for pseudorandomness. Before that, yeah, it was apparently really bad.
posted by Slothrup at 5:38 AM on August 7 [1 favorite]


I'm a big fan of Random.org

I gave up at the fourth captcha screen. Guess I better stick with my RAND Corporation .
posted by thelonius at 5:51 AM on August 7 [2 favorites]


true pseudo-randomness is the best that computers can do, isn't it?

Yes and no. To do true randomness, computers need to get some source of random data from "outside". I think random.org uses cosmic ray observations or some kind of radioactive decay event (last I checked, anyway). Home computers have their tricks too, because their hardware exists in an environment that's full of noise. A standard example of a source is time between keypresses; if you measure the time between keystrokes as the user is typing, those times will be fairly consistent, but if you look at the end of the decimal you get a lot of random noise you can harvest. (e.g., you measure 0.125386194 seconds between two keypresses and keep the "6194" part, because the 0.125 may be predictable but the 6194 is not).

Operating systems keep a pool of these random digits sitting around for when they need true randomness, such as for encryption. (I actually first learned all of this when I was updating a linux system and it needed to update the keyring for the package manager. Somehow I had depleted the entropy pool, and the update process kept stalling when it got to the crypto step, with no indication of why. After some googling I read about all of this, and the next time I tried to update, I started hitting keys at random once it stalled -- and, as if by magic, it started again.)

it is not really random if you want true randomness (and why would you want anything less?)

For most uses, true random and pseudo-random are mostly indistinguishable, but there are times to prefer each. Cryptography is the big one where you really want true randomness. On the other hand, pseudo-random generators allow you to set the seed and thereby get reproducible results, which is useful in scientific and teaching applications (and, occasionally, in video games).
posted by egregious theorem at 6:30 AM on August 7 [4 favorites]


What's better than excel depends on what you're doing. Part of the problem is that so many people are using spreadsheets for things spreadsheets are not designed for.

Sometimes, for scientific applications, it would be better to use a .csv file and R, sometimes you'd be better off using SQL or a even a good text editor. It's all about the nature of the problem, but unless scientists are doing literal budget sheets for their grants, excel is almost always the wrong tool, but they still reach for it because they feel locked in or nobody taught them about all the better ways. I don't believe for one second that people on the cutting edge of genetics research just aren't bright enough to learn how to use software beyond MS office.

The fact that people are dissing 'open office' indicates they aren't keeping up with it; it's been the LibreOffice fork/project for almost 10 years now. And sure it sucks, but it doesn't suck any more than MS office, and I'm not paying someone to suck. The idea that someone balks because 'other software costs money' is sadly hilarious, and another huge win of marketing by MS-- people actually think MS software is free, and alternative are costly when it's precisely the other way around.
posted by SaltySalticid at 6:47 AM on August 7 [4 favorites]


I saw a slightly-funny Venn diagram which someone had posted on Facebook last week. The circles were separately labelled "Excel" and "Incel", and the overlapping region was labelled "Can't tell if something is a date."
posted by JimDe at 6:54 AM on August 7 [9 favorites]


*double-clicks CSV to open it in Excel to change a mistyped account description*

*Save as CSV*

*software that imports the CSV* : eight thousand error lines like "8.63E+11 is not a valid account number"

Excel is useful in so many ways, but maddening in an equal number of ways
posted by AzraelBrown at 7:01 AM on August 7 [3 favorites]


The “other software costs extra” factor is also why no company I worked for ever let me use MS Project for the first 15 years of my career. About 1% (or less) of my salary, but let’s not spend that money. So here we go again with Excel (or, God forbid, Lotus Notes).
posted by Huffy Puffy at 7:10 AM on August 7 [3 favorites]


It's the Swiss Army Spoon of getting something done on a computer.
posted by clawsoon at 7:16 AM on August 7 [6 favorites]


About 10-Jan of the time this autoformatting is helpful, but I'm united with the team science in feeling like 10-Sept of the time it's just a pain in the rear.
posted by Theophrastus Johnson at 7:43 AM on August 7 [19 favorites]


Excel is maddening in so many ways but I'll nthing the remarks from others that scientists (and many others I'm sure) are using Excel for the wrong kinds of things. Databases especially are terrible in Excel. But I do sympathise with the issues because sometimes Excel is all you got.

I've spent so much of my adult working life using various versions of Excel that the numerous "bugs as features" situations (the dates are one problem, the truncated numbers AzraelBrown mentions, rounding errors and so many others) I've found that I have internalised these problems to such an extent that when I've used other software that actually works, Stockholm syndrome like, I thought the other software was broken. I long ago gave up hope on Microsoft "fixing" some of these so I just learned to work around them - sometimes an easy fix, sometimes time consuming. So most of my rage has abated as I've gotten older (and have largely switched to LibreOffice).

However, I do have one pet peeve associated with Excel that is ongoing... A big part of my job is taking product sheets from various vendors and massaging the data into a CSV that can be inputted into our system so my customers can order product. So a very common problem for me is that I'll get a spreadsheet and I will see cutting and pasting errors, date errors, truncated numbers, bad characters or misread characters (intentional and unintentional), and on and on that the spreadsheet creator at my vendors didn't double check when they made the spreadsheet because they weren't as aware of these "bugs as features" in whatever version of Excel they were using. Some of these spreadsheets are relatively large, say 10000 to 45000 lines, so the errors can compound over time and can be complicated to extricate and made to make sense. I usually catch these errors before I input them into my system but sometimes I don't catch them until a customer sends me a confused email. I spend many hours dealing with this every month...
posted by Ashwagandha at 7:47 AM on August 7 [2 favorites]


Also, yes, no one should use Excel for anything, except that it's 90% of the way to being the best tool for nearly everything

One of the most famous exceptions, of course, being building a Turing machine, for which PowerPoint is the better tool.
posted by solotoro at 8:37 AM on August 7 [2 favorites]


I mainly used Quattro Pro when I was doing research at a university years ago. My current employer was using Quattro when I started here, but switched to Excel. The switch was maddening, especially the date thing. So many things that I did all the time became infuriating difficult. Same with switching from Word Perfect to MS Word, which also has a bad habit of changing things and being very frustrating to format. But nothing is as painful as when I had to edit and mark-up PDFs in Adobe Acrobat. UGH.
posted by fimbulvetr at 8:45 AM on August 7


"Depleting the entropy pool" sounds like a euphemism of some sort.
posted by Greg_Ace at 8:49 AM on August 7 [4 favorites]


Managing remote student work for our mid-size college department (merging survey data vs addresses vs registration status etc) in Excel was going to be such a nightmare that I've literally spent the last 5 days learning pandas instead.
posted by range at 9:02 AM on August 7 [5 favorites]


If Excel (and other Office apps) are supposed to be "helpful," then fine, Microsoft should make them helpful. Part of that would be recognizing that when you have a shitload of data, the data in a column is probably all of the same type, and if you have a few outliers that look like they might be dates, maybe they're not dates and you could at least ask. You could maintain the same internal representation (which, AIUI, Excel does not) and flag where you think a certain data type might apply.

As it is, it's like having a five-year-old kid in the kitchen "helping" you bake a cake. I guess that's fine if your own cake-baking skills are at the level of a five-year-old…
posted by adamrice at 9:41 AM on August 7 [9 favorites]


"Depleting the entropy pool" sounds like a euphemism of some sort.

Pretty sure it's a Culture Ship.
posted by EndsOfInvention at 10:41 AM on August 7 [5 favorites]


I actually first learned all of this when I was updating a linux system and it needed to update the keyring for the package manager. Somehow I had depleted the entropy pool, and the update process kept stalling when it got to the crypto step, with no indication of why. After some googling I read about all of this, and the next time I tried to update, I started hitting keys at random once it stalled -- and, as if by magic, it started again.

Your package manager should have been using /dev/urandom instead of /dev/random. Probably.

...pseudo-random generators allow you to set the seed and thereby get reproducible results, which is useful in scientific and teaching applications (and, occasionally, in video games).

There's also the classic(?) trick in Roguelike games of saving a level by saving only the randomizer seed used to generate it and then just recreating the level when needed.
posted by suetanvil at 11:02 AM on August 7 [3 favorites]


It’s weird that there is not an option to turn this off. Having worked on Microsoft Office (for the Mac) I can say that the Microsoft culture is very pro adding features.
When people find out you work on Office everybody complains that there are too many features, and then asks for three more. Most features people asked me to add were already present in Office, but users hadn’t found them.

That’s how their products end up with problems you can’t fix by adding features, like poor ease of use and general bloat.

Adding the hypothetical “Don’t detect dates” feature, is easy. If I still had the source to Excel I could do it in no time. It’s probably a good idea to add, but every added pref causes new problems for some users. For example, a user on a shared account turns it on, and then another user complains that dates are broken. Every pref has to be discovered to be useful, and the more there are the harder it gets.
posted by w0mbat at 11:04 AM on August 7 [6 favorites]


Why are people using it for editing CSV files? CSV is a text format. Use a text editor.
posted by GallonOfAlan at 11:25 AM on August 7 [2 favorites]


A thing I've noticed when I've used Microsoft products is that they have a tendency to

a) take initiative and "helpfully" do things for you automatically without being asked to

b) incorrectly.

My suspicion is that deep within the marketing culture is this idea that people really want the computer to Do The Right Thing Automatically to the point where they're willing to tolerate a lot more error than most of their users are.

Back when I first heard about this problem years ago, I had (as I am prone to) an idea for a product. It would be a spreadsheet for scientists. In addition to treating your data as sacrosanct, it would treat the spreadsheet as a HPC programming language. Your formulas would be heavily optimized and if desired runnable on whichever cluster of computers you had available, the interface would be designed to let you deal with truly enormous quantities of data if you needed to, and so on. It would be a niche product with a high per-seat price tag, sort of like the quant-focused Excel plugin I was working on at the time.

And then I read an article about why scientists use Excel. The reason is that Excel is generally free and they usually don't have the budget for anything else. And my dollar signs developed little wings and flew away.

Although I suppose that the wealth of good FOSS science software available these days are probably better for everyone involved, it's probably just as we..
posted by suetanvil at 11:27 AM on August 7 [2 favorites]


GallonOfAlan: Why are people using it for editing CSV files? CSV is a text format. Use a text editor.

If you have data across a number of columns and headers at the top, it can be extremely difficult to know if you're editing the correct column when you have a CSV open in a text editor and none of the columns line up.
posted by clawsoon at 12:08 PM on August 7 [5 favorites]


If you have data across a number of columns and headers at the top, it can be extremely difficult to know if you're editing the correct column when you have a CSV open in a text editor and none of the columns line up.

This is completely true and has often been my use case, but I would also suggest looking up something like the RainbowCSV add-on for Visual Studio Code, or the equivalent for your own power text editor of choice. This has its own problems, but it it makes columns perceivable.
posted by Going To Maine at 12:34 PM on August 7 [1 favorite]


Last I checked (a few years ago), the data mangling could be turned off per-document / template, but not systemwide. This would all be solved if there was a GPO setting that you could broadcast to the whole org that turned off the "feature", but of course there is not. You can go into contortions to turn off the data mangling for yourself, but if you collaborate with others the likelihood of data corruption approaches 1 very fast.
posted by benzenedream at 12:40 PM on August 7


Why are people using it for editing CSV files? CSV is a text format. Use a text editor.

Yeah no, I'm not going to fill down, sort and look for anomalies, or do a quick formula* in a text editor. Text editor's fine for most search and replace, but that's about it.

*Higher functions sure, I'll take it all the way back to xlsx to do additional stuff to it, but if I've got a file that's almost ready to import except a couple tasks, I'm just opening the csv in Excel.
posted by Lyn Never at 12:56 PM on August 7 [3 favorites]


A while ago I heard someone observe that any given person only used about 5% of the capabilities of programs like Excel or Word. The problem of course is that no two people use the same 5% and so we have this bloated mess where each and every feature is deemed essential by someone. Backward compatibility, where no existing feature ever gets removed or improved, worsens this bloat and sometimes means that there are multiple ways to accomplish the same task. (And no, thats not a great idea in PERL either.)

The real problem with Microsoft Office derives from its ubiquity. People never look for alternatives or a better solution. But worse, much worse, is that no one is trained in the use of these packages. I have never encountered a business which used style sheets in word. And I have never met a business user of Excel who had any grasp of the rudiments, relative vs absolute addressing for instance. Named ranges or cells. Pivot tables are thought of as rocket science but any 12 year-old could be taught pivot tables in an hour. None are taught basic debugging techniques let alone, god forbid, testing. Excel is the world's most widely used programming system and essentially none of the 'programmers' know what they are doing. It is like a world full of car drivers who are self taught (or worse, taught by their dad).
posted by epo at 1:04 PM on August 7 [6 favorites]


The automatic date recognition feature almost certainly stems from a desire to cater for the legion of the untrained who comprise Office's userbase. By making things easier for them you provide tripwires for people with more complex requirements. Every convenience feature makes life more difficult for those with needs which are away from the mainstream.
posted by epo at 1:17 PM on August 7 [1 favorite]


Excel's main users are accountants, not scientists. They do not give a shit about scientists and engineers using their products. As long as the CFO thinks Excel is the bee's knees they are fine.
posted by benzenedream at 1:32 PM on August 7 [1 favorite]


> I didn't mean to imply that I don't find date/time in excel to be non-infuriating

Date/time calculations are one of the first exercises given to beginning programming students and also one of the thorniest and almost insoluble issues in programming and databases.

Still, the way Excel handles auto-date formatting is a travesty.
  • It wouldn't be hard at all to make all this auto-formatting optional. Just give us a way to turn it off.
  • Any kind of auto-interpretation/auto-formatting like this is problematic. But make it into a nice function or button--that you can press when you want it and refrain from pressing the entire rest of the time--and suddenly it's a genius feature.
Like, I select a column of data I just entered, I click the "Auto-format as date" button, they all turn into nicely formatted dates.

This isn't hard and would be a great feature.

You're importing a .csv file. A little dialogue pops up: "Some of the data in Column X look like dates. Would you like to auto-important them as dates? (Y/N)"

No, thank you, Excel. I wouldn't.

But thanks for asking, because maybe one time out of ten I sure would and you just saved me some trouble in that minor edge case!

This isn't really hard at all. Except for the fact that the behavior is programmed into Excel at a very deep level--who knows how many thousands of lines it would require changing to make it work in a sensible way--and on top of that there are millions of lines out outside code that depend on keeping Excel's current functionality exactly as it is.

So the genetics people are definitely taking the smart way out by just changing their procedures to bow down to the reality of the Unchanging One.
posted by flug at 1:49 PM on August 7 [2 favorites]


Despite this particularly frustrating absurdity (and don't get me started on how it handles ISBNs) I firmly believe Excel is the only truly functional Microsoft product, warts and all. All of the truly persistent problems I've had with it had a lot more to do with someone else not knowing how the fuck to use the program, than any truly unsolvable quirk of Excel itself.
posted by aspersioncast at 2:10 PM on August 7 [2 favorites]


Fortran is still going strong in astrophysics, so we may never be able to get rid of Excel in bioinformatics, but that doesn't mean we shouldn't try. There's a good number of hosted iPython notebook providers, some of which are free. For work where Pandas is appropriate, there's no excuse.

The crime here is CSV as the interchange format, and Microsoft is the guilty party. In my workflows, something more advanced than CSV that was able to specify "HEY THIS COLUMN IS NOT A DATE DON'T EVEN TRY, EXCEL" would sidestep the issue of having to guess. Some say that Microsoft has changed, and it's clear some parts have, but it's clear other parts haven't. Several months ago, back in 2019, a friends work email went down due to a "reply-all" storm. The IT guy was off on vacation and couldn't fix it until they got back in.
posted by fragmede at 2:16 PM on August 7 [1 favorite]


This would indeed be trivial to stop this behavior but then some non-trivial amount of existing use cases that depend on the original behavior. It cannot be overestimated how much resistance there is to breaking backwards compatibility in Excel. Case in point, there is still Excel functions that incorrectly calculates the year 1900 as a leap year because Lotus 1-2-3 made that mistake. Lotus 1-2-3 was released in 1983.

I've recently taught myself some pandas and it does some strange things around auto-determining null or null-like values. Of course if you are using pandas you rightfully can handle those cases in code.

Digression: I've often thought that the automated test suites for Excel to handle all these weird edge cases must be staggeringly large and full of things that would seem bananas to most people.
posted by mmascolino at 2:45 PM on August 7 [5 favorites]


Okay, most of the posts here are missing the point completely. The problem is mostly not about computational biologists trying to use Excel to do computational biology. The main problem is that computational biologists have to work with experimental biologists and scientific journals, and much to our dismay, Excel remains the standard in those worlds. The vast majority of experimental biologists are not coders. They have at best the vaguest possible idea what RStudio, pandas, or SQL are. They are very busy doing actual experimental biology and if you try to make them learn the tidyverse just to interact with you, they will find someone else to work with. (Also, as a side note, the instruments they use are also not unlikely to be running some terrible piece of closed-source enterprise software for Windows 98 that spits out the raw data in... an .xlsx file.)

Likewise, if you are going to submit something to a journal, in most cases you cannot upload your 20 supplemental tables with the raw data that was too big to fit in the actual manuscript as individual .csvs. Most journals will simply not accept that many files for upload, if they even accept a .csv at all to begin with. The most-commonly accepted format for supplementary information remains an .xlsx with multiple sheets. This is actually where I suspect the vast majority of these errors originate, not in the process of doing actual bioinformatics. I'd also expect it doesn't get caught precisely because most of the people doing this conversion are doing it as a one-off and do not regularly use Excel, and would not suspect that the extremely simple task of opening a few .csvs, pasting them into a single document, and saving them to exhibit such frankly deranged and dangerous default behavior.

If your proposed solution is that nobody should ever use Excel, then great, that's a very nice principled stand, best of luck finding collaborators and publishers who are willing to work with you anyway. For the rest of us this is an unbelievably stupid but necessary solution to a unbelievably stupid but real problem.
posted by en forme de poire at 3:27 PM on August 7 [21 favorites]


British Army does not Excel at spreadsheets: Soldiers' newly announced promotions are revoked after sorting snafu

Red faces abound within the British Army after an Excel spreadsheet cock-up led to a number of soldiers being wrongly promoted.

A screenshot of an internal email sent around the Royal Logistics Corps (RLC) and posted on Facebook revealed the snafu, which The Register has been told happened after someone in the RLC personnel management branch incorrectly sorted a spreadsheet of successful and unsuccessful promotion candidates.


As all Excel jokes go:

They shall grow not old, as we that are left grow old:
Age shall not weary them, nor the years condemn.
At the going down of the sun and in the morning,
We will remember them.

posted by They sucked his brains out! at 3:35 PM on August 7


When I was a grad student, I went to my major professor and said "Dr. X says I should learn statistics". My major professor said "You don't need statistics; you need to learn a second language. That will some in handy when you're reading scientific papers." So I learned a second language.

And that, friends, is why I am using Excel today.
posted by acrasis at 6:51 PM on August 7 [2 favorites]


If you have to use Excel, do yourself a favor and learn how to use PowerQuery. It will keep your source data intact, and you can specify every column’s data type (and do other neat transformations too). It has a really easy syntax that you can learn in a couple hours and Microsoft’s online documentation for it has nice, clear examples. Also people will think you’re a some kind of data wizard. It’s great fun.*

I use lots of tools at work, including SQL and C#, and they all have their pros and cons but most of the time Excel is just *faster*.


*ok, I might be a bit weird
posted by Doleful Creature at 12:54 AM on August 8


So many comments and nobody has brought up the principle of least surprise. When software surprises the user (in a bad way), it should generally be assumed to be the fault of the software. Don't blame the user for failing to follow the software's expectations, instead fix the software to match the user's expectations.

That said, I don't hold any hope that Excel will change. It's been doing this long enough that surprise is now exasperation, and every change breaks someone's workflow.
posted by swr at 2:06 AM on August 8 [6 favorites]


But this IS the principle of least surprise. Microsoft Office is an appliance. Most users will not, and do not want to, think about how it works.

Most Excel users are data handling drones who want to double click an email attachment, print out the result and get on with their facebooking. And yes, anything that requires extra steps or thought is a break in the workflow. It is the ones complaining who have the edge cases.
posted by epo at 2:41 AM on August 8


I don't think users would be surprised by an application storing the data the way they entered it. They might be surprised when they sort a date column and it sorts alphanumerically rather than by date, but there are solutions to that that don't involve irreversibly changing the user's data.

I also don't think these edge cases are as edgy as you suggest. Excel is very aggressive in trying to force things to be a date. I've seen many people frustrated by this. In the UK the dropping of leading zeros is a very common problem because phone numbers here all start with a zero. From what I've seen most people just accept it and expect whoever's looking to know that there's supposed to be a zero there. Getting Excel to not mangle is surprisingly difficult, because even one slip up and it's too late, the data is mangled and once mangled no amount of fiddling with data types or formats can ever unmangle it.

If Excel were a physical appliance it would've been recalled as defective. (Mind you that's true of most software. The whole "appliance" metaphor has generally been aspirational rather than descriptive.)
posted by swr at 3:24 AM on August 8 [2 favorites]


It must have become aggressive in forcing things to be dates because that must be what users demanded, "that's how we write dates". I agree about being a defective appliance and perhaps the root cause is that the teams who originally implemented this simply didn't think it through. Excel used to be niche and became mainstream, that change was never reflected in its behaviours (because backward compatibility).

Excel could be improved in these areas but it would require users to think about what they were doing, that won't happen and Microsoft know it..
posted by epo at 3:47 AM on August 8


I guess if I was really invested in the problem I'd rail about why some gene names have to look like dates when they're purely arbitrary strings. Just pick some other damn characters and get over it.

I work with bacteria, which have gene naming styles that generally don't hit this particular issue, and so I admit I would care less if Excel's meddling was limited to gene names. But Excel has a very broad interpretation of what looks like a date. It is unhelpful to find that deep in some huge RNA-Seq dataset, one of my genes of interest was ostensibly up-regulated by FEB-27.

A lot of people do data manipulation visually, and tho it causes a lot of problems I think excel serves that purpose for most casual applications. I use it for its conditional formatting features as a spot check.

Yeah. I actually do a reasonable amount of my data-parsing in R. But I'm naturally inclined towards interacting with my data more visually, and it can feel more intuitive to be able to see everything (often mediated by conditional formatting). Sometimes there're also things I need to do quickly, and if I know how to do it in Excel and would have to spend time poking around in stackoverflow to get it to work with ggplot2, I do sometimes take the option that is quicker in the short term, if not better in the long term.

There are other factors. Excel's provided by many universities for free, unlike Igor or Origin or GraphPad (which are definitely better than Excel for dealing with many types of scientific data, and which are less intimidating than R for people with little programming experience). Between unwillingness or inability to pay for those tools, the learning curves of those tools and of free tools like R, and fact that your collaborators and/or journals may well (mis)use Excel even if you don't, Excel remains an annoying default that no one can ignore entirely.

clearly someone needs to do an opensource excel clone with every feature except any date support whatsoever.

I mean, you joke, and yet, as someone who has never once used dates in Excel...

Given the decade of complaints in this Microsoft support thread - including a 2011 mention of the gene issue - I don't think there's any danger of Excel competing by introducing a "don't autoformat things as dates" option.
posted by ASF Tod und Schwerkraft at 1:59 PM on August 8 [1 favorite]


« Older The Fire This Time Festival - 7 Short Plays   |   Ex-King Juan Carlos leaves Spain Newer »


This thread has been archived and is closed to new comments