The Backdoor To The Entire Internet That Didn't Happen
April 15, 2024 10:58 AM

A rather large drama unfolded a couple of weeks ago when it was discovered that someone had installed a backdoor into a compression utility used by much of the Open Source community. Backdoor found in widely used Linux utility targets encrypted SSH connections [Ars Technica] It was found by accident: an engineer maintaining his own code noticed discrepancies in computer performance and investigated. How one volunteer stopped a backdoor from exposing Linux systems worldwide [The Verge] This seems to have been largely the work of one online account that spent years gaining trust in the group that maintains this tool. The Other Players Who Helped (Almost) Make the World's Biggest Backdoor Hack [The Intercept] The Mystery of 'Jia Tan,' the XZ Backdoor Mastermind [WIRED] Today, Fedora announced its own systems all clear of this thwarted backdoor attempt. CVE-2024-3094: All Clear
posted by hippybear (53 comments total) 32 users marked this as a favorite
 


Question: what are the odds that any LLM was trained on this code?
posted by praemunire at 11:21 AM on April 15 [9 favorites]


Schneier's comment on this. He speculates that, given the patience and effort to set everything up, a nation-state actor being behind the scheme is credible. (Seems reasonable to me, although it's so cheap relative to the payoff that an ambitious and patient criminal organization could do it too.)

This has seemed like a plausible line of attack on open source for years, given code is much more fun to write than review. (Obviously proprietary software has its own backdoor risks.) Kudos to the MS employee who discovered this during their free time.
posted by mark k at 11:28 AM on April 15 [18 favorites]


what are the odds that any LLM was trained on this code?

Near zero. The payload was incredibly well hidden. The worst of it was precompiled x86_64 object code that was hidden in a binary blob in the source repo.
posted by 1970s Antihero at 11:29 AM on April 15 [19 favorites]


I'm going with Schneier and the nation/state card here because, man, they did it by messing with autoconf and... nobody understands autoconf.

Rachel has a good take on that.
posted by JoeZydeco at 11:31 AM on April 15 [18 favorites]


I wonder how many other Jia Tans are out there helping with projects. It would make sense for a country's cyber warfare division to have lots of seemingly trustworthy contributors who can suddenly flip to naughty mode when Fearless Leader says it's time.
posted by pracowity at 11:58 AM on April 15 [11 favorites]


Well, it's not just software we have to worry about.
posted by JoeZydeco at 12:17 PM on April 15 [2 favorites]


The worst of it was precompiled x86_64 object code that was hidden in a binary blob in the source repo.

Yes, that makes perfect sense! Not...
posted by y2karl at 12:25 PM on April 15 [1 favorite]


It is basically this xkcd cartoon bit with an unknown state actor trying to whack out that tiny block at the bottom.
posted by Artw at 12:27 PM on April 15 [11 favorites]


I assume the NSA is doing this too, only better implemented to make detection even less likely.
posted by Quinbus Flestrin at 12:43 PM on April 15 [2 favorites]


Yes, that makes perfect sense! Not...

You can always check out the FAQ on the xz-utils backdoor if I haven't done a good enough job of making it not make sense to you.
posted by 1970s Antihero at 12:49 PM on April 15 [4 favorites]


Question: what are the odds that any LLM was trained on this code?

Sort of zero based on what 1970s Antihero said, because the telltale bits of it - anything likely to raise notice - lived in the precompiled, non-human-readable files. LLMs for programming autocomplete like GitHub Copilot are training on human-readable source, which in this case only had the minimum framework needed to load the offending bits. But even if it had lived entirely in human-readable source, attack vectors like this are so incredibly context-specific that it wouldn't present any kind of broader systemic threat. Worst case would be Copilot's output is 0.000…0001% more likely to break. Which, depending on how you feel about LLMs taking over the boilerplate aspects of programming, might not constitute a worst case.

Flipside, if you had several hundred / a few thousand examples of this kind of attack, each specific to their context and numerous semi-overlapping contexts, then it might - for very low values of might - be possible to train a classifier to flag potential attempts at similar behavior. The initial false positive rate would be stratospheric and a very long, arduous tuning period of RLHF (reinforcement learning from human feedback) would be required to make it useful. But in theory it might be possible to get a little help from LLMs at spotting future clever monkey shenanigans of this sort.

There are so many layers of incredibly complex systems interacting here that this might just be a Turing-level problem, though.

I assume the NSA is doing this too, only better implemented to make detection even less likely.
posted by Quinbus Flestrin


At least thousands of them, which everyone even remotely adjacent to network security prior to Snowden had long suspected, but he helpfully confirmed was indeed the case. Assume similar from (at minimum) Chinese state intelligence, North Korea, Mossad, UK MI and the weird FSB / Russian mob duolith, scaled to active developer pool.
posted by Ryvar at 1:09 PM on April 15 [6 favorites]
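A cartoon-scale sketch of the crudest version of that classifier idea (the token list, diffs, and threshold are all invented; a real system would need learned features and the long tuning period described above):

```python
# Toy sketch: score commit diffs by counting "suspicious" tokens. A real
# classifier would learn weights from labeled examples of supply-chain
# attacks, which barely exist -- that scarcity is exactly the problem.
SUSPICIOUS_TOKENS = {"eval", "ifunc", "dlopen", "system("}

def suspicion_score(diff_text: str) -> int:
    """Count suspicious tokens in a diff; higher means more worth reviewing."""
    return sum(diff_text.count(tok) for tok in SUSPICIOUS_TOKENS)

benign = "+ add unit test for round-trip compression"
sketchy = "+ resolver = dlopen(path); eval(payload)"

assert suspicion_score(benign) == 0
assert suspicion_score(sketchy) >= 2
# The false-positive problem in miniature: plenty of legitimate code
# calls dlopen too, hence the stratospheric initial false-positive rate.
```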


Previously on the blue (March 30th). Quite a lot of discussion and useful links in that thread.

they did it by messing with autoconf and... nobody understands autoconf.

My friend Zack Weinberg and I did some work on autoconf a few years ago, leading to him writing up this analysis in January 2021 concerning (among other things) concerns that the GNU autotools probably needed to address. Zack recently wrote a Fediverse thread about xz and autoconf that partially addresses the "time to scrap autotools for good, everyone should just use cmake/meson/GN/Bazel/..." arguments, and discusses the distinction between autoconf and M4:
in my capacity as one of the last few people still keeping autoconf limping along, I'm thinking pretty hard about what could be done to replace its implementation language, and concurrently what could be done to improve development practice for both autoconf and its extensions (the macro archive, gnulib, etc.)....
He has said he's aiming to develop that thread into a blog post and I do look forward to him doing that soon.
posted by brainwane at 1:18 PM on April 15 [15 favorites]


My husband used to be a coder but is now in management, while I'm still on the technical side, and I definitely did a "what do you MEAN you haven't heard about the xz backdoor??? It's all ANYONE is talking about!!!" nerd screech at him a few days after the story broke.
posted by potrzebie at 1:25 PM on April 15 [13 favorites]


Like HOW DO YOU MISS THIS STORY
posted by potrzebie at 1:25 PM on April 15 [3 favorites]


The story did break over a holiday weekend which didn't help.

I was talking to my uncle at Easter dinner and he asked me "so, what's hot right now?" and I told him the xz tale. Said it would probably be all over the news outlets in the upcoming week as "MAJOR HACKER BACKDOOR FOILED AT LAST MOMENT" or something similar, unresearched and overly dramatic.

And then.... nothing.
posted by JoeZydeco at 1:58 PM on April 15 [1 favorite]


I don't know about TV news outlets or print news but I saw stories about it on the front online pages of the NYT, Guardian, Washington Post, etc., not to mention all the tech and sort-of-tech-y magazines and blogs.
posted by trig at 2:06 PM on April 15 [3 favorites]


[Schneier] speculates that, given the patience and effort to set everything up, a nation-state actor being behind the scheme is credible. (Seems reasonable to me, although it's so cheap relative to payoff an ambitious and patient criminal organization could do it too.)

I'm not a huge expert or anything, but my impression of Russian and possibly Israeli nation-state hackery is that plenty of it is outsourced to what I would call criminal organizations. And when one such org gets busted, the remnants reorganize into a new org and get hired again.

My sense is that the NSA keeps their hackery in-house, but I could well be wrong about that.
posted by humbug at 2:15 PM on April 15 [2 favorites]


Andres (the person who discovered the backdoor) was on the Oxide and Friends podcast this week. It's interesting to hear a bit about the process, his mindset and the fears and worries he had. Also nice to hear a genuine expert in something I know little about (ie database development) talking about their expertise, and being comfortable saying what they don't know.
posted by sarcas at 2:16 PM on April 15 [6 favorites]


SuSE has described their methods to show they're clear of the code added by Jia Tan.

What did Open SuSE learn from the xz backdoor?

This has more detail showing that there's almost zero likelihood the malicious code could be misused to train a machine-learning model, praemunire and y2karl.
posted by k3ninho at 2:27 PM on April 15


Really quite smart hiding it in a binary blob within the test-suite.

It makes sense that you're going to need binary examples of 'LARGE_SIMPLE_FILE_THAT_WORKS.ZIP' and 'KNOWN_GIBBERISH_THAT_WILL_FAIL.ZIP' to run during the test suites for a compression app. So the gibberish one fails, great, test works -- but when that file is carefully sliced and stacked a different way by another part of the build, it becomes code that serves the malicious purpose. Even slicing the gibberish file doesn't inherently feel wrong, especially if it's like 'gibberish that kinda looks like a compressed file' to test for edge cases.

I can see careful projects moving to a 'never include a binary' rule, where instead you have to create code within the test-suite that in turn creates the binary itself, so at least you can give it an eyeball or two. But even then, what if you need binary examples from every version of the project to fully test for backwards compatibility, what then? Could leave enough of a gap to squeeze a nation-state actor through.

It might all be vibes, but it does feel like they saw the finish line and did a dash for it at the last minute, rather than just chilling. That's separate to the Microsoft researcher's amazing 'huh' moment, but I do get the feeling they would have been noticed elsewhere in the packaging pipeline.
posted by Static Vagabond at 2:29 PM on April 15 [2 favorites]
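A cartoon of that slicing trick (nothing below is the actual xz mechanism; the payload bytes, offset, and file role are all invented):

```python
# Hypothetical illustration: a "corrupt" test fixture that secretly carries
# a payload. To a reviewer, the blob is indistinguishable from gibberish;
# only a build script that knows the offset can recover the hidden bytes.
import os

def embed(cover: bytes, payload: bytes, offset: int) -> bytes:
    """Splice a payload into random-looking bytes at a known offset."""
    return cover[:offset] + payload + cover[offset + len(payload):]

def extract(blob: bytes, offset: int, length: int) -> bytes:
    """Only code that knows (offset, length) can recover the payload."""
    return blob[offset:offset + length]

payload = b"\x90\x90\xcc"   # stand-in for malicious object code
cover = os.urandom(4096)    # looks like a failed-decompression fixture
blob = embed(cover, payload, offset=1337)

assert len(blob) == len(cover)                       # same size as original
assert extract(blob, 1337, len(payload)) == payload  # build script recovers it
```

The defense described above (generating fixtures from reviewable code instead of shipping opaque blobs) removes exactly this hiding place.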


It's surprising how little attention this has received. If the backdoor had remained undiscovered for just a few more months, it would probably have been the most devastating software vulnerability that has ever existed. It's hard to overstate just how bad it could have been.

My personal speculation is that the attacker had some kind of deadline. The reason is that 1. they seemed very eager to get the new version of liblzma quickly integrated into the next releases of Ubuntu and Fedora, and 2. the attack was executed somewhat sloppily, which is how it was found in the first place. That half-second SSH login delay didn't need to exist.

Furthermore, almost everyone has assumed that the attacker was acting on behalf of a nation state. It is probably a "rogue state", since this attack, if successful, would have had a negative effect on international commerce. Realistically, the most likely attackers are Russia, Iran, and North Korea. Of those, Russia seems the most likely to have some kind of deadline, keeping in mind that this attack seems to have been in the making since 2021 at the latest, well before the current Middle East crisis. So that's my hypothesis: this was a Russian attack designed to either cause a problem for the West or gain something within a certain time window.
posted by nosewings at 2:33 PM on April 15


It seems like the assumption that it is a large actor behind Jia Tan is based on the fact that it was something that had a couple of years invested in it, building trust across that time? Or is there that assumption because of the sophistication of the attack?

Because honestly there is absolutely zero reason to NOT believe this might have been the action of a single individual who had plans for sudden supervillain level world domination.

I mean, think about the first two seasons of Mr Robot... That was a small number of non-state actors who destroyed the entire credit/debt accounting system for the entire planet plunging it into economic disaster/depression. Yes, that's a fiction story, but it does show that this kind of line of thinking could be in the minds of individuals or small groups and not requiring a state actor behind it.
posted by hippybear at 2:50 PM on April 15


On the Risky Business podcast, they believe it's a nation state actor -- and furthermore, the combination of long timeframe but low work means they probably repeated the same technique across other projects.

That is, a developer could give a few hours per week, among several online identities, totaling up to a full-time job.

Which is a chilling thought.
posted by wenestvedt at 3:03 PM on April 15 [6 favorites]


nosewings: My personal speculation is that the attacker had some kind of deadline.
The previously-linked thread had speculation that a change to systemd (the service manager for contemporary Linux) was coming to remove its dependency on liblzma (the name for this program when used as a library of computer functions).
posted by k3ninho at 3:03 PM on April 15 [2 favorites]


Realistically, the most likely attackers are Russia, Iran, and North Korea

Sadly, I'm not sure "they have too much to lose" is sufficient to exclude China from the list, as they've been abundantly clear they're upset about the high tech sanctions, have pursued plenty of economically painful policies in the past and will likely do so again before Xi goes. And if the 2023 spy balloon incident demonstrated anything, it was that leadership isn't fully holding the reins on their spycraft apparatus.

Or is there that assumption because of the sophistication of the attack? Because honestly there is absolutely zero reason to NOT believe this might have been the action of a single individual who had plans for sudden supervillain level world domination.

I mean, you are free to deal in outlandish theories if you like.

But from where I sit, it looks organized mainly due to sophistication. Autoconf and m4 are extremely obscure technologies, in the vein of NTP, with an extra pinch of "maybe if I wait five years it will just go away entirely." They exist to make your program compile on things other than Linux, like HP-UX, AIX and other dead UNIX technology. Building expertise in autoconf will not lead you towards repeat business, at any rate. And then the actual attack vectors are another level of effort.

To give you an idea, the previous most comparable attack was a supply chain breach of SolarWinds in 2020. That was attributed to Russians, by everyone in the Trump administration except Trump himself*. It was also a little more clear cut, state actor-wise, as they were also using iPhone 0days sent to government officials over LinkedIn.
posted by pwnguin at 3:19 PM on April 15 [1 favorite]


That Oxide And Friends podcast episode is really fascinating. Thanks so much for posting that!
posted by hippybear at 3:21 PM on April 15 [1 favorite]


I mean, think about the first two seasons of Mr Robot...

Mr Robot gets the “vibe” right in successfully communicating the induced paranoia of working in network security - the creeping tendency to begin viewing everything in life, not just in machines, as an attack surface / threat source. Until it threatens to swallow you whole. Even if your natural neurodivergence is on every major axis except schizoaffective/paranoia.

I would not recommend using it as a reference for threat scale, in any sense of the phrase. The days of L0pht and stupidly easy penetration with a laptop and a copy of Nessus are decades past.

(Source: in 2001, for slightly under 24 hours, I held a loaded gun to the head of every public-facing Microsoft webserver in existence, plus all internal that were on their network horizon, which back in those days was most of them. A slight reworking of RainForestPuppy’s IIS5 Unicode parser buffer overflow, and since I did it in just three and a half hours after a roaring IRC shitfight with a couple people at MS Security Response Center, quite possibly the recordholder for fastest Windows patch turnaround on a zero-day ever …I’m kinda shocked that according to Search Activity I never told this story in 22 years on Metafilter).
posted by Ryvar at 3:35 PM on April 15 [24 favorites]


Also, this whole incident has reminded me of something John Oliver once said: "If you want to do something evil, put it inside something boring."
posted by nosewings at 3:38 PM on April 15 [14 favorites]


...go on....
posted by wenestvedt at 3:38 PM on April 15 [3 favorites]


Because I fall asleep after midnight, I heard this program last night* on the BBC World Service, which runs from 12 to 4 AM on our local NPR station:

Codes that Changed the World, Cobol

The upshot was that an enormous percentage of vintage business and government mainframes run on Cobol -- but very few of today's genius nerdboy programmers have a clue as to how to read or write it. Talk about arranging deck chairs on the Titanic...

*The BBC is following NPR's Less-Is-More example in running the same segment 8 times a day for 4 days in a row. Which is one way to reduce payroll expenses on either side of the pond. On the bright side, it's easier to remember their It's deja vu all over again! programming.
posted by y2karl at 4:44 PM on April 15


Not sure how accurate or precise it is, but the NYTimes article from a week or two ago included this nice turn of phrase:
In the cybersecurity world, a database engineer inadvertently finding a backdoor in a core Linux feature is a little like a bakery worker who smells a freshly baked loaf of bread, senses something is off and correctly deduces that someone has tampered with the entire global yeast supply. It’s the kind of intuition that requires years of experience and obsessive attention to detail, plus a healthy dose of luck.
posted by nobody at 5:43 PM on April 15 [23 favorites]


Question: what are the odds that any LLM was trained on this code?
As other people have commented, the odds of an LLM writing the exploit itself are low. However, an LLM could still be really useful for the attacker to get into position: have a team of people building personas as open source contributors and use LLMs and other tools to help them seem productive and willing to deal with messy problems. Open source projects notoriously have messy things like tests, packaging, coding style improvements, etc. which nobody enjoys dealing with and thus are perennially neglected. If I was an attacker, that’d be a natural target because people love not to deal with it themselves and it could condition people not to look closely at large changes from you since things like refactoring an internal API or replacing a deprecated function in one of your dependencies can require a lot of code to be updated in a boring manner.
posted by adamsc at 6:31 PM on April 15 [1 favorite]


If I'm understanding one part of that podcast correctly, if you were to download the actual code for xz and compile it, it would return a different, um... checksum [or something] from the corrupted file, because the corrupted file never had its actual code posted. So there's an interesting public face/private deception thing going on with this code package. I can only imagine people are doing comparisons between public code and compiled tarballs pretty extensively right now.
posted by hippybear at 6:48 PM on April 15 [3 favorites]
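As it turned out, the malicious build script lived only in the release tarballs and not in the public git tree, so a file-by-file comparison between the two is exactly the check that catches this class of tampering. A minimal sketch (directory layout hypothetical; in practice you would compare against the output of `git archive` at the release tag, since legitimately generated files also differ):

```python
# Sketch: detect files in a shipped release that diverge from the public
# source tree. Directory names are hypothetical.
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def diff_trees(release_dir: Path, repo_dir: Path) -> list[str]:
    """Report files that are missing from, or differ from, the repo tree."""
    suspicious = []
    for rel_file in sorted(release_dir.rglob("*")):
        if not rel_file.is_file():
            continue
        repo_file = repo_dir / rel_file.relative_to(release_dir)
        if not repo_file.exists() or sha256_file(rel_file) != sha256_file(repo_file):
            suspicious.append(str(rel_file.relative_to(release_dir)))
    return suspicious
```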


Hippybear: you understood it perfectly. That’s also why 1970s Antihero and I were saying that there’s not much threat of an LLM being unwittingly trained on this in such a way that it starts pushing security vulnerabilities into its code suggestions - the human-readable version of the ugly bits isn’t public, and the other bits wouldn’t do a whole lot in a different context, which is usually true for most modern vulnerabilities.

Adamsc’s point that LLMs could get people even less in the habit of checking boring code is absolutely correct, and in line with the article linked in the parent which talks about the corrosive effects of capitalism on open source software quality.
posted by Ryvar at 7:50 PM on April 15


Right. I'm literally not worried about this bug infecting LLM programming assistants. I am worried that other similar exploits might already be out in the public and so that seems like an important fact to know -- that the actual public code that is the Open Source for a project won't compile to match what might be in tarballs of distributions.

I mean, that's a major pain in the ass, but at least it is a thing that can be checked?

Who knows what a more sophisticated vector might require in order to be detected.
posted by hippybear at 7:56 PM on April 15 [1 favorite]




Oh, yeah, I saw that..

this AI thing is going to ruin any level of trust we humans ever had in anything online.

It's really interesting to have lived through the tail end of things being really great in the Seventies and watching it all slowly degrade across decades and then suddenly here in the early 2020s someone found a pack of matches and now it's all going to burn down even more quickly.

Honestly, I sort of trust that MetaFilter will last through this upcoming horror, but I'm not sure how much else is going to.
posted by hippybear at 8:04 PM on April 15 [7 favorites]


I'm going with Schneier and the nation/state card here because, man, they did it by messing with autoconf and... nobody understands autoconf.


Once upon a time I was a sysadmin in an academic setting.

That meant helping academics use bespoke software they needed in, um, a diverse array of environments.

For CPU architectures, I had to deal with Intel/AMD, PowerPC, SPARC, MIPS, and others.
For OSes: Linux, Solaris, IRIX, *BSD, MacOS, HP-UX, AIX, and I'm probably missing a few.

Autoconf was a godsend. Getting things compiled and running for every professor and postdoc so they could get their stuff out the door was so satisfying and autoconf was what enabled me to do it. I almost never changed an input script into autoconf, never had to.

Sadly, today, for CPUs you have your choice of Coke (x86_64) or Pepsi (ARM), and not many more choices for OSes. That makes me sad because today the LLVM C compiler could make getting a new hardware architecture Unix-ready so much easier. But nobody is taking advantage of it. And so now nobody really needs autoconf and so it needs to be retired.
posted by ocschwar at 8:12 PM on April 15 [9 favorites]


and I'm probably missing a few.

Eunice!

(Congratulations.)
posted by away for regrooving at 9:42 PM on April 15 [1 favorite]


It's not the sophistication of the hack that convinces me it's a nation state thing.

It's the fact that we still don't know who Jia actually is. The kind of operational security to keep your identity hidden over such a long time, making zero mistakes, is very, very rare for an individual to accomplish. However, with nation state assistance, who are well versed in this kind of thing...
posted by DreamerFi at 3:21 AM on April 16 [4 favorites]


In the cybersecurity world, a database engineer inadvertently finding a backdoor in a core Linux feature is a little like a bakery worker who smells a freshly baked loaf of bread, senses something is off and correctly deduces that someone has tampered with the entire global yeast supply. It’s the kind of intuition that requires years of experience and obsessive attention to detail, plus a healthy dose of luck.

And a work environment where you're not constantly under the gun to meet deadlines and can independently decide to devote a bunch of time to investigating something that smells a tiny bit weird.
posted by trig at 3:23 AM on April 16 [8 favorites]


From the article: "Encrypted log-ins to liblzma, part of the XZ compression library, were using up a ton of CPU."

This is quite reassuring, because unexplained and excessive CPU use is something that would really annoy a lot of developers, isn't it? And make them want to figure out what was happening? Assuming they noticed it.
posted by mokey at 3:44 AM on April 16
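Roughly the kind of before/after measurement involved, sketched in Python (the workloads below are stand-ins, not anything from xz or sshd):

```python
# Sketch of the before/after measurement that exposes a regression.
# time.process_time() counts CPU seconds for this process only, so most
# background noise is excluded from the comparison.
import time

def cpu_cost(fn, repeats: int = 50) -> float:
    """Average CPU seconds per call of fn."""
    start = time.process_time()
    for _ in range(repeats):
        fn()
    return (time.process_time() - start) / repeats

baseline = cpu_cost(lambda: sum(i * i for i in range(10_000)))
# Stand-in for a "patched" code path quietly doing extra hidden work:
suspect = cpu_cost(lambda: sum(i * i for i in range(200_000)))

if suspect > 5 * baseline:
    print("unexplained CPU cost -- worth a closer look")
```

The half-second sshd login delay that gave the backdoor away was this same signal at human-perceptible scale.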


A simple(r) social attack vector to me seems to be: 1. find an established contributor to a key piece of infrastructure. 2. arrest or even just document their family and threaten bad things. 3. A trusted member of the community is now your actor.

It doesn't matter much to the criminal gang or national agency if someone burns a years-long career if and/or when they get caught out. They don't have any investment in that. They do have one less asset they can lean on but people are disposable to them anyway.
posted by bonehead at 5:26 AM on April 16 [2 favorites]


DreamerFi has it - Brian Krebs also noted (masto) how exceptionally tidy their OpSec was. Over on LinkedIn, C. Wysopal had the idea of using that much like a CAPTCHA that checks how perfectly you drag the image around, noting:
"It would be interesting to run this analysis across all open source committers...to understand if we can root out other backdoors by detecting signals like these"
posted by zenon at 5:36 AM on April 16


Developers love to use overly beefy PCs; they hate waiting on compiles and downloads and whatnot. That has always been a problem in the business (e.g. Google apps that run great from Google HQ when you have a fiber connection and an M2 Mac attached, but suck for a Windows 7 user at home with a lousy cable modem). I believe most developers wouldn't have noticed or cared about the performance change.
posted by JoeZydeco at 7:21 AM on April 16 [3 favorites]


I believe most developers wouldn't have noticed or cared about the performance change.

If this had been a work machine, you'd just assume it was yet another piece of spyware/exfiltration/security bit installed by IT. Hell, even on my home machine I'd have likely just passed it off as a system patch.

The explanation of this discovery almost sounds like a grey-hat cover story to me.
posted by DigDoug at 9:57 AM on April 16 [6 favorites]


I heard about this in 404 media's podcast feed:

https://pca.st/episode/083975a5-b5b6-49be-b2f9-0c56a3ca32cd
posted by quillbreaker at 12:44 PM on April 16


Right. I'm literally not worried about this bug infecting LLM programming assistants. I am worried that other similar exploits might already be out in the public and so that seems like an important fact to know.

Broadly speaking this feels similar to concern that LLM programming assistants have trained on buggy code? Definitely they have! Generating a lot of code and not thinking about what it’s doing is not a safe thing to do, and I’m sure somebody will learn that the hard way! But that’s not necessarily a sharp break with the status quo of software or the existing tradeoffs of running code you didn’t write, as this incident illustrates.

Also I suspect most true backdoors (targeted/stealthy like this one) are pretty context-sensitive. Something like a straight up buffer overflow vulnerability less so, though.
posted by atoxyl at 1:18 PM on April 16 [1 favorite]


bonehead: i'm sure that happens sometimes, but it's far from optimal, from an intelligence standpoint. People really don't do their best work when they're under that kind of stress, and the kinds of exploits we're talking about here are nontrivial to write and deploy. Also, the handler is pretty unlikely to be able to check the work themselves in any timely or detailed fashion! For this kind of knowledge work, you really want a true believer or at least someone who's earning an honest paycheck from you.
posted by adrienneleigh at 3:42 PM on April 16 [1 favorite]



I believe most developers wouldn't have noticed or cared about the performance change.

If this had been a work machine, you'd just assume it was yet another piece of spyware/exfiltration/security bit installed by IT. Hell, even on my home machine I'd have likely just passed it off as a system patch.


My never-to-be-proved theory is that this hack attempt is by a nation state rival of the USA, that someone in the NSA discovered "Jia Tan" and this project, and he or she tipped Freund off about it so the discovery would come with a story that "Jia Tan" and company would find demoralizing.
posted by ocschwar at 7:19 PM on April 16


This is quite reassuring, because unexplained and excessive CPU use is something that would really annoy a lot of developers, isn't it? And make them want to figure out what was happening? Assuming they noticed it.
What I'm reading here is that the smart money is on embedding a backdoor into Electron.
posted by dumbland at 10:32 PM on April 16 [4 favorites]


The explanation of this discovery almost sounds like a grey-hat cover story to me.

For what it's worth, once upon a time somewhere in the bowels of Microsoft test suites lived a test that assumed a server I ran for a university was up and serving SSH. Apparently a former employee of ours got the bright idea to avoid having to run their own server to test against for their POSIX compatibility tests or something. And thus a sheepish MS engineer contacts me via IRC to inquire if I knew a server was down.

I guess what I'm saying is, MS QA seems to have a lot of random shit they chase down on the regular.
posted by pwnguin at 11:34 PM on April 16 [7 favorites]



