Intel CPU design flaw forces Linux and Windows kernel redesign
January 2, 2018 9:14 PM

"A fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug." Performance hit to Intel processors from the past decade is estimated at 5%-30% after the kernel patches are deployed. Linux patches are already available and the Windows update is expected soon. The details of the security flaw have not been released and the Linux patch notes have been redacted.
posted by thecjm (242 comments total) 41 users marked this as a favorite
 
Everything running on Intel x86-64 hardware is affected - dating back to the original Core 2 Duo potentially. AMD architecture does not seem to be affected.
posted by thecjm at 9:30 PM on January 2, 2018 [3 favorites]


You know, 2018, you don't have to show up your siblings. Be more like April 11, 1954; just relax and not rock the boat.
posted by MikeKD at 9:33 PM on January 2, 2018 [14 favorites]


what a shitshow :) cloud providers are already scheduling mandatory VM reboots in the next several days
posted by ish__ at 9:34 PM on January 2, 2018 [1 favorite]


Wow, what a magnificent clusterfuck.
It appears, from what AMD software engineer Tom Lendacky was suggesting above, that Intel's CPUs speculatively execute code potentially without performing security checks. It seems it may be possible to craft software in such a way that the processor starts executing an instruction that would normally be blocked – such as reading kernel memory from user mode – and completes that instruction before the privilege level check occurs.
Also, this is an exploit that has existed for a decade... what do you wanna bet that the NSA et al. have been using it for a while?
posted by Pogo_Fuzzybutt at 9:41 PM on January 2, 2018 [27 favorites]


Yeah, hard to believe this was just discovered. How far back are fixes going to go?
If one were really paranoid the "we need to patch the kernel of every major operating system so don't worry if you notice it slowing down your computer" might seem odd.
posted by bongo_x at 9:45 PM on January 2, 2018 [8 favorites]


oops.
posted by indubitable at 9:52 PM on January 2, 2018 [2 favorites]


A 30% performance hit (potentially)? Jesus. Heads are gonna roll at Intel.
posted by It's Never Lurgi at 9:54 PM on January 2, 2018 [5 favorites]


It's quite probable that 30% won't be the upper limit on the perf hit forever. I'd expect that, as more about it becomes publicly available and more of the industry's eyes are on it, optimizations can be developed and implemented.

Hopefully.
posted by ish__ at 10:09 PM on January 2, 2018 [2 favorites]


Ho. Lee. Shit.

Does anyone know if this has anything to do with the recent exploits shown for the Intel Management Engine?
posted by deadaluspark at 10:10 PM on January 2, 2018


So hey uh I guess 2019 is gonna be the year of desktop ARM then
posted by DoctorFedora at 10:14 PM on January 2, 2018 [23 favorites]


Does anyone know if this has anything to do with the recent exploits shown for the Intel Management Engine?

I think that one is an entirely separate egregious clusterfuck.

I suppose it's worth distinguishing design failures from malice in this case, though at some point the two start to bleed together in practical effect.
posted by brennen at 10:16 PM on January 2, 2018 [5 favorites]


Sounds like we'll be getting more answers after the patches are rolled out and the press embargo is lifted... but I'm wondering if this affects the current crop of Intel processors (Skylake et al.) or is it moreso the older chips?
posted by cosmologinaut at 10:17 PM on January 2, 2018


Wow. I'm really curious to find out what the rest of the problem is. I would imagine Intel thought they could get away with skipping access checks on speculative execution because the results of the speculation (in theory) can't be seen by the running program unless/until it ends up going down that branch for real, and presumably an access check gets done at that time. So the exploit must have some clever way of perceiving speculation results ahead of time, before the final check.

Also dying to know whether the AMD guys anticipated this. Clearly they decided it was worth it to eagerly do access checks during speculation, despite the performance hit. So I wonder if one of their reasons was "no security risk if we fuck up elsewhere and expose speculated state to programs somehow".
posted by equalpants at 10:19 PM on January 2, 2018 [4 favorites]


This is pretty deep in the processor pipeline and the whole point of the Management Engine is that it operates independent from the processor, so they're pretty unlikely to be related.
posted by ckape at 10:23 PM on January 2, 2018 [1 favorite]


So hey uh I guess 2019 is gonna be the year of desktop ARM then

Teased in the last line of the Register article: similar patches seem to be in the pipeline for ARM Linux:
PS: It appears 64-bit ARM Linux kernels will also get a set of KAISER patches, completely splitting the kernel and user spaces, to block attempts to defeat KASLR. We'll be following up this week.
posted by We had a deal, Kyle at 10:24 PM on January 2, 2018 [2 favorites]


So hey uh I guess 2019 is gonna be the year of desktop ARM then

Somewhere an AMD marketing exec thinks they finally have a leg up on Intel, reads this comment, then cries silently to themselves
posted by thecjm at 10:25 PM on January 2, 2018 [26 favorites]


what do you wanna bet that the NSA et al. have been using it for a while?

I personally doubt it (for no solid reason, although the various leaks about their offensive technology should have included it if it was in their arsenal). I would imagine they've known about it for a while now, because SO MANY PEOPLE have known about this.

Think about it - Intel (obviously), AMD (presumably), Linux kernel folks, Microsoft kernel folks, Apple kernel folks, BSD kernel folks, Amazon (cloud providers that are ready to take this large patch darned well have understood what it's for), Google, Microsoft cloud folks. I'm curious how many parties I'm leaving out, because I'm sure there are several, but I imagine various folks in the US (and other) governments have known, so they'll know to patch their critical infrastructure ASAP.

Other things I'm curious about: how many governments have working weaponizations of this already? How long from the embargo being lifted until each given government (with an offensive cyber team, which is probably most) has weaponized code? How long from the embargo being lifted until PoCs are released publicly?

One final thing that occurred to me regarding this - if the slowdown is 5%-30%, what will that translate into in environmental impact? How much more power will have to be generated to serve the cloud computing needs out there? You can bet the cloud providers have a cost analysis of ditching their old hardware to get hardware that doesn't have this problem (and there will be an environmental impact from that as well), even if it's not cost effective to pull the trigger yet.
posted by el io at 10:32 PM on January 2, 2018 [10 favorites]


reading kernel memory from user mode – and completes that instruction before the privilege level check occurs.

they don't call it sequential coupling for nuthin.
posted by j_curiouser at 10:35 PM on January 2, 2018


Somewhere an AMD marketing exec thinks they finally have a leg up on Intel, reads this comment, then cries silently to themselves

Not really? Ryzen and Threadripper genuinely got people excited (I'm running an 1800X in my desktop, and it's a good processor), and this pretty much wipes out the one advantage that Intel had over the new AMD offerings (single core performance). At this point, I have a feeling that it's going to be hard to recommend Intel over AMD.
posted by NoxAeternum at 10:37 PM on January 2, 2018 [6 favorites]


So the exploit must have some clever way of perceiving speculation results ahead of time, before the final check.

If the speculative execution can pull shit into cache lines that doesn't get evicted by the failed check, that might open you up to timing attacks.
posted by a snickering nuthatch at 10:40 PM on January 2, 2018 [9 favorites]
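
For anyone who wants to picture the timing-attack guess above, here is a minimal toy sketch. It assumes (and this is only an assumption, since the real details are still under embargo) that a squashed speculative load leaves a cache line behind. Nothing here touches real hardware: the cache is a Python set, the access costs are made-up numbers, and `ToyCache`, `speculative_read` and `recover_byte` are hypothetical names invented for the illustration.

```python
# Toy simulation only: models a cache as a set of line indices and
# "time" as a fixed cost per access. No real hardware is involved.
CACHE_HIT_COST = 1     # hypothetical cycles for a cached access
CACHE_MISS_COST = 100  # hypothetical cycles for an uncached access


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        """Evict everything, as an attacker would before measuring."""
        self.lines.clear()

    def access(self, line):
        """Touch a line and return the simulated access time."""
        cost = CACHE_HIT_COST if line in self.lines else CACHE_MISS_COST
        self.lines.add(line)
        return cost


def speculative_read(cache, secret_byte):
    """Hypothetical flawed CPU: the load is squashed after the privilege
    check fails, but the line indexed by the secret value stays cached."""
    cache.access(secret_byte)  # side effect survives the squash
    return None                # architectural result is discarded


def recover_byte(cache, secret_byte):
    """Attacker side: flush, trigger the speculation, time each probe line."""
    cache.flush()
    speculative_read(cache, secret_byte)
    timings = {line: cache.access(line) for line in range(256)}
    # The only fast line is the one the squashed load pulled in.
    return min(timings, key=timings.get)


if __name__ == "__main__":
    print(recover_byte(ToyCache(), 0x2A))
```

The program never receives the secret as a return value; it only learns it from which probe was fast, which is the whole point of a timing side channel.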


If the speculative execution can pull shit into cache lines that doesn't get evicted by the failed check, that might open you up to timing attacks.

Yeah, I think that's what the Anders Fogh post linked from the Reg article is getting at:
This is truly bad news for the security. [It] gives microarchitecture side channel attacks additional leverage – we can deduct not only information from is actually executed but also from what is speculatively executed.
posted by We had a deal, Kyle at 10:46 PM on January 2, 2018 [1 favorite]


but I'm wondering if this affects the current crop of Intel processors (Skylake et al.) or is it moreso the older chips?

It's everything Skylake generation and newer, across all their lines.
posted by NoxAeternum at 10:50 PM on January 2, 2018 [3 favorites]


"At this point, I have a feeling that it's going to be hard to recommend Intel over AMD."

We'll have to wait for things to get released, but I don't see this changing much to be honest. The numbers I've seen have said that 5% perf impact is typical, and I'm not sure that will be enough to sway many if they weren't already convinced by AMD's offerings.

I also suspect that apps that do a lot of syscalls and are hit hard by this will have no choice but to attempt to optimize their syscall use somehow; there's too much existing silicon to just shrug and say "oh well, we're 30% slower now".
posted by markr at 10:57 PM on January 2, 2018 [2 favorites]


It's kind of funny, macOS's K32 kernel (pre OS X 10.7) used to completely reload the virtual address space and flush the TLB on a context switch which would have been immune to the attack. K64 on the other hand splits the virtual address space and maps the kernel into the top half of the active process's virtual address space to save the TLB flush on a context switch to kernel mode.
posted by Talez at 10:58 PM on January 2, 2018 [8 favorites]


If the speculative execution can pull shit into cache lines that doesn't get evicted by the failed check, that might open you up to timing attacks.

God I hope it's not that easy. If I'm reading We had a deal, Kyle's link correctly, looks like Mr. Fogh tried that and (fortunately) it didn't work:

I double checked by accessing the cache line I wanted to access and indeed that address was not loaded into the cache either. Consequently it seems likely that intel do process the illegal reading of kernel mode memory, but do not copy the result into the reorder buffer.
posted by equalpants at 11:01 PM on January 2, 2018 [2 favorites]


If the speculative execution can pull shit into cache lines that doesn't get evicted by the failed check, that might open you up to timing attacks.

my guess is that privileged cache lines are being fetched speculatively but when the speculation fails the cache tags are set wrongly - maybe just the privilege tags, or maybe them and the address - probably copied from some successful transaction running at the same time.

kernel ASLR means you can't easily guess what would be a good address to find the data at, and the split kernel/user page tables mean that the speculation can't actually force a kernel space cache fetch
posted by mbo at 11:03 PM on January 2, 2018


I wish I could think of a single benefit this person's vuln writeup has gifted the world, but to date, all I've seen is upvotes on various social networks, without a single user saying "I've begun work on a patchset for XYZ not-Linux OS", and without any new remediation being made available or even described. It would have been a great writeup once remediation was shipped, but this early, with so little known, and so many people clearly involved in fixing it, it just feels like ambulance chasing for social upvote notoriety.
posted by crysflame at 11:05 PM on January 2, 2018 [2 favorites]


I've often speculated (given the portion of their revenue that comes from iPhones) that Apple would move their laptops off Intel chips and run the same hardware throughout, sort of a Marklar 2.

Now that it's public, I wonder what kind of calls have been going to and from Cupertino and Beaverton these days.
posted by fifteen schnitzengruben is my limit at 11:07 PM on January 2, 2018 [1 favorite]


The python sweetness post, also linked from the Reg article, is (self-admittedly) like someone trying to connect a lot of dots together and ending up accidentally drawing a dinosaur:
Guesswork: it effects major cloud providers

On the kernel mailing list we can see, in addition to the names of subsystem maintainers, e-mail addresses belonging to employees of Intel, Amazon and Google. The presence of the two largest cloud providers is particularly interesting, as this provides us with a strong clue that the work may be motivated in large part by virtualization security.

Which leads to even more guessing: virtual machine RAM, and the virtual memory addresses used by those virtual machines are ultimately represented as large contiguous arrays on the host machine, arrays that, especially in the case of only 2 tenants on a host machine, are assigned by memory allocators in the Xen and Linux kernels that likely have very predictable behaviour.

Favourite guess: it is a privilege escalation attack against hypervisors

Putting it all together, I would not be surprised if we start 2018 with the release of the mother of all hypervisor privilege escalation bugs, or something similarly systematic as to drive so much urgency, and the presence of so many interesting names on the patch set’s CC list.
I'm not sure "Amazon and Google participated in the patch" means quite as much as the author thinks it does; I assume that as huge-scale deployers of Linux they normally participate in a lot of kernel dev. The more interesting question would be: were they unusually active on it?
posted by We had a deal, Kyle at 11:11 PM on January 2, 2018 [2 favorites]


my guess is that privileged cache lines are being fetched speculatively but when the speculation fails the cache tags are set wrongly - maybe just the privilege tags, or maybe them and the address - probably copied from some successful transaction running at the same time.

No the CPU executes it speculatively but doesn't let it out before it's retired to the reorder buffer. This opens you up to statistical attacks using speculative execution to try and map valid sections of kernel mode memory. However, the author couldn't convert it into a way to get into kernel mode.

Information is there that a smart enough cookie might be able to use it to turn it into a working exploit or assist another exploit. It's not exactly DEFCON 1 but it's sure as shit not something you want to be available to a black hat.
posted by Talez at 11:28 PM on January 2, 2018 [2 favorites]


No the CPU executes it speculatively but doesn't let it out before it's retired to the reorder buffer. This opens you up to statistical attacks using speculative execution to try and map valid sections of kernel mode memory. However, the author couldn't convert it into a way to get into kernel mode.

this is not what I'm suggesting - I suspect that kernel code/data is being read into the L1 cache with the wrong tags so it can be accessed from user space, the load that causes this doesn't result in data going to the reorder buffer and being written to a register (or back into the cache line) because the speculation failed .... but the cache line is left accessible from user space for subsequent access - mapped to a physical kernel address you can read and write kernel space data
posted by mbo at 12:03 AM on January 3, 2018


You can get the old OS X behavior back with the '-no_shared_cr3' boot arg. This will have a significant perf hit on syscalls/traps. iOS still does this. Without knowing what the bug is exactly, it's impossible to say if this would be an actual mitigation.

I've heard a bunch of rumors about this, but given the amount of hype this has already I don't think more speculation is super helpful.
posted by yeahwhatever at 1:16 AM on January 3, 2018 [3 favorites]


At one point, Forcefully Unmap Complete Kernel With Interrupt Trampolines, aka FUCKWIT, was mulled by the Linux kernel team, giving you an idea of how annoying this has been for the developers.
posted by DreamerFi at 1:31 AM on January 3, 2018 [13 favorites]


I've heard a bunch of rumors about this, but given the amount of hype this has already I don't think more speculation is super helpful.

People aren't speculating to be helpful. They are curious. So what's the problem?
posted by thelonius at 1:35 AM on January 3, 2018 [6 favorites]


Early benchmarks seem to suggest the performance hit on the average userland app is about nil. If you run DB servers, though...
posted by atoxyl at 1:39 AM on January 3, 2018 [3 favorites]


or other IO-heavy stuff
posted by atoxyl at 1:41 AM on January 3, 2018


I'm not sure "Amazon and Google participated in the patch" means quite as much as the author thinks it does; I assume that as huge-scale deployers of Linux they normally participate in a lot of kernel dev. The more interesting question would be: were they unusually active on it?

It's not just that they had some kernel engineers engaged in the process, but that both Microsoft and Amazon are preparing for large-scale reboots to roll out a patch in the next week or so (we don't, of course, know that this is specifically what the reboots are for, but "oh crap, there's another critical issue involving VM security" wouldn't really be an improvement on the situation).
posted by zachlipton at 1:58 AM on January 3, 2018 [2 favorites]


I'd add that the kernel patch adds a #define X86_BUG_CPU_INSECURE, so it's more like being handed a paint-by-numbers of a dinosaur and making estimates on how big its teeth are and how much it will hurt when it bites you than anything absurdly speculative.
posted by zachlipton at 2:02 AM on January 3, 2018 [2 favorites]


I enjoyed this last bit from the Reg:
Details of the vulnerability within Intel's silicon are under wraps: an embargo on the specifics is due to lift early this month, perhaps in time for Microsoft's Patch Tuesday next week. Indeed, patches for the Linux kernel are available for all to see but comments in the source code have been redacted to obfuscate the issue.
posted by sebastienbailard at 2:04 AM on January 3, 2018


Is it that old classic “#fix this later” comment?
posted by wenestvedt at 3:29 AM on January 3, 2018 [1 favorite]


I'd love to snark about security through obscurity but the Linux kernel group presumably knows this better than I do. They can't really worry about the governmental TLA orgs who have the means to easily reverse-engineer an exploit based on the kernel patch, because they've either done their work already or are well under way. Their concern is going to be with keeping out the lower-rent hackers for hire and preventing script-kiddie attacks.
posted by ardgedee at 3:48 AM on January 3, 2018 [3 favorites]


So . . AMD Hackintosh?
posted by petebest at 4:13 AM on January 3, 2018 [2 favorites]


Maybe there's a way to boot into Minix until a new kernel is released?
posted by farlukar at 4:25 AM on January 3, 2018 [1 favorite]


This feels reminiscent of the VW diesel scandal.

Sure it may be a bug but it’s not the disease.
posted by Annika Cicada at 4:25 AM on January 3, 2018 [2 favorites]


The good news is that PowerPC isn't affected, so you should be safe on your new X5000 Amiga.

If you don't have an Amiga, and weren't planning on migrating, start now.
posted by Slap*Happy at 5:14 AM on January 3, 2018 [21 favorites]


They can't really worry about the governmental TLA orgs who have the means to easily reverse-engineer an exploit based on the kernel patch, because they've either done their work already or are well under way.

Given that the chosen solution is using the proverbial sledgehammer (total VM separation between kernel and userspace barring the absolute minimum kernel data required to actually map the kernel memory back into the TLB on a system call) to crack this particular nut you really can't reverse-engineer an exploit directly from this patch.

This isn't like a buffer overflow where you can see the test change at a single point in the code & intuit that a suitably crafted argument will overflow the buffer: the patch is a systematic architectural change to the entire operating system. It must be a serious problem though - there’s no way they would be making these changes otherwise.
posted by pharm at 5:15 AM on January 3, 2018 [1 favorite]
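
A rough way to picture the "total separation" described above is a toy model with two page tables instead of one. This is only a sketch under made-up assumptions: the page addresses, the labels and the function names are hypothetical, and a real kernel's page tables look nothing like a Python dict.

```python
# Toy model only: a "page table" is a dict from virtual page to a label.
USER_PAGES = {0x1000: "user code", 0x2000: "user data"}
KERNEL_PAGES = {0xFFFF0000: "kernel text", 0xFFFF1000: "kernel data"}
TRAMPOLINE = {0xFFFFF000: "syscall entry trampoline"}  # minimal kernel stub


def shared_layout():
    """Pre-patch style: one table maps both user and kernel pages."""
    table = {**USER_PAGES, **KERNEL_PAGES, **TRAMPOLINE}
    return {"user_mode": table, "kernel_mode": table}


def isolated_layout():
    """Post-patch style: user mode sees only its pages plus the trampoline."""
    return {
        "user_mode": {**USER_PAGES, **TRAMPOLINE},
        "kernel_mode": {**USER_PAGES, **KERNEL_PAGES, **TRAMPOLINE},
    }


def kernel_pages_visible_from_user(layout):
    """Kernel pages that can even be translated while in user mode."""
    return sorted(p for p in layout["user_mode"] if p in KERNEL_PAGES)


def table_switches_per_syscall(layout):
    """Entering and leaving the kernel each swap tables if they differ."""
    return 0 if layout["user_mode"] is layout["kernel_mode"] else 2


if __name__ == "__main__":
    for name, layout in (("shared", shared_layout()),
                         ("isolated", isolated_layout())):
        print(name,
              [hex(p) for p in kernel_pages_visible_from_user(layout)],
              table_switches_per_syscall(layout))
```

In the isolated layout there is simply nothing kernel-side for user mode to aim at, and the price is two table switches on every system call, which is where the performance worry comes from.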


It'll be interesting (read: disappointing) to see how far back Apple shovels a patch, or if they'll let owners of affected, yet older, Macs hang in the wind.
posted by Thorzdad at 5:24 AM on January 3, 2018 [5 favorites]


Who cares about servers and VMs? Gaming performance is not affected by the recent patch (in Linux).
posted by thecjm at 5:37 AM on January 3, 2018


What a convenient way to get people to buy new computers.
posted by Bee'sWing at 5:52 AM on January 3, 2018 [8 favorites]


Apple's last security update covered El Capitan, which covers every Mac to 2009 or so. Even if they only fixed High Sierra that's every Mac back to 2010. How old exactly do you want them to cover?
posted by grahamparks at 5:55 AM on January 3, 2018 [2 favorites]


Apple's last security update covered El Capitan, which covers every Mac to 2009 or so. Even if they only fixed High Sierra that's every Mac back to 2010. How old exactly do you want them to cover?

Apple should patch all Macs with affected Intel chips. Period. Which version of MacOS they're running is utterly irrelevant.

Lots of Mac users do not run the latest version of the OS their machines are capable of running, for various reasons. For instance, I have a late-2009 iMac (Intel Core2 Duo) running 10.9.5 (Mavericks) and probably won't update it any further because of a couple of apps that I have to use quite regularly, but haven't been updated in ages and, for which, there are no other options. And, as far as I can tell from other user reports, the apps don't seem to play well with later versions of MacOS.
posted by Thorzdad at 6:05 AM on January 3, 2018 [9 favorites]


The worst case scenario is that this attack can be used against hypervisors as well as the kernel. It seems unlikely to me that the same defect would affect both, but we don't know for sure at this point.

The best case scenario is that Intel already has a microcode fix available, and is pushing it out to the cloud providers ASAP (which would explain the mass reboots of VMs). Also, if that's true, the KPTI fix would only be necessary on machines without the microcode fix, so the massive performance hit from KPTI won't apply to users with that update.

We just don't know enough to be sure. KPTI is a sledgehammer fix that could apply to many different classes of attacks (including outright microarchitecture bugs, but also timing attacks) and it's hard to really deduce anything from the fact that it's getting implemented in a hurry.
posted by miyabo at 6:12 AM on January 3, 2018


Apropos of nothing at all, it seems Intel's CEO sold a whole load of stock last month
posted by DangerIsMyMiddleName at 6:26 AM on January 3, 2018 [26 favorites]


I mean, it doesn't help if you're running particularly old hardware (although YMMV for what defines "old" - my 2012 iMac's CPU is still pretty much the top of the line until the current BYO iMacs), but you can virtualize OSX back to at least 10.7 in a more recent host OS. Doesn't help if you need hardware access to specialist gear (music/science/etc), but if it's a more pedestrian "I need to run an older version because Intuit keeps screwing with Quicken" reason, you can have your cake and eat it too.
posted by Kyol at 6:47 AM on January 3, 2018


Did Apple start allowing people to virtualize the Mac OSX client operating system? My research indicates they only started allowing that for the server version.
posted by Radiophonic Oddity at 7:04 AM on January 3, 2018


I believe the license for OS X has allowed virtualization on Mac hardware since Lion.
posted by strange chain at 7:07 AM on January 3, 2018 [2 favorites]


Yeah, I just stood up a Lion VM to check, VMware Fusion installed it without blinking. I think Snow Leopard Server (10.6) was the first official OS that Apple licensed for virtualization, but they accepted it (on Mac hardware, anyway) for desktops since Lion (10.7).
posted by Kyol at 7:16 AM on January 3, 2018 [1 favorite]


Performance hit to Intel processors from the past decade is estimated at 5%-30%

Umm....what determines that number?
posted by snuffleupagus at 7:17 AM on January 3, 2018


That number is mostly dependent on how many kernel/userspace context switches your application needs. If your app is mostly in memory and doesn't need to dump the l2/l3 cache for a context switch to kernel space to write data to disk, you're probably on the lower end of that estimate. If your app is heavily IO dependent... Yech.
posted by Kyol at 7:21 AM on January 3, 2018 [1 favorite]
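
To make the "it depends on how often you cross into the kernel" point concrete, here is a back-of-the-envelope sketch. The per-syscall extra cost and the syscall rates are hypothetical placeholders, not measurements of any real patch or CPU, and `runtime_increase` is just a name made up for the example.

```python
def runtime_increase(syscalls_per_second, extra_ns_per_syscall):
    """Fractional increase in runtime if every kernel entry/exit pair
    costs `extra_ns_per_syscall` more nanoseconds than before.

    Simplifying assumption: the workload otherwise takes exactly one
    second per `syscalls_per_second` syscalls, and nothing else changes.
    """
    extra_ns = syscalls_per_second * extra_ns_per_syscall
    return extra_ns / 1_000_000_000


if __name__ == "__main__":
    EXTRA_NS = 1000  # hypothetical added cost per syscall, in nanoseconds
    for label, rate in (("compute-bound app", 1_000),
                        ("busy web server", 50_000),
                        ("IO-heavy database", 300_000)):
        print(f"{label}: {runtime_increase(rate, EXTRA_NS):.1%} slower")
```

With these placeholder inputs the same fixed per-syscall cost spans well under 1% to 30%, which is how one patch can produce such a wide range of estimates.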


It's stuff like this, along with articles about computer latency that make me believe that CPUs need to be fundamentally re-architected. As in, dump the whole von Neumann architecture conceit. CPUs are designed around the idea that computers primarily perform computations. That really isn't true any more. Computers are mostly about pushing bits around, and CPUs can't keep up with the high bitrates we are dealing with these days. CPUs (and OSes) need to be designed around the idea of dataflow, with computation as a secondary concern.
posted by 1970s Antihero at 7:35 AM on January 3, 2018 [4 favorites]


Apropos of nothing at all, it seems Intel's CEO sold a whole load of stock last month

That is suspicious as hell: he sold roughly half of the shares he held, leaving him with exactly the bare minimum that the CEO is obligated to hold. It's probably too much to hope for, but I hope the SEC rakes him over the coals.

This also has the potential to become a massive class action lawsuit.
posted by jedicus at 7:39 AM on January 3, 2018 [7 favorites]


Joe Hills offers a metaphor to describe the situation.
posted by drezdn at 7:41 AM on January 3, 2018 [8 favorites]


NoxAeternum: It's everything Skylake generation and newer, across all their lines.

Where are you getting this? The article says, "It is understood the bug is present in modern Intel processors produced in the past decade."

That's from long before Skylake.
posted by Slithy_Tove at 7:46 AM on January 3, 2018 [4 favorites]


Joe Hills offers a metaphor to describe the situation.

I thought that the baker wasn't ever actually handing over the car-key pie, but it was bad enough to bake it and leave it on the windowsill where someone could snatch it.
posted by paper chromatographologist at 7:49 AM on January 3, 2018 [3 favorites]


Or, the car key pie is never handed over, but now people know it's there they can try and get a quick glance at the key-shaped impression in the crust as it's getting tossed in the garbage.
posted by thecjm at 7:57 AM on January 3, 2018 [1 favorite]


Wikipedia:
Intel 64 is Intel's implementation of x86-64, used and implemented in various processors made by Intel....

Intel's processors implementing the Intel64 architecture include the Pentium 4 F-series/5x1 series, 506, and 516, Celeron D models 3x1, 3x6, 355, 347, 352, 360, and 365 and all later Celerons, all models of Xeon since "Nocona", all models of Pentium Dual-Core processors since "Merom-2M", the Atom 230, 330, D410, D425, D510, D525, N450, N455, N470, N475, N550, N570, N2600 and N2800, and all versions of the Pentium D, Pentium Extreme Edition, Core 2, Core i7, Core i5, and Core i3 processors.
posted by snuffleupagus at 7:58 AM on January 3, 2018 [1 favorite]


And I don't buy that this isn't going to raise a huge stink with gamers. Lots of games have huge I/O requirements. Disk and network.
posted by snuffleupagus at 8:01 AM on January 3, 2018 [3 favorites]


he sold roughly half of the shares he held, leaving him with exactly the bare minimum that the CEO is obligated to hold.

FWIW, he started both 2016 and 2017 with close to (just close to, not exactly) that amount as well. Looks like this particular transaction had a lot of purchasing as well via employee stock options.
posted by Nonsteroidal Anti-Inflammatory Drug at 8:13 AM on January 3, 2018


The best case scenario is that Intel already has a microcode fix available,

TFA specifically says unfixable in microcode, thus OS kernel patches required.
posted by k5.user at 8:14 AM on January 3, 2018 [3 favorites]


AMD has had its own FDIV-category bug in the recent past, and at least one privilege escalation bug that required a microcode update. I don't know that this will lead to people moving wholesale to AMD -- might lead to an increase in people moving into hardened missile silos, though.
posted by RobotVoodooPower at 8:17 AM on January 3, 2018


And I don't buy that this isn't going to raise a huge stink with gamers. Lots of games have huge I/O requirements. Disk and network.

And they love to complain, also.
posted by thelonius at 8:19 AM on January 3, 2018 [4 favorites]


Apple's last security update covered El Capitan, which covers every Mac to 2009 or so. Even if they only fixed High Sierra that's every Mac back to 2010. How old exactly do you want them to cover?

All of them?
posted by zombieflanders at 8:19 AM on January 3, 2018 [11 favorites]


I think it's too early to tell what the real performance hit will be. Linux system calls will get more expensive and that's bad. How much more expensive has to wait until the patching and optimization settles down. And then what kinds of work loads are most affected will need to be measured. FWIW the 5-30% number everyone's quoting is before optimizations are applied for CPUs with the PCID feature. It may be nowhere near as bad on that. AFAIK most CPUs sold in the past couple of years have PCID.

It's interesting to me we mostly ignore the threat of a possibly related attack: bit flips (keyword: Rowhammer). Long story short, modern RAM is so flaky that a hostile process can flip arbitrary bits in memory. There are demos of Javascript in web pages messing with RAM in specific, intentional ways. I don't think there's any software mitigation deployed to fix that, and so far the market isn't demanding more expensive RAM that is more robust.
posted by Nelson at 8:48 AM on January 3, 2018 [3 favorites]
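
On the PCID point above: the idea is that translation-cache (TLB) entries get tagged with an address-space id, so switching page tables no longer has to throw them all away. A toy sketch of why that might help, with invented names (`ToyTLB`, `misses_for_syscall_loop`) and a made-up workload of four pages touched on each side of every syscall:

```python
class ToyTLB:
    """Toy translation cache. Untagged: every address-space switch flushes
    all entries. Tagged: entries are keyed by an address-space id instead."""

    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = set()
        self.current_space = 0
        self.misses = 0

    def switch_to(self, space_id):
        if space_id != self.current_space:
            self.current_space = space_id
            if not self.tagged:
                self.entries.clear()  # full flush on every switch

    def translate(self, page):
        key = (self.current_space, page) if self.tagged else page
        if key not in self.entries:
            self.misses += 1  # would be a page-table walk on real hardware
            self.entries.add(key)


def misses_for_syscall_loop(tagged, syscalls, pages_touched=4):
    """Count simulated TLB misses for a loop of user work plus a syscall."""
    USER, KERNEL = 0, 1  # hypothetical address-space ids
    tlb = ToyTLB(tagged)
    for _ in range(syscalls):
        tlb.switch_to(USER)
        for page in range(pages_touched):
            tlb.translate(page)
        tlb.switch_to(KERNEL)
        for page in range(pages_touched):
            tlb.translate(page)
    return tlb.misses


if __name__ == "__main__":
    print("untagged:", misses_for_syscall_loop(False, 100))
    print("tagged:  ", misses_for_syscall_loop(True, 100))
```

In this toy the untagged cache pays for every translation again after each switch, while the tagged one only misses the first time around. Real TLBs are finite and far messier, so treat the ratio as illustrative only.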


> All of them?

Will you similarly expect all operating system companies to backport patches to all operating systems they officially supported eight years ago? That will include Windows XP, Fedora 11, and Ubuntu Dapper Drake.
posted by ardgedee at 9:01 AM on January 3, 2018 [2 favorites]


Microsoft did publicly security-patch XP for WannaCry earlier this year.
posted by We had a deal, Kyle at 9:10 AM on January 3, 2018 [6 favorites]


Apple's last security update covered El Capitan, which covers every Mac to 2009 or so. Even if they only fixed High Sierra that's every Mac back to 2010. How old exactly do you want them to cover?

I don't understand. 10.11 is barely 2 years old.
I only upgraded to 10.10 less than a year ago.
posted by bongo_x at 9:21 AM on January 3, 2018 [1 favorite]


CPUs (and OSes) need to be designed around the idea of dataflow, with computation as a secondary concern.

IIRC, the Sony PlayStation 2 was designed along those lines, and was consequently a nightmare to develop for. Everything was driven by the DMA controller between the various processors, and so a program was essentially a long, complex bill of logistics. That's the reason that the quality of PS2 games improved over several years: it took that long for developers to figure out how to properly work with the hardware.
posted by acb at 9:32 AM on January 3, 2018 [5 favorites]


Will you similarly expect all operating system companies to backport patches to all operating systems they officially supported eight years ago?

Yup. One billion percent yup. That, or offer a free upgrade to a newer, patched OS.

Though in the case of hardware faults like this, if the OS companies were relying in good faith on Intel's statements about how the cpus worked, Intel should be paying the OS companies for the required work and/or OS copies.
posted by GCU Sweet and Full of Grace at 9:35 AM on January 3, 2018 [4 favorites]


Will you similarly expect all operating system companies to backport patches to all operating systems they officially supported eight years ago?

I'm referring to the hardware, so if the non-deprecated versions of Windows and Linux don't support the older hardware, then hell yes I'd expect it. The problem, although not necessarily unique to them, is that Apple really loves to drop support for operable, capable hardware.
posted by zombieflanders at 9:36 AM on January 3, 2018 [5 favorites]


All 64-bit Mac devices need to be fixed.

I think that covers back to what, 2008 to present, which seems to me like the 2008 Mac systems will be able to run an OS that Apple intends to update?

Am I wrong here?
posted by Annika Cicada at 9:49 AM on January 3, 2018


The problem, although not necessarily unique to them, is that Apple really loves to drop support for operable, capable hardware.

I mean, hell, even 32-bit hardware would still work fine if they'd support it. I finally made my mom retire my old Core Duo iMac last month, since even Firefox dropped support ages ago. I think Windows 10 will work on it, but we don't have a license, so off to recycling it goes.
posted by uncleozzy at 9:53 AM on January 3, 2018


Ars has an article up with more info.
posted by Pogo_Fuzzybutt at 9:53 AM on January 3, 2018 [4 favorites]


Good news: TempleOS is immune to the flaw.
posted by ardgedee at 10:08 AM on January 3, 2018 [7 favorites]


I have been hot-taking that there is no such thing as a truly secure computer, for some time now, and this will not tend to make me quit thinking that. Unfortunately, that does not relieve you from trying as best as you can to secure computers.
posted by thelonius at 10:13 AM on January 3, 2018 [1 favorite]


Sorry, my earlier comment on not wanting to speculate was just that I personally didn't want to speculate, not that generally people shouldn't speculate. It certainly is fun.

It looks like the bug is more or less public now -- the ability to dump kernelmode memory from user mode, possibly across virtual machines. I assume the cross-vm potential is what has everyone freaking out.

This means the macOS discussion is largely irrelevant (macOS is almost never run in co-resident environments), and for the most part this will not significantly impact common home users of windows or linux (most people don't run hypervisors as a security boundary). For home users of windows/linux, user to kernel vulnerabilities are reasonably common and not a reason to panic -- this would definitely require another bug to exploit on linux and might on windows (I'm less familiar with the token stuff they do there...).

In summary, if you're AWS, Azure, or Intel you probably had a shitty holiday. Everyone else is only transitively fucked because AWS is fucked, not fucked directly :)
posted by yeahwhatever at 10:21 AM on January 3, 2018 [4 favorites]


64-bit Intel Macs can be divided into three groups, based on the highest supported OS version:

- Anything made after 2010-ish can run High Sierra, which may have already been fixed.

- 2009/2010 models can only run El Capitan, but I'd expect that to be fixed given El Capitan got a security update just last month.

- 2008/2009 models can only run up to Lion (10.7), which was last updated in 2012. I don't expect these to be fixed.
posted by grahamparks at 10:23 AM on January 3, 2018 [4 favorites]


If you're really gonna stress about recent bugs that affect macOS, I'd suggest worrying about the IOHIDFamily race condition that dropped on NYE over this one -- the vector is less sexy I guess but full kernel read/write is certainly worse than full kernel read...

It's also possible that macOS has already fixed this -- the article dropping the above bug mentioned a change in between 10.12 and 10.13 with regard to a microarchitectural kaslr leak. That would be a change consistent with the fix for this. Considering that this bug was known to the vendors in November, the timeframe fits.
posted by yeahwhatever at 10:39 AM on January 3, 2018 [3 favorites]


On the upside, a whole lot of VPS instances and containers are getting rebuilt after like a year.
posted by mikelieman at 11:01 AM on January 3, 2018 [1 favorite]


On the upside, a whole lot of VPS instances and containers are getting rebuilt after like a year.

Oh, look, there's a dependency issue between mojolicious and mysql. Joy.
posted by mikelieman at 11:02 AM on January 3, 2018 [3 favorites]


To fix the pie metaphor:
If you call and order a crack cocaine pizza, the guy will obviously say no. However if you call to give advance notice that you will be ordering either a pepperoni pizza or a crack pizza a bit later, the pizza cook makes both just in case, and doesn’t complain when you call later and choose the pepperoni. You then steal the discarded crack pizza from the dumpster behind the pizza place.

The good solutions are, a) put a lock on the dumpster, or b) don’t bake anything illegal. Until now it was believed that the lock on the dumpster was secure, but some people really like crack and have started drilling holes in the dumpster, hijacking the garbage truck, etc.

The solution currently proposed is actually c) : to relocate the kitchen and the dumpster to a high security area 20 miles away, which means everybody’s pizza is going to be cold.
posted by w0mbat at 11:21 AM on January 3, 2018 [24 favorites]


w0mbat: where do you order your pizzas from? asking for a friend.
posted by el io at 11:31 AM on January 3, 2018 [22 favorites]


The solution currently proposed is actually c) : to relocate the kitchen and the dumpster to a high security area 20 miles away, which means everybody’s pizza is going to be cold.

AND, kind of like how Jimmy John's is freaky fast because they have like 5 people making each sandwich, not because of any actual grand discoveries in sandwich artistry physics, the pizza place only met their delivery guarantee because they made everything ahead of time. It turns out you just can't* deliver a Chicago deep dish pizza across town in under 30 minutes if you start mixing your dry goods when the phone call comes in.

So it's going to take a lot longer to get delivery, for the time being. But over time, it'll probably turn out it's ok to make the dough ahead of time, and maybe throw some of the ingredients on, and maybe run the oven a little hotter to speed things up, and then maybe that guarantee will start working out again.

*Oddly enough, the underdog pizza place down the road (Aaron's Microwaved Dishes) who makes their own, slightly different interpretation of Chicago pizza has been getting closer and closer to matching that guarantee apparently without making it all ahead of time. Instead, they've been running their oven quite a bit hotter ¯\_(ツ)_/¯
posted by Nonsteroidal Anti-Inflammatory Drug at 11:36 AM on January 3, 2018 [5 favorites]


Can we not have the Pizza Holy War in this thread


(More relevantly: If I understood what I read upstream in this thread correctly, then I don't see how this is not going to be really truly terrible for games. My computational modeling work is going to be affected less since I can keep it strictly in memory until it's finished, but games.... and the work of some friends of mine who work over multiple gigabytes of data at one time... It'll be a Time Warp to 2006. )
posted by seyirci at 11:50 AM on January 3, 2018 [1 favorite]


Apple's last security update covered El Capitan, which covers every Mac to 2009 or so. Even if they only fixed High Sierra that's every Mac back to 2010.

and

2009/2010 models can only run El Capitan, but I'd expect that to be fixed given El Capitan got a security update just last month.

FFS, MacOS 10.13 “High Sierra” hardware requirements:
MacBook (Late 2009 or newer)
MacBook Pro (Mid 2010 or newer)
MacBook Air (Late 2010 or newer)
Mac mini (Mid 2010 or newer)
iMac (Late 2009 or newer)
Mac Pro (Mid 2010 or newer)
The late 2009 iMacs and MacBooks were new hardware designs (not processor bumps), quite popular, and can run 10.13.
posted by D.C. at 12:02 PM on January 3, 2018


w0mbat: where do you order your pizzas from? asking for a friend.

Luckily, Pizza Express pizza is my crack, but unluckily the nearest one is 5,000 miles away.
posted by w0mbat at 12:09 PM on January 3, 2018


The analogy breaks down when you realize that Chicago deep dish pizza is only meant to be gawked at and not eaten.

Can we not have the Pizza Holy War in this thread

/me checks the website address just to make sure this is still metafilter, feels like she no longer knows this place anymore.

Meanwhile, back on topic, this appears to be mostly concerning multi-tenant hypervisor implementations, so basically cloud services got wrecked by this.

I don't know that your local machine is much to worry about?
posted by Annika Cicada at 12:10 PM on January 3, 2018 [4 favorites]


I like little demographic quirks - like how MacOS has 6% market share in the PC space, but any post about something computer related on MeFi will turn into a back and forth about what exactly counts as an old Mac and whether someone's laptop can run 10.xx or not.

If Microsoft decides to stick to their end of life claims and not patch the kernel in Windows XP more computers will be left exposed to this vulnerability than every single 64-bit MacOS device, patched or not, new or not, out there.
posted by thecjm at 12:12 PM on January 3, 2018 [5 favorites]


If Microsoft decides to stick to their end of life claims and not patch the kernel in Windows XP

I'd think they'd patch XP. They did just patch XP for WannaCry
posted by Twain Device at 12:20 PM on January 3, 2018


Intel is subtly hinting that ARM and AMD chips may have similar flaws, or just doing a CYA.
posted by RobotVoodooPower at 12:26 PM on January 3, 2018 [2 favorites]


When I saw the Ars comment that this could hit some CPU/program combinations up to 30% I was wondering if this was part of the reason AMD has been having trouble, but sadly it sounds like this isn't the case.

Can someone explain why this is dangerous? What does the kernel need to hide from userspace, if user programs still can't write to it?

Also: How hard is it going to be to design new chips to avoid this, or do we know that yet?
posted by Canageek at 12:30 PM on January 3, 2018


Intel is subtly hinting that ARM and AMD chips may have similar flaws

On the LKML, AMD says:
AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against. The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.
posted by Nonsteroidal Anti-Inflammatory Drug at 12:31 PM on January 3, 2018 [7 favorites]


Intel is subtly hinting that ARM and AMD chips may have similar flaws

I'm reading this as: since Intel has such a big market share, the major OSes are about to change how they work with the kernel's page tables, and everyone else had better be ready for it.
posted by thecjm at 12:54 PM on January 3, 2018


Yup. One billion percent yup. That, or offer a free upgrade to a newer, patched OS.

Oh, good - I mean, macOS has been free since Mavericks (10.9), so that's not really an argument against upgrading. I completely understand that there are entirely legitimate personal reasons to lock your hardware to an older release, but Apple isn't exactly nickeling and diming you for OS updates if you're still on officially supported ~8 year old hardware.

I'd think they'd patch XP. They did just patch XP for WannaCry

Yeah, but wasn't that backporting a SMB update (and mostly at that sticking a fork in XP's support for SMBv1, right?) and not so much dusting the kernel off after a decade of neglect? Seems like two different scales.
posted by Kyol at 1:05 PM on January 3, 2018 [1 favorite]


I'd think they'd patch XP. They did just patch XP for WannaCry

There are a couple of major differences between the Windows and Mac install bases that make this a necessary move for Microsoft but basically a giant nothingburger for Mac.

The most significant one, which is so significant that it completely dwarfs any other consideration, is this: Windows XP has an absolutely massive install base for kiosks and POS terminals, as well as a wide variety of edge case uses involving extremely expensive specialty hardware (especially scientific and medical instrumentation) where the vendors don't exist anymore or haven't bothered to make drivers and control software for newer Windows OSes. Nobody uses Mac OS X for kiosks or POSes; if they use Apple at all for these cases, they're doing it through iOS instead.

In other words, there are hundreds of millions of legacy systems still running Windows XP, many of which are absolutely business critical systems if not literally responsible for people's lives. If Microsoft turned its back on those customers, they'd be dead as a company almost overnight. Meanwhile, nobody gives a shit if J. Random Programmer just really, really loves his Powerbook G5 and refuses to upgrade.
posted by tobascodagama at 1:06 PM on January 3, 2018 [5 favorites]


I’ve been giving my dentist shit for three years for not upgrading their XP controller for the whizbang patient management thing.

Went for a colonoscopy not long enough ago and the colonoscope-a-tron was running Windows for Workgroups.
posted by lagomorphius at 1:19 PM on January 3, 2018 [10 favorites]


Oh, even better for a lot of medical devices - the FDA wants those declared as validated systems, so the regulatory burden for a vendor to update is pretty significant. I mean, don't get me wrong, I don't want my MRI controller to crash unexpectedly while I'm in it and quench the superconductors either...
posted by Kyol at 1:23 PM on January 3, 2018 [4 favorites]


I like little demographic quirks - like how MacOS has 6% market share in the PC space, but any post about something computer related on MeFi will turn into a back and forth about what exactly counts as an old Mac and whether someone's laptop can run 10.xx or not.

If Microsoft decides to stick to their end of life claims and not patch the kernel in Windows XP more computers will be left exposed to this vulnerability than every single 64-bit MacOS device, patched or not, new or not, out there.


Cool. Despite having a house full of computers, I have barely seen a Windows computer for 20 years. So I am concerned about how this affects everyone but I am also wondering how this affects me.
posted by bongo_x at 1:57 PM on January 3, 2018 [1 favorite]


Anyone running an old Windows XP box is probably not also applying security patches to it. Also worth noting that the fix for this bug appears to be a major rearchitecture of the way the system handles memory, not some simple bugfix. The testing cycle for changes like this has to be very expensive.

One thing I'm trying to guess about is how serious this bug is likely to be for normal home computers. On a typical computer, running untrusted code of any sort is already asking for trouble. It may in theory have lower privileges, etc., but in practice every OS has so many security holes that if any untrusted code is on your machine at all you're probably screwed. (Outside of very carefully sandboxed environments like browser Javascript.) The scarier implication of this bug is for virtual machine servers, i.e. most of the cloud.
posted by Nelson at 2:18 PM on January 3, 2018 [1 favorite]


this pretty much wipes out the one advantage that Intel had over the new AMD offerings (single core performance). At this point, I have a feeling that it's going to be hard to recommend Intel over AMD.

If you want low-power x86, Intel has low power variants available for mainline CPUs and Xeons, plus the embedded Xeon-D and Atom lines. AMD has very little that can compete with those.
posted by vibratory manner of working at 2:19 PM on January 3, 2018


Embargo is over.
posted by Slothrup at 2:26 PM on January 3, 2018 [11 favorites]


Another link.
posted by Slothrup at 2:27 PM on January 3, 2018 [3 favorites]


One thing I'm trying to guess about is how serious this bug is likely to be for normal home computers. On a typical computer, running untrusted code of any sort is already asking for trouble. It may in theory have lower privileges, etc., but in practice every OS has so many security holes that if any untrusted code is on your machine at all you're probably screwed. (Outside of very carefully sandboxed environments like browser Javascript.)

Good news! (search for 34C3) The short story suggests that this bug might just be exploitable by Javascript. Buckle up, it's going to be a bumpy ride.
posted by Kyol at 2:28 PM on January 3, 2018 [1 favorite]


Yeah Google's going public: Today's CPU vulnerability: what you need to know:
The Project Zero researchers discovered three methods (variants) of attack, which are effective under different conditions. All three attack variants can allow a process with normal user privileges to perform unauthorized reads of memory data, which may contain sensitive information such as passwords, cryptographic key material, etc.

In order to improve performance, many CPUs may choose to speculatively execute instructions based on assumptions that are considered likely to be true. During speculative execution, the processor is verifying these assumptions; if they are valid, then the execution continues. If they are invalid, then the execution is unwound, and the correct execution path can be started based on the actual conditions. It is possible for this speculative execution to have side effects which are not restored when the CPU state is unwound, and can lead to information disclosure.

There is no single fix for all three attack variants; each requires protection independently. Many vendors have patches available for one or more of these attacks.
I'm getting the sense this is going to be far from the last set of vulnerabilities of this sort we'll see.

One interesting thing at first reading is that this really doesn't just seem to be Intel. For instance:
On the Android platform, exploitation has been shown to be difficult and limited on the majority of Android devices.

The Android 2018-01-05 Security Patch Level (SPL) includes mitigations reducing access to high precision timers that limit attacks on all known variants on ARM processors. These changes were released to Android partners in December 2017.

Future Android security updates will include additional mitigations. These changes are part of upstream Linux.
posted by zachlipton at 2:32 PM on January 3, 2018 [4 favorites]


Ah, and here's a forum thread of AWS users who noticed their performance went to shit after the kernel patches were applied to their hosts.
posted by zachlipton at 2:35 PM on January 3, 2018 [5 favorites]


Google going really, really public with a write-up of the vulnerabilities: Project Zero: Reading privileged memory with a side-channel. They say it can affect Intel, AMD, and ARM.
posted by zachlipton at 2:36 PM on January 3, 2018 [10 favorites]


- 2008/2009 models can only run up to Lion (10.7)

no, i have an imac from early 2008 that's running el capitan - it's even on your el capitan list
posted by pyramid termite at 2:54 PM on January 3, 2018


OK - my speculation above was wrong - no write access to kernel space here, but you can read everything
posted by mbo at 2:58 PM on January 3, 2018


I still don't understand whether the attack works on hypervisors. I'm sure we'll find out in the next few days.
posted by miyabo at 3:01 PM on January 3, 2018


Google has a proof of concept that could work in a KVM guest on a specific old version of Debian (in other words, they could read data from the host's kernel memory from inside a guest virtual machine). It's not clear whether that could be generalized to other platforms and how exploitable this really is, but it doesn't seem like hypervisors are at all safe.

The Times has a write-up, Researchers Discover Two Major Flaws in the World’s Computers:
Though the Meltdown flaw is specific to Intel, Spectre is a flaw in design approach that has been used by many processor manufacturers for decades and affects virtually all microprocessors on the market, including Intel-like chips made by AMD and the many chips based on designs from ARM in Britain.
In short, this is the beginning, not the end.
posted by zachlipton at 3:09 PM on January 3, 2018 [5 favorites]


Well, this is bad. From a quick look at the papers, it seems like the kernel patches mitigate one version of the attack fairly well (Meltdown), but aren't meant to mitigate another version (Spectre). In fact, this will be very difficult to mitigate. The proofs of concept use particular details of Intel CPUs, but modern CPUs weren't designed with these attacks in mind and it's going to take serious research to figure out how to design processors to avoid these problems.

Roughly, these attacks use speculative execution to cause a process to leak memory via a side-channel. This can allow one process to read the memory of another or for one piece of code to read memory that it shouldn't be able to. It seems like the proofs of concept so far mostly involve processes reading memory in the same process (for instance, JavaScript that can dump the memory of the browser), but the Spectre paper suggests that there are many code constructions that could make a program vulnerable to having its memory read by an attacker.
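To make the shape of that concrete, here is a toy simulation in Python. It is not an exploit and touches no real hardware state: the "cache" is just a set, the "branch predictor" is a single boolean, and every name (`ToyCPU`, `victim`, `leak_byte`) is invented for illustration. It only models the idea that a mispredicted bounds check can leave a trace that survives after the architectural result is thrown away.

```python
class ToyCPU:
    """Toy model: speculative work touches a 'cache' that is never rolled back."""

    def __init__(self, memory: bytes):
        self.memory = memory            # public array followed directly by a secret
        self.cache = set()              # which probe lines are "cached"
        self.predict_in_bounds = False  # one-bit branch predictor

    def victim(self, index: int, array_len: int):
        in_bounds = index < array_len
        # The CPU runs ahead on the predicted path before the check resolves.
        if self.predict_in_bounds or in_bounds:
            value = self.memory[index]  # may be an out-of-bounds (secret) byte
            self.cache.add(value)       # stands in for loading probe[value * 4096]
        self.predict_in_bounds = in_bounds  # predictor learns the real outcome
        # Architecturally, an out-of-bounds call yields nothing.
        return self.memory[index] if in_bounds else None

    def flush(self):
        self.cache.clear()

    def is_cached(self, line: int) -> bool:
        return line in self.cache  # a real attacker would time a load instead


def leak_byte(cpu: ToyCPU, array_len: int, target_index: int) -> int:
    for _ in range(3):
        cpu.victim(0, array_len)        # train the predictor with in-bounds calls
    cpu.flush()                         # evict everything from the probe "cache"
    assert cpu.victim(target_index, array_len) is None  # no architectural leak
    hits = [line for line in range(256) if cpu.is_cached(line)]
    return hits[0]                      # the one line the speculation touched


public = bytes([1, 2, 3, 4])
secret = b"hunter2"
cpu = ToyCPU(public + secret)
recovered = bytes(leak_byte(cpu, len(public), len(public) + i)
                  for i in range(len(secret)))
print(recovered)
```

The real attacks replace `is_cached` with a timing measurement, which is why so much of the mitigation talk is about timers.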
posted by ectabo at 3:45 PM on January 3, 2018 [1 favorite]


I would have preferred the branding “kneecap” over meltdown but Spectre is pretty ace.
posted by Annika Cicada at 3:56 PM on January 3, 2018


God dammit, 2018.
posted by MikeKD at 4:12 PM on January 3, 2018 [4 favorites]


"We evaluated Meltdown running in containers sharing a kernel, including Docker, LXC, and OpenVZ, and found that the attack can be mounted without any restrictions. Running Meltdown inside a container allows to leak information not only from the underlying kernel, but also from all other containers running on the same physical host."

I'm glad that large multi-tenant cloud vendors are patching, but the ramifications of this in hybrid and private cloud scenarios are frightening, especially as CI/CD increases velocity to thousands of releases a week in medium and large shops. The keys to the kingdom are handed over to developers as self-serve and they are wont to go grab public Docker images with hundreds of critical vulnerabilities and just shove them out the pipeline to production.

good lord.
posted by Annika Cicada at 5:00 PM on January 3, 2018 [10 favorites]


The CCC's proposal to use ancient 8-bit computers like the Apple II as a trusted computing platform (in this scenario for inspecting nuclear weapons) seems a lot less far-fetched today.
posted by RobotVoodooPower at 5:15 PM on January 3, 2018 [5 favorites]


Peter Bright at Ars Technica has a good article: “Meltdown” and “Spectre”: Every modern processor has unfixable security flaws
In the immediate term, it looks like most systems will shortly have patches for Meltdown. At least for Linux and Windows, these patches allow end-users to opt out if they would prefer. The most vulnerable users are probably cloud service providers; Meltdown and Spectre can both in principle be used to further attacks against hypervisors, making it easier for malicious user to break out of their virtual machines.

For typical desktop users, the risk is arguably less significant. While both Meltdown and Spectre can have value in expanding the scope of an existing flaw, neither one is sufficient on its own to, for example, break out of a Web browser.
posted by ectabo at 5:37 PM on January 3, 2018 [8 favorites]


Thank you zachlipton and Slothrup for linking to the readings. This is an interesting-as-fuck rabbit hole to read about on both high and low levels. I do not envy the kernel developers, admins, and others who have their work truly cut out for them.
posted by hexaflexagon at 6:22 PM on January 3, 2018 [1 favorite]


Mozilla Security: Mitigations landing for new class of timing attack
Several recently-published research articles have demonstrated a new class of timing attacks (Meltdown and Spectre) that work on modern CPUs. Our internal experiments confirm that it is possible to use similar techniques from Web content to read private information between different origins. The full extent of this class of attack is still under investigation and we are working with security researchers and other browser vendors to fully understand the threat and fixes. Since this new class of attacks involves measuring precise time intervals, as a partial, short-term, mitigation we are disabling or reducing the precision of several time sources in Firefox. This includes both explicit sources, like performance.now(), and implicit sources that allow building high-resolution timers, viz., SharedArrayBuffer.
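A rough sketch of why reducing timer precision helps, in Python with integer nanoseconds. The latencies and the two resolutions below are made-up illustrative numbers, not Firefox's actual settings, and `coarsen` is a hypothetical helper, not a browser API.

```python
def coarsen(timestamp_ns: int, resolution_ns: int) -> int:
    """Round a timestamp down to a multiple of resolution_ns."""
    return (timestamp_ns // resolution_ns) * resolution_ns


def observed_duration(start_ns: int, duration_ns: int, resolution_ns: int) -> int:
    """Duration as seen by code that can only read the coarsened clock."""
    end_ns = start_ns + duration_ns
    return coarsen(end_ns, resolution_ns) - coarsen(start_ns, resolution_ns)


# Assumed, illustrative latencies for a cached vs. uncached memory load.
HIT_NS, MISS_NS = 50, 300
START_NS = 1_000_000

fine = [observed_duration(START_NS, d, 5) for d in (HIT_NS, MISS_NS)]
coarse = [observed_duration(START_NS, d, 20_000) for d in (HIT_NS, MISS_NS)]
print("5 ns clock:", fine)       # hit and miss are distinguishable
print("20 us clock:", coarse)    # hit and miss look identical
```

This only shows the single-measurement case. Repeating measurements or building a different clock can claw precision back, which is presumably why the quoted post calls it a partial, short-term mitigation.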
posted by zachlipton at 7:00 PM on January 3, 2018 [6 favorites]


While both Meltdown and Spectre can have value in expanding the scope of an existing flaw, neither one is sufficient on its own to, for example, break out of a Web browser.

This is why everyone needs to stop putting SSH keys and other credentials in Git repos. Yeah you can’t break out of the browser object but a threat actor can sure as shit escalate privileges given what tech orgs store in Jira, Wiki and Git.

This has me rethinking collaborative web-based password managers for teams.

I’m pondering on if the iOS and MacOS password manager with the encryption chip is easily susceptible to these memory attacks, and my first pass gut check says no but I don’t know how the passwords are executed. I assume (hope?) the speculative processing by the CPU isn’t part of that data flow.

Maybe I’m completely out of my mind and way off base with my musings here?
posted by Annika Cicada at 7:14 PM on January 3, 2018 [1 favorite]


(I guess what I’m saying here about keys in git and whatnot is that you could execute some malicious JavaScript in one tab, say a CSRF on stack overflow that does this attack then reads the memory from your 37 other tabs you have open)
posted by Annika Cicada at 7:17 PM on January 3, 2018


Here's a good tldr; summary:

A Simple Explanation of the Difference Between Meltdown and Spectre

Spectre (omg I hate how every infosec bug needs a name and branding nowadays) is the real tricky one, in my opinion, and is really about something fundamental with how modern CPUs are designed (and not an implementation issue). It seems almost impossible to fully fix.
posted by destrius at 7:26 PM on January 3, 2018 [2 favorites]


So... you're saying all that work I undertook to shut down servers more than 10 years old was a mistake?
posted by pwnguin at 7:39 PM on January 3, 2018 [1 favorite]


SPecial Executive for Counterintelligence, Terrorism, Revenge and Ecmascript?
posted by thelonius at 7:44 PM on January 3, 2018 [4 favorites]


Here's a C program that demonstrates the Spectre attack (I had to edit line 50 to get it to work on my compiler). It worked 99% of the time on my Macbook with Intel i7.
posted by RobotVoodooPower at 7:58 PM on January 3, 2018 [11 favorites]


Spectre is the real tricky one

Agreed. Meltdown is a very specific thing that KPTI can protect against, apparently. My guess is that the mass VM reboots are due to patching the hypervisors to have this protection. (Although I'm a little confused, from some sources it seems Xen had KPTI already.)

Spectre is going to be nearly impossible to fix. Because it relies on branches that are executing illegally anyway, you could have one VM guest directly attack another VM guest, skipping the host completely.

BPF support in the kernel makes Spectre a lot easier, although it should be possible even without it. It's a good idea to disable BPF even in VM guests now if you can. But it has to be really, completely removed from the kernel (turning it off via a runtime flag won't help -- again all the code is executing speculatively).

Assuming a microcode fix for Spectre is really impossible (and I'm not convinced about that, microcode can do a whole lot), then the only solution is replacing hardware in millions of cloud hosts.

It might be nice if someone implemented much, much more stringent checks on access to the BPF feature, and made it easy to permanently disable. But again, there are ways to get to this attack even without BPF, it's just a vector for speculatively executing arbitrary kernelspace code.

And yes I did spend all evening trying to understand this.
posted by miyabo at 7:58 PM on January 3, 2018 [7 favorites]


I’m pondering on if the iOS and MacOS password manager with the encryption chip is easily susceptible to these memory attacks, and my first pass gut check says no but I don’t know how the passwords are executed. I assume (hope?) the speculative processing by the CPU isn’t part of that data flow.

It would be more resilient to these attacks because they don't exactly give them away but Apple has had some pretty high profile duh-duh logic errors in macOS which has let shit go when it wasn't supposed to.
posted by Talez at 8:30 PM on January 3, 2018


I suspect that the thing they can do is hold off on updates to hidden (but discoverable) CPU state until something is no longer speculative. That means a side cache of lines to be written to L1 that only get released when the accesses that brought them in become real, branch predictors that update later in the pipe, etc.
posted by mbo at 8:40 PM on January 3, 2018


But again, there are ways to get to this attack even without BPF, it's just a vector for speculatively executing arbitrary kernelspace code.

Yeah... basically what Project Zero came up with was a specific way they could exploit the leak, but other similarly smart minds can and will think of other kinds of code patterns that could be exploited, and some of these might be present in existing software, thus not requiring any kind of JIT. In any case, there's already a JIT at a very important trust boundary that will always be there, as Project Zero noted: JavaScript and the browser. I wager that's a more important trust boundary to violate nowadays than user/kernel.

Fundamentally the problem is that CPUs are cheating to get better performance (running code before they know if they can run it). A lot of the CPU performance gains we've gotten this past decade boil down to this cheat. And once you have a cheat (i.e. a policy violation), somebody is going to find some way to exploit it.
posted by destrius at 8:45 PM on January 3, 2018 [2 favorites]


In any case, there's already a JIT at a very important trust boundary that will always be there, as Project Zero noted: JavaScript and the browser.

Yeah; we're basically all running multitenancy machines. Our processes and data, plus potentially-hostile scripts that we invited into our browsers' sandboxes.

The Spectre paper contains a proof-of-concept of a JavaScript attack:
As a proof-of-concept, JavaScript code was written that, when run in the Google Chrome browser, allows JavaScript to read private memory from the process in which it runs.
Their code uses the SharedArrayBuffer trick to build a high-resolution timer; the Mozilla mitigations would deny that, but as the Fantastic Timers paper illustrates there's other ways a clever attacker can construct or recover high-resolution time.
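The general idea behind that kind of timer is a second thread spinning on a shared counter, with the counter value read as a clock. Here is a sketch of the concept in Python rather than JavaScript. `CounterClock` is an invented name, and because of CPython's global interpreter lock this version is far coarser than a worker incrementing a SharedArrayBuffer in a browser, so treat it as an illustration of the structure only.

```python
import threading
import time


class CounterClock:
    """A 'clock' built from a background thread that just counts."""

    def __init__(self):
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.ticks += 1          # shared state the other thread can read

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def now(self) -> int:
        return self.ticks            # no timer API involved, only shared memory


clock = CounterClock()
clock.start()
before = clock.now()
time.sleep(0.05)                     # stand-in for the operation being timed
after = clock.now()
clock.stop()
elapsed_ticks = after - before
print("elapsed ticks:", elapsed_ticks)
```

The point is that nothing here calls a timing API for the measurement itself, so clamping the official clocks does not remove it.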
posted by We had a deal, Kyle at 9:28 PM on January 3, 2018 [2 favorites]


Blocking BPF and JITs won't completely prevent the attack. Part of the Spectre paper discusses how to do the attack without a JIT --- the attacker needs to find a "gadget" in the victim program or in OS libraries that would leak some data if it were run speculatively and an indirect jump instruction that the victim program executes regularly. This sort of gadget may be very common -- for the example, they used eight bytes in ntdll.dll. The attacker tricks the CPU into predicting that the indirect jump will execute the gadget, and the next time the victim runs the indirect jump, it leaks some data.

The example also lets the attacker control the values of some registers when the gadget runs by having the victim program regularly read from a file, so there are limitations to this technique, but it's still very hard to defend against.
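
To make the mistraining step concrete, here is a toy Python model of a branch target predictor. It is only a sketch: the table size, the indexing by low address bits, and every address in it are assumptions made up for illustration, not details of any real CPU.

```python
# Toy model of indirect-branch target mistraining (not real CPU behaviour).
# Assumption: the predictor is indexed by the low 8 bits of the branch
# address, so two branches at different addresses can share one entry.

BTB_INDEX_BITS = 8  # hypothetical table size, chosen for illustration


class ToyBranchTargetBuffer:
    def __init__(self):
        self.entries = {}

    def _index(self, branch_addr):
        return branch_addr & ((1 << BTB_INDEX_BITS) - 1)

    def record(self, branch_addr, target):
        """Remember where the branch at branch_addr actually went."""
        self.entries[self._index(branch_addr)] = target

    def predict(self, branch_addr):
        """Return the predicted target, or None if there is no history."""
        return self.entries.get(self._index(branch_addr))


GADGET_ADDR = 0x7FF0_1234      # hypothetical address of a leaky gadget
LEGIT_TARGET = 0x0040_2000     # where the victim's jump really goes
VICTIM_BRANCH = 0x0040_10A8    # address of the victim's indirect jump
ATTACKER_BRANCH = 0x0066_00A8  # attacker's branch, same low 8 bits (0xA8)

btb = ToyBranchTargetBuffer()

# The attacker repeatedly executes its own indirect jump to the gadget address.
for _ in range(10):
    btb.record(ATTACKER_BRANCH, GADGET_ADDR)

# The victim reaches its indirect jump: the predictor suggests the gadget,
# so the gadget would run speculatively before the real target is known.
speculative_target = btb.predict(VICTIM_BRANCH)

# Once the real target resolves, the predictor is corrected.
btb.record(VICTIM_BRANCH, LEGIT_TARGET)
corrected_target = btb.predict(VICTIM_BRANCH)
```

In this model the victim never jumps to the gadget architecturally; the gadget only runs in the window before the correction, which is all the attack needs.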
posted by ectabo at 9:34 PM on January 3, 2018 [1 favorite]


Joe Hills offers a metaphor to describe the situation.

Unfortunately it's a pretty inaccurate metaphor. Maybe better:

-A customer enters the cake shop and says he's not sure whether he wants a chocolate cake, or a fruit cake with a number of cherries equal to the shop's bank account number
-The helpful baker hears this and immediately starts making both cakes to save time
-The customer never actually orders the fruit cake, because if he tried the cashier would say "hey, you're not allowed to have that, it depends on our bank number!", but the fact that the baker made it and threw it away affects the performance of the kitchen, and then the customer orders a legitimate cake with cherries but the kitchen might be out of cherries and the customer is there with a stopwatch deducing how many boxes of cherries the baker brings up from the storeroom and therefore how many cherries they must have used up.

Um, sorry, that was the best I could do, but at least it captures that a) the security check is present so you can see why Intel thought things were ok, but it's applied after the bad speculation has had performance impacts that you can measure if you're careful and b) the speculation is directed instead of baking all possible cakes topped with credit cards and car keys. But this is still a cake shop that really needs to review their security procedures.
posted by Slogby at 12:11 AM on January 4, 2018 [11 favorites]


Yehuda Katz flagged a Chrome security document with a hell of a recommendation:
Don’t serve user-specific or sensitive content from URLs that attackers can predict or easily learn. Attackers can load such URLs in their attack pages (e.g. <img src="https://email.example.com/inbox.json"/>) to get the sensitive information into the process rendering their page, and can then use out-of-bounds reads to discover the information. Use anti-CSRF tokens and SameSite cookies, or random URLs to mitigate this kind of attack.
I mean, if ordinary GET requests to user data without a CSRF token are really exploitable like this, there's essentially no same-origin policy anymore and holy crap that's basically the entire web.

(That there was a preliminary magnitude 4.6 earthquake as I finished typing this comment is surely a coincidence, but seems rather fitting to the size of this mess.)
posted by zachlipton at 2:40 AM on January 4, 2018 [8 favorites]


Intel was aware of the chip vulnerability when its CEO sold off $24 million in company stock

Intel says the stock sale was unrelated to the vulnerability, but came as part of a planned divestiture program. But Krzanich put that stock sale plan in place in October - several months after Intel was informed of the vulnerability.

Oops.
posted by petebest at 3:24 AM on January 4, 2018 [8 favorites]


My first thought when I read that yesterday was "I'm sure it was just regular stock selling that matches his regular pattern" and then I checked his Form 4 and he put through 800,000 shares where he usually drops less than 50K at a time.

If he wasn't insider trading on the knowledge he's damn sure making it look like he was. I'm sure his lawyers will have a very interesting list of questions from the SEC.
posted by Talez at 4:14 AM on January 4, 2018 [4 favorites]


I’m having a hard time buying the insider trading angle. It’s not like we can just buy AMD and be in the clear, here. This is literally the rug pulled out from underneath the entire modern processor industry.

Earlier I mentioned the VW diesel scandal because it’s very similar: we can’t so easily push the limits of Moore’s law into fucking magic, in the same way VW couldn’t make diesel provide fucking magic in the face of the laws of thermodynamics.

Meltdown is bad. It’s patchable. Spectre is basically the entire chip industry being put on notice.

What’s an Intel shop gonna do? Go to AMD? There’s nowhere to go. This is detente.
posted by Annika Cicada at 5:27 AM on January 4, 2018 [4 favorites]


To clarify, I’m using detente in the sense that all the major chip manufacturers need to figure out how to work together in order to avoid mutually assured destruction.
posted by Annika Cicada at 5:32 AM on January 4, 2018


Not sure this will hurt Intel's stock price. After all, cloud providers will need tens of millions of brand new CPUs, and it's not like there are a whole lot of competitive vendors for them to go to in the short term.

Longer term is more complicated. Each of the major cloud providers is big enough that they could bid out for super-secure chip designs now.
posted by miyabo at 5:50 AM on January 4, 2018 [1 favorite]


Holy crap. I just finished a thorough reading of the Meltdown paper and I understand it now and it's fucking masterful.

It'd be pretty easy to fix with a redesign, just tag cache lines that have instructions still in flight with a speculative flag and ID (4-bits should be enough? 188 instructions in flight over ~8 ports?) and if the pipeline gets flushed, also flush any cache lines marked with the same speculative flag ID. As soon as you flush those cache lines they can't use a timing attack against them.
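
A rough sketch of that tagging idea as a toy Python model, assuming a cache that is just a dictionary of line addresses and a speculation ID handed out by some imaginary pipeline. The class and method names are hypothetical and say nothing about how real silicon would do it.

```python
# Toy cache in which lines filled by speculative loads carry a tag and are
# dropped if the speculation is squashed. Purely illustrative.

class TaggedCache:
    def __init__(self):
        # line address -> speculation ID, or None once the line is committed
        self.lines = {}

    def fill(self, line_addr, spec_tag=None):
        """Bring a line in; spec_tag marks it as owned by in-flight work."""
        if line_addr in self.lines and self.lines[line_addr] is None:
            return  # already committed, leave it alone
        self.lines[line_addr] = spec_tag

    def commit(self, spec_tag):
        """The instructions retired: their lines become ordinary lines."""
        for addr, tag in list(self.lines.items()):
            if tag == spec_tag:
                self.lines[addr] = None

    def squash(self, spec_tag):
        """Pipeline flush: drop every line brought in under this ID."""
        self.lines = {a: t for a, t in self.lines.items() if t != spec_tag}

    def hit(self, line_addr):
        return line_addr in self.lines


cache = TaggedCache()
cache.fill(0x1000)               # ordinary, committed load
cache.fill(0x40C0, spec_tag=3)   # load issued by a speculative instruction
cache.squash(3)                  # pipeline flush for speculation ID 3

leaked_line_present = cache.hit(0x40C0)      # the probe line is gone
committed_line_present = cache.hit(0x1000)   # unrelated data is untouched

cache.fill(0x2000, spec_tag=5)
cache.commit(5)                  # speculation turned out to be correct
cache.squash(5)                  # a later flush no longer affects the line
kept_after_commit = cache.hit(0x2000)
```

With the squashed line evicted, a later timing probe of 0x40C0 would see a miss like every other candidate. Whether the bookkeeping is affordable in hardware is a separate question this sketch does not address.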
posted by Talez at 5:56 AM on January 4, 2018 [2 favorites]


Still making my way through the Spectre paper, but the point it makes - speculative execution is insecure by its nature in modern systems - is something I'll have to think through some more. Fundamentally, I can't see the difference between speculative execution and execution, so I'm still missing something. To some extent, I think that's the point.

How did we get here? All modern instruction sets and processor architectures (which are two angles on the same elephant) have evolved from what was in place in the mid 1980s when installed bases got big enough to shape the market. (ARM and x86 followed different paths at different times, but I think the basic model applies to both.) So competitive advantage and cashflow for the chip vendors came from speed-ups that didn't break the software base, and speculative execution is a good example of that. It shouldn't trigger security problems, but because it effectively makes the processor do things that expose information to attackers (this is the bit I don't quite understand can't be avoided) it's not compatible with the way current architectures/instruction sets work.

It's interesting to explore the idea of a fundamental new design. That's been economically impossible for thirty years because of the x86 installed base, but this may not be true now. The big cloud companies may be in a position to fund and deploy a new ISA internally - I don't think there are any reasons stuff up the stack will care - and things like the Chromebook already exist where user experience is independent of the processor type. Once the ISA is established and the sunk costs covered, especially if there is pressure on businesses to switch to the new hardware by regulatory or best-practice security, then we may be onto something. And if the new ISA can efficiently and securely virtualise x86 without carrying across the low-level speculative vulns - a question so wormy it needs an entire canning factory - then the game gets more interesting.

(I'd say that any new ISA would need to be open and collaboratively designed, but there'd need to be a second worm cannery opened up for that)
posted by Devonian at 6:37 AM on January 4, 2018


Fundamentally, I can't see the difference between speculative execution and execution

The way I understand it is this:

Execution: "The program made the processor do a thing that it intended the processor to do (DO WHAT I SAY)"

Speculative Execution: "The processor is doing things with the program's memory that the program did not ask it to do"
posted by Annika Cicada at 6:50 AM on January 4, 2018


Moore’s law fallacy of hasty inductive generalization
ftfy
posted by thelonius at 6:56 AM on January 4, 2018 [3 favorites]


Fundamentally, I can't see the difference between speculative execution and execution, so I'm still missing something.

Speculative execution has its registers wound back if the pipeline is flushed (i.e. after an invalid memory access trap) but cache lines stick around after memory accesses. AMD insist their processors don't allow speculative execution to cross permission boundaries; Intel's for some reason do.

Meltdown basically loads an invalid address. It has further code, executed speculatively, that goes off and pulls the data from that address, calculates an address in user space using the contents of the memory address (i.e. let's say the memory byte has 0xC0, I can tell it to add 0x4000 to that), then tries to access that address (0x40C0).

Now once the invalid address trap is raised, all the registers are wound back. Everything looks the same, except there's a brand new cache line ready with the address that was accessed in user space. So I cycle through all of the possible addresses and time how long it takes, 0x4000, 0x4001, 0x4002, etc. I'm timing them, a few hundred ns each time, until I get to 0x40C0 and bing! It loads in like 30ns. So I've got a cache hit. That means that the byte that was in the invalid memory address is 0xC0! Success! I've read a byte from kernel memory!
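
Here is a toy Python simulation of that sequence, as a sketch only. Nothing in it touches real hardware: the cache is a set, the latencies are made-up constants echoing the figures above, and the kernel address is hypothetical. One deliberate difference from the simplified description: the probe slots are spaced 4096 bytes apart, as in the Meltdown paper, because adjacent byte addresses would share a cache line and could not be told apart by timing.

```python
# Toy simulation of the cache side channel. "Time" is a made-up cost per
# access, and the CPU model is an assumption for illustration.

CACHE_LINE = 64
STRIDE = 4096                # one probe slot per possible byte value
HIT_NS, MISS_NS = 30, 300    # illustrative latencies

KERNEL_ADDR = 0xFFFF_8000_0000_1000          # hypothetical kernel address
KERNEL_MEMORY = {KERNEL_ADDR: 0xC0}          # the secret byte


class ToyCPU:
    def __init__(self):
        self.cached_lines = set()

    def flush(self):
        self.cached_lines.clear()

    def timed_load(self, addr):
        """Return the simulated latency of loading addr, then cache its line."""
        line = addr // CACHE_LINE
        latency = HIT_NS if line in self.cached_lines else MISS_NS
        self.cached_lines.add(line)
        return latency

    def transient_read(self, kernel_addr, probe_base):
        """Model the flawed ordering: the dependent load runs before the fault.

        The secret never reaches a register the program can see, but the
        probe line selected by its value stays in the cache.
        """
        secret = KERNEL_MEMORY[kernel_addr]   # speculative, forbidden read
        self.cached_lines.add((probe_base + secret * STRIDE) // CACHE_LINE)
        # ... here the fault is raised and the registers are wound back ...


def recover_byte(cpu, kernel_addr, probe_base=0x4000_0000):
    cpu.flush()                                   # start with a cold probe array
    cpu.transient_read(kernel_addr, probe_base)   # leaves one warm line behind
    timings = [cpu.timed_load(probe_base + v * STRIDE) for v in range(256)]
    return timings.index(min(timings))            # the fast slot is the secret


recovered = recover_byte(ToyCPU(), KERNEL_ADDR)
```

In the model, `recovered` comes out as the secret byte because exactly one of the 256 probe slots loads at the hit latency.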

Spectre, on the other hand, ties up the CPU for a few hundred cycles on a cache miss and then convinces other code to make memory accesses speculatively, which then manifest themselves as cache lines in user space. It does basically the same thing as Meltdown, except no permission check is violated, because to the CPU it looks like you're convincing a piece of kernel code to make the memory access for you. But that's not all. You can also do wonderful things like fuck with the stack in the course of all this speculative execution. In fact, as long as the data you're looking for is in the cache (so it can be accessed before the original cache miss resolves) you can get your hands on it. Which is why it works on AMD and ARM as well as Intel, because it looks legit to the processor. The only sign that the speculative execution has happened is a brand new cache line, and that cache line's address encodes what is actually data from another process. You then just need to, again, brute force it to get it out.
posted by Talez at 7:04 AM on January 4, 2018 [5 favorites]


Is it not possible for the kernel to just watch for some program groping the cache that way and just terminate it?

Maybe halt the system entirely? Or, in the context of a data center, the hypervisor halting the offending VM? 'SECFAULT?'

Is the game already lost with a single read? It seems like you'd have to do this over and over to make practical use of it and that should be detectable? Rather than just killing the entire pre-accessing scheme, if it's so important to performance.
posted by snuffleupagus at 7:10 AM on January 4, 2018


The nub for me is that speculative execution is stuff the program might well ask the processor to do, you just don't know it yet. If you have code that goes -

Is A bigger than B?
yes, then do this stuff
no, then do this other stuff

Speculative execution means the processor takes a guess at the answer to the question ahead of time, before it actually knows about A and B, and starts processing down the path it thinks is most likely. At some point in the future, it actually knows about A and B. if the guess was right, it's already done a lot of the work. If not, it junks that work and starts afresh on the correct path.
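
As a minimal sketch of that guess-then-check loop in Python, assuming "registers" are a dictionary and the guess is simply handed in by the caller (real predictors and pipelines are far more involved):

```python
# Toy "guess, run ahead, then check" model. The function and its arguments
# are hypothetical names for illustration.

def run_branch(registers, load_a, b, guess):
    """Return (registers after the branch, whether the guess was right)."""
    checkpoint = dict(registers)      # saved so a wrong guess can be undone

    # Run ahead down the guessed path before A is known.
    working = dict(registers)
    working["path"] = "this stuff" if guess else "this other stuff"

    # A finally arrives (say, from slow memory), so the guess can be checked.
    actual = load_a() > b
    if actual == guess:
        return working, True          # the run-ahead work is kept

    # Wrong guess: junk the work, restore the checkpoint, take the other path.
    recovered = dict(checkpoint)
    recovered["path"] = "this stuff" if actual else "this other stuff"
    return recovered, False


right_regs, guessed_right = run_branch({"path": None}, lambda: 7, 3, guess=True)
wrong_regs, guessed_wrong = run_branch({"path": None}, lambda: 1, 3, guess=True)
```

Note what the checkpoint covers: registers. In this model, as in the attacks discussed in this thread, anything the run-ahead work did to the cache is outside it.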

Both paths, right and wrong, should operate under the normal security rules set by how the process state relates to the CPU's general environment, through whatever internal processor data structures decide on what gets access to what and when. What makes the speculative execution peculiarly dangerous? Is it that the attacker knows to provoke it at just the right time to leave shit lying around in 'the open' at just the wrong time? I can see that as an implementation issue, absolutely, but not as a natural consequence of speculation.
posted by Devonian at 7:13 AM on January 4, 2018


Is it not possible for the kernel to just watch for some program groping the cache that way and just terminate it?

Or for that matter what about the MINIX in your chips thing? Is it possible to teach the CPU to watch for timing attacks internally? Or is it that you could obfuscate the pattern such that the probes look like normal operation? Or maybe that overhead would be worse?

In terms of chip design and stopping the gap before there can be any real new architecture, is this patchable in silicon? Some kind of on-die intermediator that watches what's being done with cache access?
posted by snuffleupagus at 7:15 AM on January 4, 2018


What makes the speculative execution peculiarly dangerous? Is it that the attacker knows to provoke it at just the right time to leave shit lying around in 'the open' at just the wrong time? I can see that as an implementation issue, absolutely, but not as a natural consequence of speculation.

If I understand the 'executive summary' correctly, the speculative execution fails to purge (or flag?) the caches it generates on a fork that turns out to be invalid and maybe comprises a privilege escalation (but was not audited/blocked because speculative and done in Ring 0 or whatever). Those caches are readable from userspace, and you can figure out which one is the one that was used through read timing comparison.
posted by snuffleupagus at 7:17 AM on January 4, 2018


Or for that matter what about the MINIX in your chips thing? Is it possible to teach the CPU to watch for timing attacks internally? Or is it that you could obfuscate the pattern such that the probes look like normal operation? Or maybe that overhead would be worse?

Overhead would be worse. Right now the mitigation appears to be unmap everything you possibly can from kernel space and remap it on a system call. Lessen the attack surface.
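
A toy Python picture of that unmap-and-remap mitigation, assuming page tables are flat dictionaries from virtual page number to (physical frame, user-accessible) pairs. The page numbers and the "entry stub" page are hypothetical; real page tables are multi-level structures.

```python
# Toy page tables, for illustration only.

USER_PAGES = {0x00400: (0x200, True), 0x00401: (0x201, True)}
KERNEL_PAGES = {0xFFFF8: (0x100, False), 0xFFFF9: (0x101, False)}
ENTRY_STUB = {0xFFFFF: (0x1FF, False)}   # minimal page needed to enter the kernel

# Before: one table holds everything, and kernel pages are merely marked
# supervisor-only. The mapping exists, so speculation has something to read.
shared_table = {**USER_PAGES, **KERNEL_PAGES, **ENTRY_STUB}

# After: user mode runs on a table with the kernel unmapped, except the stub.
user_mode_table = {**USER_PAGES, **ENTRY_STUB}
kernel_mode_table = {**USER_PAGES, **KERNEL_PAGES, **ENTRY_STUB}


def translate(table, vpn):
    """Return the physical frame for vpn, or None if there is no mapping."""
    entry = table.get(vpn)
    return None if entry is None else entry[0]


def syscall(handler):
    """Run handler against the full table, as the kernel would on entry."""
    return handler(kernel_mode_table)


frame_before = translate(shared_table, 0xFFFF8)      # mapping present
frame_after = translate(user_mode_table, 0xFFFF8)    # nothing to translate
frame_in_syscall = syscall(lambda table: translate(table, 0xFFFF8))
```

Switching tables on every kernel entry and exit is where the extra cost comes from, which fits the overhead concern above.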

is this patchable in silicon?

Yes and no, maybe? It depends entirely on what performance issues would be seen in the out-of-order execution. If it was me? I'd have every speculatively executed instruction that hasn't retired tag its cache lines, both flushing those cache lines on a pipeline flush and making those cache lines miss while the pipeline hasn't retired the instructions. How much of a performance hit that would entail is above my knowledge.
posted by Talez at 7:21 AM on January 4, 2018 [2 favorites]


I had some AWS instances that got rebooted today, and they no longer successfully run the spectre.c test I posted above.

OTOH, servers at non-Amazon cloud providers are probably still vulnerable.
posted by RobotVoodooPower at 7:36 AM on January 4, 2018 [4 favorites]


What makes the speculative execution peculiarly dangerous?

The side channel timing attack is where the damage is done.
posted by Annika Cicada at 7:43 AM on January 4, 2018


Yes, I get the side channel stuff, but the thrust of the Spectre paper is that this is not something that can be designed against - or at least, so it seems. It's what makes the combination of speculation and side-channel extraction so toxic that still eludes me.
posted by Devonian at 7:51 AM on January 4, 2018 [1 favorite]


Also, due to the timing attack: This is why Mozilla (and no doubt all browsers and hell maybe even ECMA and the W3 consortium too) are right now looking at their high precision timers and figuring how to deal with those.
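
To show why timer precision matters here, a small Python sketch of timer coarsening. The 20 microsecond granularity and the 30 ns / 300 ns latencies are assumptions for illustration, and the function names are made up.

```python
def coarsen(timestamp_ns, resolution_ns):
    """Round a timestamp down to a multiple of resolution_ns."""
    return (timestamp_ns // resolution_ns) * resolution_ns


def measured_duration(start_ns, true_duration_ns, resolution_ns):
    """What a program sees if it can only read the coarsened clock."""
    t0 = coarsen(start_ns, resolution_ns)
    t1 = coarsen(start_ns + true_duration_ns, resolution_ns)
    return t1 - t0


RESOLUTION_NS = 20_000   # assumed granularity: 20 microseconds

# With a precise clock, a cache hit and a cache miss are easy to tell apart.
precise_hit = measured_duration(1_000_000, 30, 1)
precise_miss = measured_duration(1_000_000, 300, 1)

# With the coarse clock, both usually read as zero elapsed time.
coarse_hit = measured_duration(1_000_000, 30, RESOLUTION_NS)
coarse_miss = measured_duration(1_000_000, 300, RESOLUTION_NS)

# But a measurement that straddles a clock tick still shows a jump.
straddling_miss = measured_duration(1_019_900, 300, RESOLUTION_NS)
straddling_hit = measured_duration(1_019_900, 30, RESOLUTION_NS)
```

The last two lines hint at why coarsening is a speed bump rather than a fix: an attacker who can line measurements up against tick edges, or repeat them many times, can still pull a signal out.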

That's closing the gap for the most vulnerable exploitation point, the browser. But like, how do you deal with this inside C or Java, where high precision timers are required?

You have to think about this as step 4 of an attack.

Step 1: buffer overflow
Step 2: Reverse Shell with Outbound initiated socket
Step 3: Install rootkit
Step 4: Run Spectre Attack
Step 5: Exfiltrate

Before, step 4 used to be hard as hell to execute and hide, which is why the NSA buys good exploits for a million bucks a pop.

Now step 4 is...well, it's like a skeleton key.

The attack by itself is rather esoteric. But as it gets weaponized into the exploit chain the stakes will rise to levels we've never had to deal with before.

So as always, defense in depth helps but goddamnit we've never had a completely undetectable attack on memory on this kind of scale. It really is a mega game-changing event in the security field.
posted by Annika Cicada at 7:53 AM on January 4, 2018 [8 favorites]


I think it's even worse than that. We've never had serious memory attacks because no one outside Intel had a good understanding of their exact speculative execution algorithm and branch prediction strategy. Now we do. Even if hacks become available for Spectre and Meltdown specifically, there are probably many other related attacks that people can come up with using this new information.
posted by miyabo at 8:30 AM on January 4, 2018 [5 favorites]


yeah the trusted computing base should now be called the busted computing base.
posted by Annika Cicada at 9:44 AM on January 4, 2018 [5 favorites]


Maybe halt the system entirely? Or, in the context of a data center, the hypervisor halting the offending VM? 'SECFAULT?'

I don't see how these don't just become DoS attacks?
posted by MikeKD at 10:58 AM on January 4, 2018 [2 favorites]


In regards to patching Macs...According to AppleInsider, Apple has already patched 10.13.2, with something more coming for 10.13.3.

Note there is no discussion about patching any further back than 10.13, though.
posted by Thorzdad at 11:20 AM on January 4, 2018 [1 favorite]


I don't see how these don't just become DoS attacks?

I think any program you willingly run on your computer could do a pretty good DoS attack without any privilege escalation. Request a ton of memory, start a ton of threads, start churning through files on disk, do a bunch of cpu burning busy loops, launch a bunch of stuff on the GPU, etc. Usermode programs can wreak plenty of havoc.
posted by RustyBrooks at 11:26 AM on January 4, 2018


I don't see how these don't just become DoS attacks?

On the VM, I suppose, but not the host. If you can root a guest VM by other means, you can make it do things that will likely cause the host to shut it off. So that seems par for the course?

I'll be the first to admit I understand these things at the level of block diagrams and concepts, not actual data being pushed around the CPU and memory architecture.
posted by snuffleupagus at 12:15 PM on January 4, 2018


I would say those are fundamentally different from halting or SEGFAULTing the system because of some wonky cache behavior. And, at least with UNIX-like systems, there are ulimit and sysctl and in general, being able to notify the user of anomalous usage (ie, program unresponsive notices, etc.).

(ETA: in response to RustyBrooks)
posted by MikeKD at 12:17 PM on January 4, 2018 [1 favorite]


OTOH, servers at non-Amazon cloud providers are probably still vulnerable.

this is true.
posted by Annika Cicada at 12:56 PM on January 4, 2018


I had some AWS instances that got rebooted today, and they no longer successfully run the spectre.c test I posted above.

I believe you, but I don't understand how this is possible since it's a userspace-only test and the patches are all for userspace->kernel privilege escalation.
posted by miyabo at 1:15 PM on January 4, 2018


I'm not sure I understand the use of the attacks, although I understand the way speculative execution allows a process to access kernel memory. Is the timing issue because speculative execution won't let you write or store the result of the illegal access, and cache access timing is a way for the attacker to communicate its illegal knowledge to itself?
posted by Joe in Australia at 1:54 PM on January 4, 2018


Yes, that is a perfect description of Meltdown.
posted by miyabo at 2:00 PM on January 4, 2018


Hardware-wise, could this be solved in a relatively low-overhead way by keeping the access bits from the page table entry in the TLB itself and not executing past that access if a protection fault were inevitable? I suppose you could also make the protection-ring-check-then-access operation atomic so that memory available to ring 0 were never accessible to speculative execution in ring 3, but I have no idea if that would prevent the cross-address-space snooping.
posted by invitapriore at 2:50 PM on January 4, 2018 [1 favorite]


I remember way back when thinking we'd all be running our software in VMs that were JIT'ed and optimized to enforce security policies, with no distinction between kernel and user mode. Though maybe if the performance hit is bad enough with these remedies, x86 transpilers will become popular.
posted by RobotVoodooPower at 3:13 PM on January 4, 2018


I remember way back when thinking we'd all be running our software in VMs that were JIT'ed and optimized to enforce security policies

The reality is a world full of Docker containers that haven't been refreshed in a year+. ( And here I was thinking that Containers would get me out of Perl dependency hell in hosting environments ).
posted by mikelieman at 3:33 PM on January 4, 2018 [1 favorite]


“Yeah perl is required as part of the application container build and netcat running as root is part of our platform management toolset”.

/me cries
posted by Annika Cicada at 3:47 PM on January 4, 2018 [5 favorites]




Amazon inked a deal to handle computation for the NSA recently, so maybe you could pay some dedicated machine rate like the NSA does. lol
posted by jeffburdges at 6:01 PM on January 4, 2018


It's worth noting that in the case of Spectre, the illegal access isn't across the ring3/ring0 boundary, but a standard security check (array bounds check). The CPU would have no idea if the conditional branch was security-sensitive or not, because it's operating at a lower level of abstraction. Essentially, you have to either flush the cache after every failed speculative execution, or you need some way for code to tell the CPU "please do not speculatively execute the code in this branch". I think Intel is introducing a way to do the latter. The indirect branch poisoning attack (Spectre variant 2) is even more of a headache.
posted by destrius at 6:16 PM on January 4, 2018 [4 favorites]


It's worth noting that in the case of Spectre, the illegal access isn't across the ring3/ring0 boundary, but a standard security check (array bounds check). The CPU would have no idea if the conditional branch was security-sensitive or not, because it's operating at a lower level of abstraction.

Right, so this is the case where I hypothesize that caching the D/U/R bits from the actual page table entry in the TLB might offer some mitigation -- assuming that speculative execution weren't allowed to proceed until after a TLB hit against a line with the correct access bits set, which should prevent invalid accesses against pages that weren't mapped in the current userland address space, the exception would be triggered before the actual memory subsystem access and so the timing attack against the cache wouldn't be able to proceed. I am totally spitballing all of this, FWIW.
posted by invitapriore at 6:45 PM on January 4, 2018


“Yeah perl is required as part of the application container build and netcat running as root is part of our platform management toolset”.

Not to turn this into a Docker Worst Practices thread, but I have something in my work environment that generates a Python virtualenv... to spin up a single container via docker-compose... which then generates a Python virtualenv...
posted by tobascodagama at 7:32 PM on January 4, 2018 [6 favorites]


To be fair, I have deferred to creating a Docker image for a Python dependency before, considering how little consideration it gives as a platform to the notion that you might want a relatively self-contained interpreter binary, but luckily if you want to avoid that path the Python interpreter source is at least well-behaved enough that you can do the traditional antic of just pointing make at a custom build directory and then tarballing the whole mess.
posted by invitapriore at 7:42 PM on January 4, 2018


Right, so this is the case where I hypothesize that caching the D/U/R bits from the actual page table entry in the TLB might offer some mitigation -- assuming that speculative execution weren't allowed to proceed until after a TLB hit against a line with the correct access bits set, which should prevent invalid accesses against pages that weren't mapped in the current userland address space, the exception would be triggered before the actual memory subsystem access and so the timing attack against the cache wouldn't be able to proceed. I am totally spitballing all of this, FWIW.

I've yet to read the paper in full detail, and my knowledge of lower level CPU internals is sketchy, but from what I understand, there aren't any actual memory access faults during the attack (beyond the need for a cache miss in the evaluation of the condition in order to make speculative execution happen while the value of the condition is yet to be determined).

1. The attacker asks to access offset BIG_OFFSET of an array A with size ARRAY_SIZE
2. The code checks whether BIG_OFFSET is larger than ARRAY_SIZE
3. While it's waiting for the result of that check, it continues on, and accesses A[BIG_OFFSET], which is the address to some (valid) memory the attacker is interested in. It then does some additional memory access making use of the value of A[BIG_OFFSET] (e.g. read SHARED_TABLE[A[BIG_OFFSET]]).
4. The result of the check returns and shows that BIG_OFFSET is larger than ARRAY_SIZE, so the CPU rolls back the execution, but SHARED_TABLE[A[BIG_OFFSET]] has already been accessed. The attacker can read from SHARED_TABLE as well, and so can use a timing attack to determine the value of A[BIG_OFFSET].

Of course in practice the access to SHARED_TABLE will be modulo the number of bits you want to read out. None of the accesses violate any page permission settings. Or am I missing something in what you're saying? This is a really complicated bug that probably requires me to re-read the paper multiple times.
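
The four steps can be played out in a toy Python model. This is a sketch under loud assumptions: A, ARRAY_SIZE, BIG_OFFSET and SHARED_TABLE follow the names used above, while the memory layout, the one-bit "predictor" and the idea of modelling a timing probe as a set lookup are all invented for illustration.

```python
# Toy model of the bounds-check bypass. Not real CPU behaviour.

MEMORY = bytearray(64)
ARRAY_SIZE = 16            # A occupies MEMORY[0:16]
BIG_OFFSET = 40            # beyond A, but still valid memory in this process
MEMORY[BIG_OFFSET] = 0x5A  # the byte the attacker wants


class ToyCore:
    def __init__(self):
        self.predict_in_bounds = False   # crude stand-in for a branch predictor
        self.cached_slots = set()        # which SHARED_TABLE slots are in cache

    def victim(self, offset):
        """if offset < ARRAY_SIZE: read SHARED_TABLE[A[offset]]"""
        in_bounds = offset < ARRAY_SIZE            # resolves late on real hardware
        if in_bounds or self.predict_in_bounds:    # real or speculative execution
            self.cached_slots.add(MEMORY[offset])  # cache footprint, never undone
        self.predict_in_bounds = in_bounds         # predictor learns the outcome
        return in_bounds


def attack(core, big_offset):
    for i in range(5):                 # training: valid offsets, check passes
        core.victim(i)
    core.cached_slots.clear()          # attacker evicts SHARED_TABLE
    core.victim(big_offset)            # steps 1-4: the out-of-bounds request
    # Timing each SHARED_TABLE slot is modelled as a membership test.
    return [v for v in range(256) if v in core.cached_slots]


core = ToyCore()
leaked = attack(core, BIG_OFFSET)

# Without fresh training the predictor now expects failure, so nothing leaks.
core.cached_slots.clear()
core.victim(BIG_OFFSET)
leaked_without_training = sorted(core.cached_slots)
```

In this model no access ever faults, which matches the point above that none of the accesses violate page permissions.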
posted by destrius at 10:09 PM on January 4, 2018


That's more or less correct destrius, except for step 0: the attacker asks to access offset SOME_OFFSET of an array A with size > SOME_OFFSET a few times to "train" the branch prediction that this is a test that would most benefit from speculatively executing as if the test will succeed.
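
As a sketch of what "training" could look like, here is the textbook two-bit saturating counter in Python. Real predictors are undocumented and much more elaborate, so treat the scheme and the class name as assumptions for illustration.

```python
class TwoBitPredictor:
    """Counter values 0-1 predict 'not taken', 2-3 predict 'taken'."""

    def __init__(self):
        self.counter = 0

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        """Nudge the counter toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


predictor = TwoBitPredictor()
untrained_prediction = predictor.predict()

for _ in range(4):           # step 0: in-bounds requests, the check passes
    predictor.update(True)

prediction_for_attack = predictor.predict()   # speculate as if in bounds
predictor.update(False)                       # the malicious request fails the check
still_predicts_taken = predictor.predict()    # counter went 3 -> 2, still "taken"
```

With a counter like this, one failed check does not undo the training, so the attacker would not have to retrain from scratch between every read.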

It's a bloody smart thing to come up with, this attack, and I tip my hat at those figuring it out.
posted by DreamerFi at 11:23 PM on January 4, 2018 [3 favorites]


Apple Confirms 'Meltdown' and 'Spectre' Vulnerabilities Impact All Macs and iOS Devices, Some Fixes Already Released: Apple had patched against Meltdown in early December.

Earlier versions of MacOS are not mentioned in the support document linked from that article. However, El Capitan and Sierra were also patched on December 6 for undescribed problems related to "An application may be able to read kernel memory". The finders credited include the Google Project Zero group and the Graz University of Technology group behind this week's announcements of Spectre and Meltdown.

The doc says that Safari will be patched soon. Since any current version of Safari is usually supported on the two or three most recent versions of MacOS, El Cap and Sierra are likely to get those fixes as well. The fixes Safari receives are also unelaborated, but my bet is they're the same timing and data storage modifications that Firefox and Chrome will be getting in a week or two.
posted by ardgedee at 2:03 AM on January 5, 2018 [1 favorite]


Intel Issues Updates to Protect Systems from Security Exploits: "Intel has developed and is rapidly issuing updates for all types of Intel-based computer systems... that render those systems immune from both exploits (referred to as “Spectre” and “Meltdown”)..."

"Immune" is a strong word. Does anybody have a good take on what they're doing?
posted by ardgedee at 2:26 AM on January 5, 2018 [4 favorites]


I found the Hacker News discussion about the Intel article illuminating. That crowd has the pitchforks out for Intel right now, so take it with a grain of salt, but a lot of people feel like Intel is desperately trying to avoid bad PR any way they can. And that the patches do not make them "immune".

This stuff is all over my head but from what I've read the Intel patch makes it possible for the operating system to disable branch prediction entirely in the CPU. That would neuter this specific attack we know about at the cost of significant system performance. OTOH it's a CPU level fix. Another avenue for fixing the problem in Linux has been to add a compiler option so the compiler emits code that can never be executed speculatively. That's also really hideous.

My plan is to hide under a rock for a month or two and hope people smarter than me come up with a reasonable fix. TBH it's hard for me to tell how serious a threat this is in practice. That'll last right up until someone demonstrates a practical exploit. It sure sounds like the cloud VM providers have a lot more to worry about than desktop users.
posted by Nelson at 7:25 AM on January 5, 2018 [1 favorite]


So I have two questions so far:

What's the history of speculative execution security exploits? Did nobody else write about or predict (sorry) these as possibilities shortly after the advent of modern CPUs 10, 15 years ago?

There are early claims that Retpoline can help a lot against Spectre, but what/how does Retpoline do this?
posted by polymodus at 7:56 AM on January 5, 2018


retpoline is the weird code enabled by a new compiler option designed to fight these attacks. It emits code the CPU cannot execute speculatively so there's no possibility of information leakage. It runs more slowly. Some discussion I've read questions whether this works on the very newest Intel chips; they may outsmart it.
posted by Nelson at 8:06 AM on January 5, 2018


Yes, cloud tenants have more to worry about, but (hopefully) there's more than one person that's been working the incident response for the past few days. Thankfully there's a "dedicated instance" flag in AWS you can just flip if you can afford to buy your way out of the problem. (Ideally, those docker images are also automatically getting rebuilt daily or hourly on top of "latest" to pick up OS library updates, rather than stagnating. The userspace product code libraries (CPAN/Pip/rubygems) are versioned separately and internally.)

That's not to say there isn't a significant risk to users on desktops and laptops though. Spectre (CVE-2017-5753 and CVE-2017-5715) included a javascript PoC in the whitepaper, so "code execution" in that case means an attacker only has to sneak malicious javascript onto a sketchy ad network somewhere, or get the victim to visit a malicious webpage.

A weaponized attack would not only break CORS in the browser, but allow an attacker to read the victim's cookies on all currently open websites (in other tabs or possibly even other browsers), as well as other information (including any XSS mitigations). The attacker could then use those to login as the victim to Gmail/victim's Bank/Facebook/Metafilter/whatever. I sure do hope my accountant's intern isn't in the habit of watching tv off sketchy streaming sites online during their lunch break.

Of course there's always disabling javascript, but (unfortunately) that's not really a workable solution for most users. It's anybody's guess as to how long it will be before this is actively exploited; hopefully nothing before Chrome 64 lands on the 23rd, which will have additional mitigations. (Firefox's mitigations are only of limited help because then you have to use Firefox and their security record hasn't been great in the past.)
posted by fragmede at 10:33 AM on January 5, 2018


@polymodus: You may wanna look at Return Oriented Programming to get an idea of the genesis of this current threat vector.

The Spectre paper makes these references:

"This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim’s process."

"We also describe return-oriented-programming (ROP) and ‘gadgets’."

"Return-Oriented Programming (ROP) [33] is a technique for exploiting buffer overflow vulnerabilities. The technique works by chaining machine code snippets, called gadgets that are found in the code of the vulnerable victim. More specifically, the attacker first finds usable gadgets in the victim binary. She then uses a buffer overflow vulnerability to write a sequence of addresses of gadgets into the victim program stack. Each gadget performs some computation before executing a return instruction. The return instruction takes the return address from the stack, and because the attacker control this address, the return instruction effectively jumping into the next gadget in the chain."

And here's a 2008 paper on Return-Oriented Programming
posted by Annika Cicada at 10:35 AM on January 5, 2018 [5 favorites]


(also please note the use of "She" as the gender of threat actor, which I personally love and adore)
posted by Annika Cicada at 10:38 AM on January 5, 2018 [1 favorite]


There haven't been speculative execution security exploits before, as far as I know. No one outside Intel knew the exact details of how their branch prediction worked, so while you could speculate (ahem) about the problem, you couldn't prove it existed. The big difference now is that the architecture details have been reverse engineered.
posted by miyabo at 10:51 AM on January 5, 2018 [1 favorite]


Why Raspberry Pi Isn't Vulnerable to Spectre or Meltdown is a great article that explains the vulnerability through some fairly simple examples.
posted by Nonsteroidal Anti-Inflammatory Drug at 10:55 AM on January 5, 2018 [2 favorites]


And I'm terribly amused by any suggestion that this will be fixed in a compiler - so great, the gcc you have in your (prod?) server won't emit code that uses branch predictions, but the gcc a malicious actor uses to create _perfectly portable_ x86 code certainly still will, so.. uh? Help me out here.
posted by Kyol at 10:56 AM on January 5, 2018


Kyol, I think the theory is programs compiled with the magic compiler won't have speculative execution and therefore won't leak their own state to the malicious actor. I don't like it as a solution, but it has a certain logic to it.
posted by Nelson at 11:02 AM on January 5, 2018


Could the "out of line construction" in the retpoline article be automatically patched into binaries at load time?
posted by We had a deal, Kyle at 11:21 AM on January 5, 2018


Of course in practice the access to SHARED_TABLE will be modulo the number of bits you want to read out. None of the accesses violate any page permission settings. Or am I missing something in what you're saying? This is a really complicated bug that probably requires me to re-read the paper multiple times.

Ah, nope, you're right, I was confusing the details of Meltdown and Spectre, which lets the kernel do all the dirty work for you. That's...a hard one.
posted by invitapriore at 11:45 AM on January 5, 2018
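
The access pattern discussed above (a bounds-checked read followed by a dependent access into a shared table) can be sketched as a toy model. This is only an illustration under loud assumptions: real attacks measure cache timing on real hardware, while here the "cache" is a Python set and the misprediction is a flag. The names `ToyCpu`, `PROBE_STRIDE`, and `recover_byte` are made up for this sketch and do not come from the paper.

```python
# Toy model only: a set stands in for the CPU cache, a flag for a mispredicted branch.
PROBE_STRIDE = 512  # hypothetical spacing between probe-table entries


class ToyCpu:
    """Just enough state to show the leak: memory plus a set of 'cached' probe lines."""

    def __init__(self, memory, array_len):
        self.memory = memory        # flat byte buffer; the public array is the first array_len bytes
        self.array_len = array_len
        self.cached = set()         # probe-table offsets that are currently "cached"

    def victim(self, idx, mispredict):
        """Bounds-checked read. On a mispredicted branch the body still runs
        speculatively: the value is discarded, but the cache footprint remains."""
        in_bounds = idx < self.array_len
        if in_bounds or mispredict:
            value = self.memory[idx]
            self.cached.add(value * PROBE_STRIDE)  # dependent access into the probe table
            if in_bounds:
                return value
        return None  # architecturally, the out-of-bounds read never happened


def recover_byte(cpu, idx):
    """Attacker side: flush the probe table, trigger the victim, find the 'fast' line."""
    cpu.cached.clear()
    cpu.victim(idx, mispredict=True)
    for guess in range(256):
        if guess * PROBE_STRIDE in cpu.cached:
            return guess
    return None


cpu = ToyCpu(b"public-data" + b"SECRET", array_len=11)
leaked = bytes(recover_byte(cpu, 11 + i) for i in range(6))
print(leaked)
```

Note that `victim` never returns the out-of-bounds value and no "permission" is violated in the model; the byte is recovered purely from which probe line ended up cached, which is the point made in the comment above.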


Firefox's mitigations are only of limited help because then you have to use Firefox and their security record hasn't been great in the past.
I don’t think that’s true or helpful. I think most people in the industry would rank Chrome and Firefox towards the top. Both have a lot of smart engineers working on short and long-term improvements, and they’ve been instrumental in getting the entire field out of the era dominated by Adobe’s gross negligence and Microsoft’s desire to ensure that major enhancements only happen with major Windows releases.
posted by adamsc at 1:21 PM on January 5, 2018 [4 favorites]


And I'm terribly amused by any suggestion that this will be fixed in a compiler - so great, the gcc you have in your (prod?) server won't emit code that uses branch predictions, but the gcc a malicious actor uses to create _perfectly portable_ x86 code certainly still will, so.. uh? Help me out here.

The problem with Spectre is that the kernel is doing all the dirty work for you. If you recompile everything in the kernel to use only movs (x86 mov is Turing complete) nothing will ever execute speculatively because there's no branch to predict.

It would be slow as hell though.
posted by Talez at 3:05 PM on January 5, 2018 [2 favorites]


Spectre doesn't even apply exclusively to the kernel. On Linux anything running as root can read system memory. So even if the kernel is patched with Retpoline, you could have some untrusted code that does a speculative ROP attack against a daemon running as root, and uses it to read some random memory of a 3rd process. If you can figure out the physical page addresses of other VMs on the same hardware (this is hard), it may even be possible to do a speculative attack on that. Really it's a very, very nasty bug and the only complete fix will be in hardware.
posted by miyabo at 3:22 PM on January 5, 2018 [2 favorites]


Really it's a very, very nasty bug and the only complete fix will be in hardware.

QFMFT
posted by Annika Cicada at 3:30 PM on January 5, 2018


I think Rust and Servo are ample proof of Mozilla's commitment to security. It's believable Firefox will beat Chrome's security in several years, given Google's nature as an advertising company. I suppose Firefox using Adobe's EME plugin while Google rolls their own bodes ill for Firefox, but EME will stay off by default in Firefox and some distributions will strip EME, like Debian.

I'd think the JavaScript exploits could be mitigated by adapting the JIT dramatically. At worst abandoning JIT entirely for old skool interpretation, but probably simply emitting more elaborate instructions. Actually arguing you've caught all related JavaScript exploits sounds extremely hard though.
posted by jeffburdges at 4:03 PM on January 5, 2018 [1 favorite]


My understanding is that you could just run the cpuid instruction right before going into JS to kill any speculative branches in progress.
posted by miyabo at 4:54 PM on January 5, 2018 [1 favorite]


This is going to be one of those things that people talk about forever, huh?

May you live in interesting times. :|
posted by snuffleupagus at 6:13 PM on January 5, 2018


I'd think the JavaScript exploits could be mitigated by adapting the JIT dramatically. At worse abandoning JIT entirely for old skool interpretation, but probably simply emitting more elaborate instructions.
I think this is a big selling point for the JavaScript / WebAssembly model: a browser vendor can make changes – reducing timer precision, adding cache flushes or other mitigations, implementing retpoline-style code generation changes, etc. – and every user is immediately better off. That's not without its challenges but it seems better than waiting for every vendor to ship updated code and e.g. finding out that someone at Adobe blew the budget paying for promotional installs instead.
posted by adamsc at 6:56 PM on January 5, 2018
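
To make the "reducing timer precision" mitigation concrete, here is a minimal sketch of the idea. The function names and all the numbers (50 ns for a cache hit, 200 ns for a miss, a 20,000 ns timer resolution) are hypothetical values chosen for illustration, not what any browser actually ships.

```python
def coarsen(timestamp_ns: int, resolution_ns: int) -> int:
    """Round a timestamp down to a multiple of resolution_ns."""
    return (timestamp_ns // resolution_ns) * resolution_ns


def measured_duration(start_ns: int, end_ns: int, resolution_ns: int) -> int:
    """Duration as seen by code that only has access to the coarsened clock."""
    return coarsen(end_ns, resolution_ns) - coarsen(start_ns, resolution_ns)


START = 1_000_000           # hypothetical start time in nanoseconds
HIT_NS, MISS_NS = 50, 200   # hypothetical cache hit / miss latencies

# With a 1 ns clock the two cases are easy to tell apart.
print(measured_duration(START, START + HIT_NS, 1),
      measured_duration(START, START + MISS_NS, 1))

# With a 20,000 ns clock both measurements collapse to the same value.
print(measured_duration(START, START + HIT_NS, 20_000),
      measured_duration(START, START + MISS_NS, 20_000))
```

Coarsening alone probably does not close the channel, since an attacker can repeat measurements or build a finer clock some other way, which is presumably why it is listed alongside other mitigations rather than instead of them.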




In hindsight, what is potentially the most pervasive and expensive security flaw in computing history should probably not have been described as "operating as designed".
posted by ardgedee at 4:45 AM on January 6, 2018 [3 favorites]


I keep imagining deeper layers of just how bad this is, so I imagine new scenarios that scare the fucking crap out of me and run them by my peers to walk me back from the ledge. But no, they take the new worse and make it more worse, then ask me if it's not that bad, and then I take their more worse and make it mega worse.

I can’t find the bottom and the only saving grace is “Spectre is hard”
posted by Annika Cicada at 7:15 AM on January 6, 2018 [2 favorites]


Yes but hard problems tend to become easy problems. That's how technological progress works, even when that progress is contrary to the general welfare.
posted by ardgedee at 8:32 AM on January 6, 2018 [1 favorite]


I guess I should have specified “saving grace” in my mind is “we have about 6 weeks before Spectre is usefully weaponized”
posted by Annika Cicada at 8:40 AM on January 6, 2018 [2 favorites]


I've kept wondering whether "Spectre is hard" as a notion was why Intel as a corporation thought that they didn't need to deal with it; it is plausible to me that part of it was that people there 10-20 years ago rationalized it as, well, such a systemic class of concurrency vulnerabilities is possible but would be so difficult to exploit given current technology that it's not worth dealing with... Let's leave it to the researchers as a long-term theoretical issue. And then as a corporate business they forgot about it, but eventually, hackers armed with faster computers caught up.
posted by polymodus at 3:07 PM on January 6, 2018


Yup. And since we're in, essentially, a microprocessor monoculture, to the point where even non-Intel x86 and a few non-x86 architectures ape the way Intel does things, this attack basically brings down the whole pile of shaky cards.

Still. This is not Armageddon. Defense in Depth is a thing, and I would expect the big winners for this are edge security vendors, especially those with modern sandboxing systems and zero-day detection. For the uninitiated, the firewall, excuse me, Unified Threat Management Edge Security Appliance, looks at traffic coming into the secure network (even and especially encrypted traffic; SSL Intercept is now "de rigueur," an agreed-to man-in-the-middle attack where the firewall decrypts, inspects and then re-encrypts traffic for hosts on the protected network).

If it sees anything resembling an executable or anything resembling a REST-ful command sequence, it will:

1) Match it against known malware signatures, and not exactly - if it even smells like a known attack, BLOCKED until you ask your firewall admin I mean Edge Security Analyst to whitelist it.
2) Send it to a special server that pretends to be the type of system the traffic was intended for, and if it starts to do unexpected things (like a Spectre or Meltdown attack), it's nuked and added to the known list of malware signatures, and this is communicated upstream to the vendor who pushes it out to everyone else. This is a sandbox appliance, and it integrates into the rest of your expensive firewall I mean UTM infrastructure. These things are surprisingly quick at setting up a custom honeypot. They also work with specialized appliances like hardened mail hosts, load balancers and application-specific firewalls from the same vendor.

Also, application-aware edge network security appliances like Impervas and Fortiwebs already do a great job of fingerprinting typical traffic and quashing and reporting strange stuff at the border. Vendors are already ahead of the notion that east-west (hosts on the same network) is as big a threat vector as north-south (hosts on other networks).

The downside is that this stuff is expensive, and the people who know how to run it and to create and analyze threat reports from the logs are more expensive still. There's not really a good open-source alternative for a Fortinet or Palo Alto or FireEye ecosystem.

Also, everything is chucked out the window if you're relying on a security cloud-appliance running on a VM with an Intel/AMD/ARM hypervisor. (Red Hat reports IBM's POWER processors are affected by Spectre but not Meltdown, probably due to their recent, weird double-endian shenanigans. No word on Z-series, SPARC or MIPS. Spectre sort of works on SPARC if you spend a shedload of dev time custom-tuning it, but even this is rumor.)

Linus and Theo and Mozilla need to drive their resources to build automated zero-day edge network defenses soonest, and Amazon needs to write huge checks to make it happen.
posted by Slap*Happy at 5:28 PM on January 6, 2018 [3 favorites]


Linus and Theo and Mozilla need to drive their resources to build automated zero-day edge network defenses soonest, and Amazon needs to write huge checks to make it happen.


The people whose job it is to look at the big picture of data security should spend more time talking to immunologists, because biology is where you find an awful lot of very smart defences against very smart attacks on complex and fallible information systems. Only I've yet to find who's looking at big-picture data security with any level of abstraction, let alone any areas where immunology and data security professionals are talking. We got as far as the virus and worm analogies, and it seems to have stopped there.
posted by Devonian at 6:48 PM on January 6, 2018 [4 favorites]


Cylance is probably the closest to mimicking biology.

I’ve long thought about this. Where I work is doing some pretty cool stuff along these lines, but we’re a mid-size dotcom not google. But yeah we’re working on it.

As far as Palo Alto goes...uhh...well Palo Altos are great but there is a class of Java buffer overflow attacks that IPS simply do not catch yet (we are working directly with vendors to sort it out), and FireEye endpoint is pretty cool provided you know what you’re trying to block, but as far as sandboxing goes are you really gonna sandbox every possible CSRF in a comment field coming across a GET? I get sandboxing files but web browsing?

Imperva is pretty good too, I use it, but I think their Incapsula product is better overall. Also neither Incapsula nor Imperva is able to catch the class of Java buffer overflow attacks I’m speaking of in the previous paragraph. (Hey talk about something to weaponize with Spectre!)

I wish I could talk more in detail specifically how this changes the defense in depth strategy where I work. (Who might be the person who is responsible for leading that effort where she works...yeah I can’t just be blabbing about shit y’all.)

Suffice to say if it’s a security tool we probably have it and we’re actively investigating how our tools can detect a Spectre attack. Furthermore if we don’t have a certain tool it’s because it doesn’t fit our risk-based security program or just downright fails to protect shit in a real world scenario.
posted by Annika Cicada at 8:36 PM on January 6, 2018 [1 favorite]


Intel's getting slammed, but nobody seems to have noticed that AMD released a microcode update that basically disables branch prediction entirely.

This is like using a nuke to blow up a fly. The performance hit is going to be fucking nuts.
posted by Talez at 8:47 PM on January 6, 2018 [4 favorites]


So as always, defense in depth helps but goddamnit we've never had a completely undetectable attack on memory on this kind of scale. It really is a mega game-changing event in the security field.

Given the inherent limits on the rate of exfiltration since you only get at most a bit or two per try, I'd say that undetectable is going a bit too far. Detecting a possible attack is easy, detecting it before they get sensitive data like your TLS private key is much less doable without architectural changes, though address randomization helps here, of course. You can at least reduce the chance of a successful attack with automated monitoring tools killing suspicious processes.

I think flushing cache lines on a mispredict is the most likely first round of mitigation from CPU vendors since it's relatively easy to construct code that isn't likely to fool modern branch prediction units. That should prevent any opportunity for exfiltration of data.
posted by wierdo at 10:42 PM on January 6, 2018


Given the inherent limits on the rate of exfiltration since you only get at most a bit or two per try, I'd say that undetectable is going a bit too far. Detecting a possible attack is easy, detecting it before they get sensitive data like your TLS private key is much less doable without architectural changes, though address randomization helps here, of course. You can at least reduce the chance of a successful attack with automated monitoring tools killing suspicious processes.

It's actually a byte per try. The problem is that this is really fast when done in a tight loop. Like half a meg a second. While it looks like your browser is mildly crapping itself which is how far from normal? So you pull down the kernel structures, you can walk them to find exactly what you're looking for and pull down the data. You don't even have to read all of memory. You bypass any sort of ASLR because you literally go straight for the index.

Not to mention even if you start doing things like noticing sequential access across page boundaries, it's not exactly hard to disguise that behaviour. Then you're just starting an arms race. You cannot be that glib when dealing with people with effectively infinite amounts of time and ingenuity.

I think flushing cache lines on a mispredict is the most likely first round of mitigation from CPU vendors since it's relatively easy to construct code that isn't likely to fool modern branch prediction units. That should prevent any opportunity for exfiltration of data.

That would be a massive hit to performance. Even with 90-something% of branches predicted correctly you're still looking at blowing out a pipeline flush from ~0-25 clocks to thousands of clocks as the pipeline stalls waiting for cache lines to be loaded back in. That's no bueno. The only way to mitigate this with acceptable performance is to start doing something like tagging speculative cache lines either in the cache or at the reservation station and flushing those lines along with registers on a rollback of speculative execution. Or they need to implement a shadow cache for speculative execution and once the ROB retires the instruction it also tells the shadow cache to retire the associated cache lines as well.
posted by Talez at 11:53 PM on January 6, 2018 [1 favorite]
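
The figures in the comment above lend themselves to a back-of-envelope check. The sketch below takes "half a meg a second" to mean 512 KiB/s and uses a 5% misprediction rate; both are assumptions made for illustration, as are the function names and the 25-cycle and 2000-cycle penalties (picked from the "~0-25 clocks" and "thousands of clocks" ranges).

```python
def seconds_to_read(num_bytes: int, bytes_per_second: float) -> float:
    """Time needed to leak num_bytes at a steady exfiltration rate."""
    return num_bytes / bytes_per_second


def average_branch_penalty(miss_rate: float, penalty_cycles: float) -> float:
    """Expected extra cycles per branch, given a misprediction rate and penalty."""
    return miss_rate * penalty_cycles


RATE = 512 * 1024  # assumed reading of "half a meg a second", in bytes per second

# One 4 KiB page of kernel structures versus all of a 16 GiB machine.
print(seconds_to_read(4096, RATE))
print(seconds_to_read(16 * 2**30, RATE) / 3600)  # in hours

# Per-branch cost with a plain pipeline flush versus a flush that also evicts cache lines.
print(average_branch_penalty(0.05, 25))
print(average_branch_penalty(0.05, 2000))
```

Under these assumptions a single page leaks in a few milliseconds while all of memory takes hours, which fits the point that an attacker would walk the kernel structures rather than dump everything. Likewise, a rare misprediction becomes expensive on average once the penalty grows by two orders of magnitude.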




but as far as sandboxing goes are you really gonna sandbox every possible CSRF in a comment field coming across a GET? I get sandboxing files but web browsing?

Fingerprinting typical usage, and then sandboxing atypical usage. App-aware appliances are good at filtering out atypical requests at near linespeed. Little Bobby Tables is an understood threat - not that particular attack, but unsanitary GETs in general. (I may be misunderstanding, I'm coming at this from a server-centric edge-network grognard perspective. I can see how javascript can obfuscate the hell out of everything, tho.)

Don't get me wrong. This is going to cause untold damage. It's just not the immediate death of the internet.
posted by Slap*Happy at 9:26 PM on January 7, 2018 [1 favorite]


Completely naively, a shadow cache sounds like it would be more performant compared to tagging speculative execution accesses in the regular cache itself. What I don’t fully understand is how speculative execution interacts with cache coherency in multicore systems. It sounds like a goddamn nightmare, honestly.
posted by invitapriore at 9:53 PM on January 7, 2018


> Firefox's mitigation are only of limited help because then you have to use Firefox and their security record hasn't been great in the past.

> I don’t think’s true or helpful.


There's probably a less incendiary, more helpful fashion I could have stated that, but the latter half of my statement is true. I believe rewriting Firefox Quantum in Rust is huge, and will pay off in various areas, and I eagerly anticipate the day when I can resume using Firefox for day-to-day web browsing while also being a computer-security-aware person.

However, the fact that Pwn2Own didn't even bother allowing attacks against Firefox in 2016 because it was considered "too easy" should tell you something (FF got pwned in 2017). Two years ago isn't so long ago as to be forgotten or irrelevant given how much people push off performing computer updates (of all flavors). (CVE-2014-0160 aka Heartbleed was "only" 4 years ago.)

In the two/three months since Firefox Quantum was released in November last year, they've already had a gaffe where they automatically installed a sketchy, spyware-sounding "Looking Glass" extension for (some) users, which turned out to be just an advertisement for the hacker TV show Mr. Robot. While this blunder isn't directly related to the security of the Firefox core rendering engine, Servo, it demonstrates that Mozilla, at some level, doesn't really 'get it'. :(

Back to the topic of Meltdown and Spectre, in the server space this isn't the death of the Internet, but the move to the Cloud made this more of an issue than it would have otherwise been.

Ignoring what you think about cryptocurrency, Coinbase's writeup represents a whole lotta work that happened, but my compatriots who work at companies that still host their own servers and decided against moving to the Cloud read the announcement and collectively shrugged.

It's not the immediate death of the Internet, but only because there's a huge swath of people who have made protecting the Internet from digital attack their life's work. It's like Y2k for those that remember the resulting lack of catastrophe on January 1st, 2000 - a temporary industry sprang up to make sure nothing happened, so there was a whole lot of nothing thanks to that hard work.

(Personally, I think six weeks to weaponization is overly optimistic given the highly directly lucrative payoff to drive-by stealing of anything bitcoin related. 15 days until Chrome 64 drops.)
posted by fragmede at 8:51 AM on January 8, 2018


Oh good, there are grub command line options and /sys flags for disabling the upcoming Linux kernel patches - my client has gone from "hey guys, got a plan?" to "maybe we don't want to take the update due to performance concerns", which would've thrown a wrench in any, y'know, future patching whatsoever.

I forget if access.redhat.com is paywalled, google for "pti" "ibrs" and "ibpb" if it is.
posted by Kyol at 10:56 AM on January 9, 2018
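
For checking those /sys flags from existing automation, something like the following might do. It is a sketch under assumptions: the directory `/sys/kernel/debug/x86` and the file names `pti_enabled`, `ibrs_enabled`, and `ibpb_enabled` are what I understand Red Hat's patched kernels to expose, but neither is confirmed in this thread, so check the vendor article before relying on them. The helper name `read_tunables` is made up.

```python
from pathlib import Path

# Assumption: location and names of the runtime tunables on Red Hat's patched kernels.
DEFAULT_DIR = Path("/sys/kernel/debug/x86")
TUNABLES = ("pti_enabled", "ibrs_enabled", "ibpb_enabled")


def read_tunables(base_dir=DEFAULT_DIR):
    """Return {tunable name: file contents}, with None for anything unreadable."""
    state = {}
    for name in TUNABLES:
        try:
            state[name] = (Path(base_dir) / name).read_text().strip()
        except OSError:
            # Missing file, unmounted debugfs, or insufficient permissions.
            state[name] = None
    return state


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print(f"{name}: {value if value is not None else 'unavailable'}")
```

Reading is deliberately all this does. Flipping a mitigation off is a policy decision that probably belongs in the tuned profile rather than in an ad hoc script.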


I forget if access.redhat.com is paywalled

There was some discussion in LWN.net comments, apparently redhat were contacted and have made that specific article public.
posted by sourcejedi at 11:06 AM on January 9, 2018




Kyol: If you have pcid available then the impact is apparently fairly minimal (hardware-wise that’s everything since Westmere, i.e. since around 2010ish, so odds are that you do). That requires a recent kernel, however, and if you’re running on a virtual machine, doing so on top of a hypervisor that exposes it to the client.
posted by pharm at 12:47 PM on January 9, 2018
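
On Linux, whether the kernel sees pcid can be read from the `flags` line of `/proc/cpuinfo`. A minimal sketch follows; the helper names are hypothetical, and it only reports what the (possibly virtual) CPU advertises, not whether the running kernel actually uses the feature.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_pcid(cpuinfo_text: str) -> bool:
    """True if 'pcid' is advertised. Exact match, so 'invpcid' alone does not count."""
    return "pcid" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print("pcid available:", has_pcid(f.read()))
```

Inside a VM this reflects what the hypervisor chooses to expose, which is the caveat raised above.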


Yeah, I think some of the initial "THIRTY PERCENNNNT!!!1!!!!1!!1!!" dire predictions have my client worried, despite the subsequent claims to the contrary. But having live, real time calls into /sys to revert the changes makes patching a lot more palatable, and the tuned hooks means it ties into our existing automation so if it turns out that we need to disable one remediation for marklogic and a different one for hadoop? M'kay, we're good.
posted by Kyol at 1:18 PM on January 9, 2018


The only >30% performance hits I'd heard of were on tests designed to maximize use of the patched code (I'd cite but can't find the link right now). Real-world impact seems unlikely to reflect the worst-case scenarios formulated in laboratories.
posted by ardgedee at 1:35 PM on January 9, 2018


Or like the postgres benchmark which fakes all network calls to loopback, suffering all the overhead of a context switch with none of the normal inter-switch delay while the packet goes off to a client and back, I gather? But yeah, lab tests are not indicative of real world experience so far as I've heard.
posted by Kyol at 2:08 PM on January 9, 2018 [1 favorite]
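
The loopback point can be made with a little arithmetic on per-request latency. All numbers below are hypothetical (10 µs of real work, 2 kernel crossings per request, 1 µs of added cost per crossing, 190 µs of network wait), and `slowdown` is a made-up helper, so treat this as a sketch of the reasoning rather than a measurement.

```python
def slowdown(work_us: float, wait_us: float, syscalls: int, extra_us_per_syscall: float) -> float:
    """Fractional increase in per-request latency caused by the added syscall cost."""
    baseline = work_us + wait_us
    patched = baseline + syscalls * extra_us_per_syscall
    return patched / baseline - 1.0


# Loopback benchmark: no network wait, so the added cost is a large share of each request.
print(f"{slowdown(work_us=10, wait_us=0, syscalls=2, extra_us_per_syscall=1):.1%}")

# Same work, but each request also spends time waiting on a real network round trip.
print(f"{slowdown(work_us=10, wait_us=190, syscalls=2, extra_us_per_syscall=1):.1%}")
```

The added cost per request is identical in both cases; only the denominator changes, which is one way to read why the lab figures look so much worse than what people report from production.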




Triple Meltdown: How So Many Researchers Found a 20-Year-Old Chip Flaw At the Same Time (Andy Greenberg for Wired, Jan. 7, 2018) -- a look at how four different groups discovered old bugs around the same time.
posted by filthy light thief at 2:56 PM on January 10, 2018


I found it when I was railing at the demise of everything non-x86 in the early aughts.

I mean, really! Alpha. MIPS. PowerPC and POWER, PA-RISC, Fujitsu's flavor of SPARC, they all of a sudden didn't start to suck compared to x86, decisions were made. Intel chips, Microsoft OS, get on board, we will aggressively defund R&D not on board.

Am I the only one who remembers Linux on commodity Alpha desktops? Am I?

You all wanted this. You all wanted a monoculture, tended to by a closed-source titan, and pretended it was an "open architecture." Guess what?

Linus, this upended glass is aimed in your direction, the internet has pretty much forgotten your "Intel is the universal standard" quote, but I have not. This is the world you and Bill G made, by hook and crook. N'joy your minty fate.

Also, I really like how OpenBSD learned about Meltdown and Spectre from a layman Ars Technica article. Gotta sell them Red Hat licenses, I guess.
posted by Slap*Happy at 7:53 PM on January 10, 2018 [3 favorites]


Am I missing something? Later Alpha and Power implementations definitely make use of branch prediction/speculative execution, and probably MIPS too, which implies a high likelihood of susceptibility to side-channel attacks like this.
posted by invitapriore at 8:08 PM on January 10, 2018


Also, I really like how OpenBSD learned about Meltdown and Spectre from a layman Ars Technica article.

They've proven that they can't be trusted with either abiding by embargos or utilizing common sense in their disclosure discretion and they don't sign NDAs. They dug their own graves.
posted by Talez at 8:21 PM on January 10, 2018 [1 favorite]


A half dozen different varieties of speculative execution, innovating on Moore's schedule. I mean, not even all of the ARM-a-verse is susceptible, and POWER likely is only on the hit-list because they needed to get religion in little-endian supremacy and sort-of Intellianisms to host VMs anticipating and executing on such, despite breathing fire and fury as a big-endian arch. The "open source" compilers just stopped seeing things their way.

If we had a diverse and competitive hardware ecosystem, with a fair and impartial FOSS toolchain, but if wishes were fishes then beggars would ride.
posted by Slap*Happy at 8:22 PM on January 10, 2018


They've proven that they can't be trusted with either abiding by embargos or utilizing common sense in their disclosure discretion

They can be trusted when deciding when they're being screwed with, and releasing fixes alongside disclosures on originally agreed timelines, when there are actual zero-days from malicious actors in the field.

But, yes, they cannot be trusted when moving the goalposts all around the field, the fiends.
posted by Slap*Happy at 8:38 PM on January 10, 2018


if wishes were fishes then beggars would ride

"Horses," surely.
posted by Chrysostom at 9:05 PM on January 10, 2018


They can be trusted when deciding when they're being screwed with, and releasing fixes alongside disclosures on originally agreed timelines, when there are actual zero-days from malicious actors in the field.

And yet it's only OBSD that we have these problems with along with staff that actively despise security embargos. "Fuck you all we have to defend our users" is great if you're one of the dozen OBSD users I guess.
posted by Talez at 9:48 PM on January 10, 2018


Let's face it, if they let TdR know about Meltdown back in April they would have patched it and released it as fast as they could. They got so shitty about KRACK, basically a Sunday picnic compared to Meltdown, that they begged a security researcher to release it even when it was obvious other vendors needed more time. I can't see them sitting on Meltdown whilst everyone completely reengineers their kernel code.
posted by Talez at 9:53 PM on January 10, 2018


I'm just intuitively thinking that this is just a huge vindication for Richard Stallman, as he's been a long advocate of emancipated hardware, not just software. I was kinda hoping he'd say or write something about this.
posted by polymodus at 1:55 AM on January 11, 2018 [1 favorite]


I don't think pining for a glorious rainforest of ecodiversity in processor architectures is realistic or fruitful. The reason for speculative execution is because that's how you speed up sequential von Neumannesque processing in a world where memory access is the limiting factor.

Which it always, always is. That's not an Intel architectural issue. That's physics. It doesn't much matter what the instruction set, register map or what have you is, which in any case is vastly different on the inside to the ISA that programmers see. You have fast things and you have slow things, and you try not to let the slow things dictate the speed of the fast things.

We have tried at various points to build asynchronous processors - spec ex is a tiny step in that direction - but they don't work even if you don't worry about inter-process protection. Perhaps there is a safe, efficient, asynchronous processor concept out there, but nobody's found it yet.

You have a choice. Either you keep all your process metadata in fast registers at all times, which puts extreme bounds on what your software can do, or you have the flexibility of having that stuff outside the CPU core where it can scale, and you take the hit of stopping work on a process when it does something outside its local context. Otherwise you take on the responsibility of maintaining data integrity in your own little multiverse, and you'd better keep those little universes apart or you'll fuck the poodle of causality.

Perhaps Intel is at fault for taking market advantage of an insecure concept, but given that it's taken twenty years to break the surface I feel that would be a hard case to make stick. Did it know about it subsequently but was too scared to do anything? Different question. Did anyone else?
posted by Devonian at 2:28 AM on January 11, 2018 [6 favorites]


I was just discussing this with an old (in every sense) friend, who has been terribly amused by all this. He says he brought up the issues of look-ahead execution and memory security on parallel processes when he was working at Elliott Automation in the mid 1960s, and I absolutely believe him.

(Elliott Automation was one of the near-forgotten UK computer companies, and got involved in the field in the 1940s when the Admiralty looked at the performance of fire control systems and radar in the Royal Navy in WWII and decided the answer was an all-digital integrated system. Fun then ensued...)
posted by Devonian at 6:47 AM on January 11, 2018 [3 favorites]


OK. In order.

Let's face it, if they let TdR know about Meltdown back in April they would have patched it and released it as fast as they could.

And given the long lead-in for Meltdown and Spectre, Microsoft still bungled it, and needs more time they don't have. Vendors either have the ability to mitigate in a timely manner, or they do not. Giving those who do not more time, while actual damage is being done with the undisclosed vuln, is a fool's errand. More, the OpenBSD patch "back in April" would be cleanly written, carefully commented, and unencumbered by license for everyone else to copy.

I don't think pining for a glorious rainforest of ecodiversity in processor architectures is realistic or fruitful.

Well, here's the deal, tho. We still have the ecodiversity. It never went away, only the options available for the average user went away. There are a lot of processor families, and sub-families (not all ARM arches are vuln) that simply aren't on the hit-list because it's too damn hard to go after their particular variety of speculative execution in any kind of performant way.

But, still, if wishes were horses, fishes would beggar. I'm a disgraced Unix admin who's now flailed his way into edge network stuff, and after Wannacry, I can confidently claim that we are so not ready for this shit.
posted by Slap*Happy at 7:07 PM on January 11, 2018 [1 favorite]




This thread has been archived and is closed to new comments