It's Friday, so let's all relax with some CPU design theory
August 14, 2015 1:42 PM
Raymond Chen breaks down the Itanium processor in an 11-part series on his weblog.
- The Itanium processor, part 1: Warming up
- The Itanium processor, part 2: Instruction encoding, templates, and stops
- The Itanium processor, part 3: The Windows calling convention, how parameters are passed
- The Itanium processor, part 4: The Windows calling convention, leaf functions
- The Itanium processor, part 3b: How does spilling actually work?
- The Itanium processor, part 5: The GP register, calling functions, and function pointers
- The Itanium processor, part 6: Calculating conditionals
- The Itanium processor, part 7: Speculative loads
- The Itanium processor, part 8: Advanced loads
- The Itanium processor, part 9: Counted loops and loop pipelining
- The Itanium processor, part 10: Register rotation
This is a rather itanionormative post.
posted by lkc at 2:01 PM on August 14, 2015 [2 favorites]
This is going to take me a while to read through, but thanks for posting it. This is exactly the kind of stuff that makes me love the internet again after being convinced it's been ruined.
posted by primethyme at 2:13 PM on August 14, 2015 [5 favorites]
I've been following along with these and they're fascinating. Such an interesting architecture: the register windowing reminds me strongly of SPARC.
Reading these, it's very clear that Itanium was a huge failed gamble on compiler tooling: push all the pipelining decisions off to the compiler, and hope that the compilers catch up and are able to generate efficient code for it.
posted by We had a deal, Kyle at 2:13 PM on August 14, 2015 [2 favorites]
I used to use a compute farm that got a fancy new IA-64 server, which they called olympic. I thought this was a pretty cool name for a beefy server. Months later I found out a similar server assigned to another group had the name britannic. Only then did I realize this naming scheme was a reference to the Olympic-class ocean liners, and especially the third, and most famous, ship of that class.
IA-64 still seems incredibly cool and I'm sad that it didn't catch on.
posted by grouse at 2:14 PM on August 14, 2015 [7 favorites]
In general, the Itanium was really the wrong direction in terms of instruction size and portability to newer versions (e.g. what happens when a new Itanium adds more units/ports but your code is compiled for an older version?).
But the NaT bit, tentative load instruction and the multiple condition flag registers (neat!) are at least three things someone should have exposed to the compiler in later CPU designs.
posted by smidgen at 2:24 PM on August 14, 2015 [2 favorites]
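(To put smidgen's "tentative load" point in concrete terms: below is a small C function of my own, not from Chen's series, showing the kind of source pattern speculative loads exist for. The ld8.s/chk.s names in the comments are the real IA-64 instructions Chen covers in part 7; the C itself is just an illustration.)

#include <stddef.h>

/* The load can't begin until the null test resolves, so its full
 * memory latency sits on the critical path.  On IA-64 the compiler
 * may hoist it above the test as a speculative load (ld8.s), which
 * never faults -- a bad address just marks the destination register
 * NaT, "not a thing" -- and then emit a chk.s after the test that
 * branches to recovery code only if the NaT value is actually used. */
long fetch_or_default(long *p)
{
    if (p != NULL)
        return *p;
    return -1;
}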
In general, the Itanium was really the wrong direction in terms of instruction size and portability to newer versions (e.g. what happens when a new Itanium adds more units/ports but your code is compiled for an older version?).
This is an Intel design, you realize. From the people who gave us the 386/486/Pentium/Core line, where each successive generation added new instructions and required exactly what you describe.
posted by GuyZero at 2:29 PM on August 14, 2015
I am curious about CPU design in the post-LLVM era. The sort of intelligence you'd want in compilers is actually implementable now in a way it wasn't before.
I'm also curious if we'll get more efficient security features. Sure would be nice to have guard bytes rather than guard pages.
posted by effugas at 2:31 PM on August 14, 2015 [2 favorites]
I just like saying Itanium.
Itanium Itanium Itanium.
posted by selfnoise at 2:37 PM on August 14, 2015 [2 favorites]
In theory, an Itanium with more units would just execute more of the instruction bundles in a group simultaneously.
The design was actually very forward-thinking; they just ran into the problem that their own IA-32 processors outperformed IA-64 by the time it finally reached the market, and meanwhile AMD introduced AMD64, which also outperformed IA-64 and had a very obvious upgrade path for existing applications and operating systems. (Plus, for a very brief moment, Opterons managed to outperform Pentium 4s and Intel was forced to react to the market realities.)
posted by nmiell at 2:38 PM on August 14, 2015 [2 favorites]
That's a great blog. Raymond Chen does a good job of explaining "why the hell does Win32/Explorer/DOS/etc. do this insane thing so insanely" and the twist ending is usually that there were smart people making what were, at the time, smart decisions.
(even if the reasoning was "because your CEO says so")
posted by kurumi at 2:44 PM on August 14, 2015 [4 favorites]
I don't think Itanium was a hardware design failure so much as a marketing failure. Intel (and HP) went to great lengths to build a shining new architecture up on a hill, and decided for the privilege of getting to use it they could command a far greater price and exclusivity, and then pay through the nose for a matching optimizing compiler.
Then AMD shot back with "hey dudes! we bolted another 32 bits to the end of all these 32-bit registers, and you can keep coding in x86 just like before". And AMD won (for a few more years).
At the same time, companies like Google leveraged a mass of cheap x86 hardware where once a big chunk of iron from Silicon Valley was needed.
posted by nickggully at 2:47 PM on August 14, 2015 [5 favorites]
Ha! Came in for the "now do one for a platform somebody actually uses" mockery, was not disappointed.
Along those lines - I guess he's worn out from the Itanium series but this post links to another MSDN blog taking on ARM. Also he (Raymond) has a series on calling conventions, starting here, which is kind of similar and spans a bunch of architectures.
posted by Joey Buttafoucault at 2:47 PM on August 14, 2015 [1 favorite]
I would actually argue Itanium was a success. Sure, they shipped low numbers, but at insane profit margins -- I've talked to IT people at companies with $100M of Itanium gear. They used the profits to develop the best fab technology in the world, which they turned around and used for x86 chips. They completely destroyed Sun, which in the early 2000s was looking like a serious Intel competitor, and seriously hobbled IBM's high-end server business. AMD clearly "won" on technical merit but they turned their most important product into a low-margin replaceable commodity, and went bankrupt.
posted by miyabo at 3:08 PM on August 14, 2015 [3 favorites]
Only then did I realize this naming scheme was a reference to the Olympic-class ocean liners, and especially the third, and most famous, ship of that class.
I wonder if that's a reference to Itanium being called "Itanic" by The Register, even before the first chips were released. They had a pretty good feeling it wasn't going to go well for Intel.
posted by zsazsa at 3:15 PM on August 14, 2015 [3 favorites]
The big problem with Itanium was having to run x86 code.
If you compiled and ran ia64 code, it was a pretty solid CPU in the first iterations and a top-flight one in the second iteration. But x86 *won*, and x86 performance was horrible, and nobody cared about ia64 performance at first, so the word got out that the Itanium's performance was horrible.
No, it was horrible at x86 code. Not surprising. It was good with properly compiled ia64 code. But nobody was really testing that. The popular press certainly wasn't, their benchmarks weren't written in ia64!
And to properly compile for the thing, you needed a proper compiler, and for a long time, only Intel had one. And you paid for it. Then GCC claimed to target it, but GCC produced *HORRIBLE* code for it, thus making the performance legend even worse. Then again, if you wanted performance, GCC was *never* a good compiler. Well, really, the only reason to use GCC was "free as in beer" but I digress. It took a long time for other compilers to actually produce good code for it. Then you could finally see proper ia64 performance.
But by this time, the market had given up, except for HP. The Itanic has sailed and sunk. Except for the few people who bought in and learned in and ended up *loving* it, it died. And there was a space it did win in. You wanted more than 4 sockets and Intel? Itanium -- and that was the space that killed Sun and arguably killed IBM.
AMD, as mentioned by nickggully, just bolted 32 bits more onto the 32 bits of the x86 design, which worked, mostly, and thus you could now run 64 bit code, mostly, and get fast x86 performance.
Now, of course, we're still carrying *all that baggage* and we will basically forevermore, because if you don't run x86, you don't count. The only thing that's been a challenge has been ARM in the mobile space.
posted by eriko at 3:17 PM on August 14, 2015 [3 favorites]
nickggully, it was more pernicious than that. (I spent a few years working for Intel around the time of the Itanic.)
Intel was explicitly looking to put the commodity genie back in the bottle, and kill off the x86 ISA which had been 'accidentally' licensed to those 'other' design shops.
The compiler-centric design of the Itanium was not just a bold architectural choice, it was also encouraged as a way to return software subscription revenues (or at least licence contract control) back to the silicon supplier. Yeah GCC worked on IA64, but it was ~20% slower than ICC (or the MS chain, which included much wintel shared IP)... so to get and maintain the best performance on your hardware, you were going to have to pay twice. (Sorry Linux users, have you considered an "enterprise" OS?)
It only surprised me that it got as far as it did.
posted by zeypher at 3:18 PM on August 14, 2015 [10 favorites]
The thing about Intel is, they haven't had success with a new ISA since the 8086, which was designed starting in 1976. Before we know it, we'll be celebrating the 40th anniversary of x86!
posted by jepler at 3:24 PM on August 14, 2015
Next FPP: A guide to the terminology in this series which the author doesn't explain because he assumes some basic level of computer science understanding which English majors don't have even though they build their own computers and read Slashdot.
posted by radicalawyer at 3:25 PM on August 14, 2015 [1 favorite]
> I just like saying Itanium.
> Itanium Itanium Itanium.
> posted by selfnoise
Eponysteresis can break out anywhere.
posted by benito.strauss at 3:29 PM on August 14, 2015 [3 favorites]
And to properly compile for the thing, you needed a proper compiler, and for a long time, only Intel had one.
This seems like an obvious failure to commoditize your complement. Intel was in the business of selling chips, not compilers, but they were greedy. Had they open sourced their high end compiler for Itanium as a GCC backend, we'd probably all be on IA64 machines now.
posted by fatbird at 3:44 PM on August 14, 2015 [5 favorites]
nickggully: I don't think it's accurate to say that Itanium failed simply due to marketing or even legacy x86 code. I used to work for a group of computational scientists whose work depended on floating-point performance. They had the source for their apps and actively invested in tuning for various systems and compilers so it was about as favorable a market as you could find: if it was fast, we bought some. The problem was simply that the chips always shipped late and the performance per dollar at best approached the x86 world. Even using Intel's latest $$$ compiler, investing time to profile & tune, etc. the FLOP-per-dollar ratio just never made sense – and then AMD shipped the Opteron, which made the comparison much less favorable. It's possible that a sufficiently smart – okay, positively brilliant – compiler could have helped but it looked to me like they just never found a way to ship in enough volume to get on the right side of the semiconductor production curve.
The same thing happened later when Sony was trying to sell the PS-3's Cell for HPC work – it was plausible when they first announced it but the extremely limited volume meant that the pricing never worked out to where most people could justify the time needed to optimize for the architecture. It's just such a hard sell to say that with optimization heroics you might have a win when the alternative is to spend no more than the same amount of money with near-certainty of getting competitive performance.
effugas raised the question of how it would look now, and I think the picture is considerably improved: not just because you can assume that everyone has a high-quality compiler, or can even ship an intermediate format which will be compiled for the exact chip it will run on, but also because everyone has spent the last couple of decades switching to higher-level libraries and abstractions for expressing things like data-level parallelism, which make it easier to apply aggressive optimizations to code which needs to run on many different chips. Something very much like VLIW has already gone mainstream in GPUs and, interestingly, it both replaced earlier designs and was itself subsequently replaced with newer RISC designs without much fanfare.
posted by adamsc at 3:59 PM on August 14, 2015 [8 favorites]
I wonder if that's a reference to Itanium being called "Itanic" by The Register, even before the first chips were released.
Yes, it almost certainly was.
posted by grouse at 4:01 PM on August 14, 2015
One of the Itanium (and before that, at HP the PA-RISC) architects, Bill Worley, is one of the founders of Secure64. They use Itanium's security features to build robust network infrastructure, such as DNS servers. Pretty neat stuff, though I can't help thinking at least some of the security attained is through obscurity.
posted by dylanjames at 4:06 PM on August 14, 2015 [3 favorites]
Thank you for this post. I think I took The Old New Thing out of my RSS feeds a while back (yes I still have them, no I don't really read them) but I adore Raymond Chen. M$ haters should be careful about reading his blog because you might wind up with Stockholm Syndrome reading about the poor bastard who had to make sure Quicken worked with the next edition of Windows in spite of how awful Intuit was (rinse, repeat x 1000).
posted by yerfatma at 4:47 PM on August 14, 2015 [2 favorites]
The mighty Itanic. God himself couldn't improve its x86 performance.
posted by Jessica Savitch's Coke Spoon at 4:54 PM on August 14, 2015
Next FPP: A guide to the terminology in this series which the author doesn't explain because he assumes some basic level of computer science understanding which English majors don't have even though they build their own computers and read Slashdot.
In fact, people who build their own computers may well not know a lot of this, this is at least one level deeper geekery than that. I understand what I've read so far (having read three parts at the moment) and most of my assembly experience was back on the ol' C64. Let me see if I can find some things that I can explain. All of this, of course, is open to correction by people more current with modern processor design.
One of the first possibly-confusable terms is register. You know what a byte is, right? It's eight bits, each a binary 0 or 1. Strung together, those eight bits can represent integers from 0 to 255. That's just a matter of interpretation though; you can use those digits to represent all kinds of things: numbers, ASCII characters, or even eight individual on-or-off "flags". And if you take multiple bytes together, you can represent even more things, and wider ranges of things, just as adding four more digits to a four-digit decimal number expands the range of numbers you can represent, literally exponentially. One of the most important things these bytes can represent are instructions, specific numbers that are defined as telling the processor a certain thing it is to do.
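(A quick illustration of "just a matter of interpretation" -- this little C program is mine, not from the articles; it prints the same byte several different ways.)

#include <stdio.h>

int main(void)
{
    unsigned char b = 0x41;                         /* one byte: bits 01000001 */
    printf("as an integer: %u\n", b);               /* 65                      */
    printf("as ASCII:      %c\n", b);               /* 'A'                     */
    printf("flag bit 6:    %u\n", (b >> 6) & 1u);   /* 1: bit 6 is on          */
    printf("flag bit 3:    %u\n", (b >> 3) & 1u);   /* 0: bit 3 is off         */
    return 0;
}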
A program is just a sequence of these instructions or opcodes, some accompanied by parameters, or operands, which represent data that the instruction is supposed to operate on. So the processor runs through its memory, one instruction at a time, executes it, then moves on to the next. Inside the processor is a special bit of memory called the program counter, which keeps track of where in the program the processor is currently reading. This model of operation emulates the famous Turing machine, a theoretical device thought up by mathematician Alan Turing that read along an infinitely long tape of instructions, some of which involved advancing or retracting the tape itself. (This is not entirely true of current machines, especially the Itanium, but I'll get around to that.)
The computer's main RAM is presented to the processor as a big sequence of bytes, in numeric order. The program counter (PC) is basically just the number of the byte that contains the current instruction (or thereabouts). The program counter itself, however, is not stored in that RAM, but in a special little piece of memory inside the processor. This memory is called a register, and PC isn't the only one; there are tons of these in most processors these days. Some of them, like PC, have special functions, but others are free for programs to use however they want. Not only are registers convenient, they are also much faster to access than main system RAM, which for technical reasons delays the machine slightly every time it's accessed.
Back to the idea of the Turing machine. What I've just explained is not strictly true for modern machines. One of the tricks modern processors use to try to increase speed beyond hard physical limits is parallel processing. You see, ordinarily this is what an unoptimized processor does as it executes a program:
1. Fetch the instruction at PC.
2. Decode it (figure out which operation we're supposed to do).
3. Does it need a parameter? If not, skip to 5.
4. Get the parameter from RAM following PC.
5. Now we know the operation, perform it. This may require further memory accesses, either to read or write. We may even need to change PC itself, if it's a jump instruction (go somewhere else in memory and start executing!) or branch (conditionally jump, if certain things are true).
6. Increase PC by the size of the instruction. (Unless it was a jump or branch.)
7. Jump back to 1.
If you just do it this way, well, computers are fast, but there's a limit to how fast you can get, and we're talking about hard limits dictated by the physics of our universe itself. So, processor designers started thinking about different ways they could optimize this process.
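(Here's that loop written out as a toy interpreter in C. The "machine" and its opcodes are invented purely for illustration -- they aren't any real instruction set -- but the steps map one-to-one onto the numbered list above.)

#include <stdio.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_JUMP = 3 };

int main(void)
{
    /* A tiny "RAM" holding a program: LOAD 5, ADD 7, ADD 7, HALT */
    unsigned char ram[] = { OP_LOAD, 5, OP_ADD, 7, OP_ADD, 7, OP_HALT };
    unsigned pc  = 0;          /* program counter (a register)          */
    int      acc = 0;          /* accumulator (another register)        */

    for (;;) {
        unsigned char op = ram[pc];              /* 1. fetch            */
        switch (op) {                            /* 2. decode           */
        case OP_LOAD: acc  = ram[pc + 1]; pc += 2; break;  /* 3-6       */
        case OP_ADD:  acc += ram[pc + 1]; pc += 2; break;
        case OP_JUMP: pc   = ram[pc + 1];          break;  /* 5. jump   */
        case OP_HALT: printf("acc = %d\n", acc);   return 0;
        }
    }                                            /* 7. back to fetch    */
}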
One way was prefetch. Getting data from memory is a costly process, in terms of time. So, while the part of the processor that's doing the operatin' is going, why not have the part of the processor that gets stuff out of memory go ahead and get the next instruction to decode? Do two things at once, eh wot? And this is a fine idea for the most part.
But then... what if the instruction being worked on is a jump? Then we'll end up having to fetch a different instruction than the one we thought we were going to have to work on! Or worse yet, what if it was a branch? Well it's not optimal, but if we have to do it we could just discard what we prefetched and do a normal fetch; it's slower, but at least our program will still behave consistently.
Also, how are we to know exactly which bytes to prefetch when they depend on the size of the current instruction? Ah, but we can handle that! We'll just make sure all instructions are the same size, that is, the largest possible instruction-and-operand we might have to perform, and pad out unneeded parts of the instruction with 0 bits. (That could result in wasted memory if we're just doing a bunch of simple instructions that don't need operands, but hey, memory is cheap these days, right?)
In fact, there are a lot of tricks you can do to try to do more things at once. Processors these days can contain multiple cores, each of which is capable of fetching and executing instructions at once, with its own program counter and registers. But there are potential problems with that too. What if both cores end up working on the same thing at once? Each is probably using a region of memory as work space. What happens when both cores try operating on the same bit of memory at the same time? It is a big problem when this happens, because the two threads can step on each other's toes, so to speak, and create very difficult-to-find bugs. It might just not work, or it might work perfectly fine, or it might appear to work but in fact produce an incorrect result, or it might crash, and maddeningly, whichever it does might appear to be entirely random, or depend on obscure, almost arbitrary factors.
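(The toe-stepping is easy to demonstrate. This is my example, not anything from the articles: two threads hammer the same counter with no coordination, and because "counter++" is really a separate load, add, and store, the increments interleave and get lost. Build it with -pthread and the total will usually come up short, by a different amount each run.)

#include <pthread.h>
#include <stdio.h>

static long counter = 0;            /* shared memory both threads touch */

static void *bump(void *unused)
{
    (void)unused;
    for (int i = 0; i < 1000000; i++)
        counter++;                  /* load, add 1, store: not atomic   */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}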
The x86 processor standard goes about this by trying to predict what programs will do ahead of time, doing as much as it can ahead of schedule, and, as with the prefetch strategy I described before, just throwing away the results of pre-done work if it turns out that work is incorrect. Basically, it tries to keep as much of the chip as possible busy doing things that might turn out to be useful.
Itanium does some of this stuff, but it also provides mechanisms to better allow the compiler to produce code that is easy to predict, and thus improve efficiency that way. Its instruction groups are contiguous sets of instructions that don't affect each other, none of which (for the most part) rely on the results of the other instructions in the same group. The contents of the group are determined by the compiler, the program that turns high-level program code (like C) into the machine language the processor directly executes, and thus ultimately by the author of the compiler. That's what is meant by "offloading complexity onto the compiler": it demands that the author of the compiler do more work to make it easier for the processor to perform well. Because the instructions in each group don't rely on each other (which it's the compiler's job to ensure), the processor can do as many of them at the same time as it has facilities to execute. If it's 12 instructions, it can potentially do them ALL at once, and since none of them have to worry about stepping on the toes of the others, it can be sure that the execution is consistent and won't result in any hard-to-find bugs. When the code hits a stop, that's the processor's signal that all the instructions that were executing before have to finish up before it moves on.
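(For a rough picture of what "instructions that don't rely on each other" means at the source level, here's a contrived C function of mine. The grouping itself happens in the compiler's generated machine code, not in the C; the ";;" stop notation mentioned in the comments is the real IA-64 assembly syntax from Chen's part 2.)

/* The four lines below have no dependencies on one another, so an
 * IA-64 compiler can place them in the same instruction group and the
 * processor can issue them simultaneously.  The final sum needs all
 * four results, so the compiler must emit a stop (written ";;" in
 * IA-64 assembly) before the instructions that combine them.          */
long sum4(long a, long b, long c, long d)
{
    long ab = a + b;            /* independent                  */
    long cd = c + d;            /* independent                  */
    long a2 = a * 2;            /* independent                  */
    long d2 = d * 2;            /* independent                  */
    return ab + cd + a2 + d2;   /* depends on everything above  */
}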
Does this make sense? I'm hoping this will provide at least a start at understanding these highly interesting and informative articles. Maybe someone else can pick up the torch from here, or I might pick up later, there's still stacks and register windows to explain and other esoteric stuff like that. But I really think, in the end, that this stuff isn't really hard to understand, it just needs to be explicated in a decently simple way.
posted by JHarris at 4:58 PM on August 14, 2015 [36 favorites]
if you don't run x86, you don't count. The only thing that's been a challenge has been ARM in the mobile space.
ARM's going to be a threat on more than mobile, and sooner than you think. I saw a post on... Daring Fireball, maybe? I'll try to find it later - that ran some benchmarks showing that just by naively adding extra ARM cores you can get like 90% of the performance of contemporary x86 chips for about 50% of the price (and BETTER performance on some graphics benchmarks); Apple might switch just for the sake of making decent margins on macbooks again. I know it doesn't address the entrenched x86 install base, but with more people working on iOS and Android every day, the time is getting more and more ripe to switch to a different low-to-mid-end commodity architecture.
posted by Joey Buttafoucault at 5:04 PM on August 14, 2015 [3 favorites]
That's excellent, jharris.
It's possible that a sufficiently smart – okay, positively brilliant – compiler could have helped
I remember at the time that there was a lot of discussion about how Itanium had an uphill battle to fight because compiler optimization, over time, is a huge factor in chip performance as compiler authors figure out real world characteristics of chips and various heuristics to milk as much performance as possible from it. This was cited as a problem for Itanium because it takes time and real world experience to figure this stuff out, so even such an obviously superior design as IA would initially appear to be a poor performer in comparison. That said, jharris' post reminds me that one marketed aspect of IA was that it would be easier for compilers to optimize for its eminently sane architecture, while x86 compilers had to waste a lot of effort just dealing with the *ahem* idiosyncratic cumulative x86 instruction set after many decades of use.
So, again, it seems like a self-inflicted wound for Intel not to put a ton of resources early on into making freely available a really good compiler for IA.
posted by fatbird at 5:13 PM on August 14, 2015 [2 favorites]
Next FPP: A guide to the terminology in this series which the author doesn't explain because he assumes some basic level of computer science understanding which English majors don't have even though they build their own computers and read Slashdot.
JHarris provided an excellent primer that should help the layman appreciate some of the articles, but Raymond Chen is a programmers' programmer, and he pretty explicitly writes for that audience, and often on pretty advanced and esoteric concepts, even for that audience. I have a CS degree and currently work in software engineering and I found this series to be a relatively challenging read myself, as I rarely have to get this close to bare metal in my work.
posted by Aleyn at 5:20 PM on August 14, 2015 [2 favorites]
M$ haters should be careful about reading his blog because you might wind up with Stockholm Syndrome reading about the poor bastard who had to make sure Quicken worked with the next edition of Windows in spite of how awful Intuit was
Yes indeed: reading his blog is often fascinating in revealing (a) how hard it is for Microsoft to make improvements or even bug-fixes to Windows APIs because doing so inevitably breaks code which is dependent on that exact behavior, and (b) how much invisible compatibility shimming Windows does to keep old (and often badly broken) applications running.
I found this series to be a relatively challenging read myself, as I rarely have to get this close to bare metal in my work.
I do work that close to bare-metal, but on simpler CPU architectures; the last 2 or 3 entries in this series were quite mind-bending. Chen does have a knack for illustrating concepts with good diagrams.
posted by We had a deal, Kyle at 5:38 PM on August 14, 2015
(BTW, I feel I should add here that I, in fact, have a Master's degree in English Literature.)
posted by JHarris at 6:04 PM on August 14, 2015 [13 favorites]
ARM is a threat to x86, but is ARM a threat to Intel? Not as long as Intel has money in the bank.
Intel could license ARM or buy an existing ARM licensee (they owned and then sold XScale), and with their engineers and their manufacturing and their willingness to make parts with 100+W TDPs, could make the fastest ARM processors bar none.
I'm salivating a little bit, imagining a Xeon-class ARMv8-A with about 64 threads and enough main memory bandwidth to keep them all fed. mmm.
And as a bonus it'll give the engineers something to do now that their tick-tock cadence has faltered with x86.
posted by jepler at 6:17 PM on August 14, 2015 [1 favorite]
fatbird:
I completely agree that charging for decent compilers is incredibly short-sighted. It was common at the time – IBM's xlc was the best option for POWER/PowerPC, HP had their PA-RISC stuff, etc. – and it held back all of those platforms. Beyond the obvious increased barrier to entry, there was also a critical problem with quality: the vendors tended to heavily optimize for the SPEC benchmarks used by their marketing teams but quality varied for anything which those benchmarks didn't hit[1] and support for new features which weren't part of the benchmarks was also inconsistent unless they had a very big customer asking for it.
That encouraged developers to use gcc since it meant that a) you could use the same compiler everywhere and b) the lack of a barrier to entry meant that it was less likely that you'd find a new bug and, if you did, that you'd be able to fix it or get a new version without having to interest a vendor, wait for long periods of time and pay for an upgrade. I remember more than one conversation in that era which was roughly “gcc is slower but it returns correct results. If [vendor] wants this to be faster they can fix their compiler”. If you sold enough software on their platform they might loan you a system but a surprising number of now-defunct vendors assumed you'd pay for the purpose of making their platform competitive.
I'd be surprised if the current level of support for LLVM wasn't at least in part driven by the surviving processor vendors not wanting to get into that cycle again. I only saw this from the outside and can only imagine how frustrating it must have been on the inside.
1. It wasn't unheard of to find cases where even using the same kind of operation but with different input values or ordering produced much lower performance. There were lots of rumors, but I suspect it was less deliberate cheating than the team obsessively chasing benchmark scores and moving on as soon as they hit their target.
posted by adamsc at 6:23 PM on August 14, 2015 [6 favorites]
If you want to hear about cheating, binaries produced by the Intel C Compiler check the CPU's vendor ID and run slower code if it's not GenuineIntel.
posted by save alive nothing that breatheth at 7:40 PM on August 14, 2015 [1 favorite]
Computer Design is black fucking magic. I've been in software engineering for over a decade and the only thing that's genuinely intimidated me has been CD. If you'd like to learn more, check out "computer organization" as a topic. This relies on a basic-ass made up CPU called MIPS to hand-hold college ding-dongs through the process of understanding how little a body can possibly ever know about computers. Imagine if the fatality rate for learning how to ride a bicycle was around 80%. That's what computer design is like. It will break you.
posted by boo_radley at 8:24 PM on August 14, 2015 [7 favorites]
What I was thinking about is something like detecting benchmark code and substituting a hand-tuned version which is better than the compiler can produce (if memory serves, that has been confirmed in video drivers).
The Intel compiler using vendor rather than feature detection is dirty and deserves anticompetitive penalties but it's less dishonest in the sense that you could also write code which will actually run that fast.
posted by adamsc at 8:30 PM on August 14, 2015 [1 favorite]
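(For anyone curious what "vendor rather than feature detection" looks like in code: this sketch is mine, using the <cpuid.h> helper that GCC and Clang ship, and has nothing to do with Intel's actual compiler internals. Leaf 0 gives you the vendor string; leaf 1 gives you capability bits such as SSE2, which is the honest thing to test.)

#include <cpuid.h>      /* GCC/Clang wrapper around the CPUID instruction */
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    char vendor[13] = { 0 };

    /* Vendor detection: leaf 0 returns the vendor ID in EBX, EDX, ECX. */
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        printf("vendor:   %s\n", vendor);   /* e.g. GenuineIntel          */
    }

    /* Feature detection: leaf 1, EDX bit 26 is SSE2 on any x86 vendor.  */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        printf("has SSE2: %s\n", (edx & (1u << 26)) ? "yes" : "no");

    return 0;
}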
MIPS is a real CPU architecture, still kicking today. They've even got a 64-bit version, according to Wikipedia. But yeah, the gulf between what you can learn in a semester class on Computer Design and what goes into any modern CPU is huge. An abyss.
The confusingly similarly named MIX is a hypothetical CPU architecture used by Knuth.
posted by jepler at 8:37 PM on August 14, 2015 [4 favorites]
Yes, MIPS is real and still very much around (part of Imagination now, but still licensing out MIPS cores for SoC designs); the chip I work on has MIPS24k cores. Branch delay slots, yo.
posted by We had a deal, Kyle at 8:42 PM on August 14, 2015
boo_radley, with due respect, I don't think it's utterly incomprehensible, although yeah it's not simple either. But the stuff in the Computer Organization link you presented (which I think is a better, more grounded presentation than my comment above) I mostly learned before the internet even came around, although a Computer Architecture class I had did a lot to solidify that knowledge.
(My end-of-semester project was to design and wire up a four-bit multiplier out of raw logic gates. I had such fun with it. I basically told my lab partner he could just coast and let me do it all, and I did. It was one of the best classes I've ever had, and sometimes I find myself wishing I could find a use for some of that knowledge now.)
posted by JHarris at 10:46 PM on August 14, 2015 [1 favorite]
(Not that I'm trying to sound too cocky, mind you. Let's face it, it'd been a decade since I had that class....)
posted by JHarris at 10:48 PM on August 14, 2015
The biggest lie of your computer is that it's just one computer.
The biggest lie of your CPU is that it's just one processor. There's basically piles and piles of independent units -- some doing integer math, some doing floating point math, some storing data for later, some translating instructions from what you think you're executing into what you actually are, some predicting what you might execute next and running it just in case, etc etc etc -- all of which are doing things simultaneously, and on a nanosecond scale are providing proper inputs at proper times to the next element in the network.
This dance of components is synchronized by a clock, which is why clock speed is such a poor metric for performance. Yeah, that's the rhythm for the dancers. It says nothing about the quality of the dance.
(Also, at the nanosecond scale, space is time, so synchronizing everything is a Thing.)
posted by effugas at 11:45 PM on August 14, 2015 [5 favorites]
Man, since I've been doing some lock-free concurrent stuff, I like, don't even believe in time anymore.
posted by save alive nothing that breatheth at 12:14 AM on August 15, 2015 [11 favorites]
Man, since I've been doing some lock-free concurrent stuff, I like, don't even believe in time anymore.
"Time is an illusion. Lunchtime doubly so."
-- Douglas Adams.
posted by eriko at 6:51 AM on August 15, 2015
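(For anyone wondering why lock-free work does that to a person: without explicit synchronization, two cores don't have to agree on the order writes happened in. Here's a minimal C11 sketch of the classic store-buffering case -- the names and the harness are mine, and with this naive thread-per-iteration setup you may need a lot of runs, or a tighter harness, to actually catch the "impossible" outcome.)

    #include <stdatomic.h>
    #include <stdio.h>
    #include <pthread.h>

    /* Store-buffering litmus test: each thread writes its own flag, then
       reads the other's.  With relaxed ordering, both threads can read 0,
       an outcome no single interleaving of program order allows. */
    static atomic_int x, y;
    static int r0, r1;

    static void *writer_x(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        r0 = atomic_load_explicit(&y, memory_order_relaxed);
        return NULL;
    }

    static void *writer_y(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&x, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        long both_zero = 0;
        for (int i = 0; i < 100000; i++) {
            atomic_store(&x, 0);
            atomic_store(&y, 0);
            pthread_t a, b;
            pthread_create(&a, NULL, writer_x, NULL);
            pthread_create(&b, NULL, writer_y, NULL);
            pthread_join(a, NULL);
            pthread_join(b, NULL);
            if (r0 == 0 && r1 == 0)
                both_zero++;   /* the two cores disagreed about "when" */
        }
        printf("both reads saw 0 on %ld of 100000 runs\n", both_zero);
        return 0;
    }

(Compile with something like cc -std=c11 -O2 -pthread.)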
"Time is an illusion. Lunchtime doubly so."
-- Douglas Adams.
posted by eriko at 6:51 AM on August 15, 2015
The biggest thing you realize as you learn more about processors is how silly it is to reduce them to one benchmark.
Before you read this stuff, gigahertz sounds like a great, simple number, but then when you see all the oddly specific instructions, the weird pipelining and branch prediction, and so on, it becomes apparent that you're attempting to appraise the behavior of a very complicated system with one number.
Then you consider whether software uses those features effectively, and the whole thing is terribly confusing.
On an unrelated note, is this thread aware of TIS-100, a sim of a fictional vintage computer with many cores, and very limited instructions and space to write those instructions? You have to think concurrently, as there's no space otherwise. Ooh yeah, and every io operation between cores blocks.
posted by mccarty.tim at 7:47 AM on August 15, 2015 [1 favorite]
I wrote another explainer, on stacks, but instead of posting another wall of text I'm just going to host it from my Dropbox: On the stack, and stack frames.
posted by JHarris at 1:27 PM on August 15, 2015 [1 favorite]
It's worth pulling out into this thread, however, this link from the same blog as the FPP, which explains a bit of how Itanium handles "stacked" registers. At the very least it's entertaining for containing the phrase If the processor runs out of squirrel-space.
posted by JHarris at 1:31 PM on August 15, 2015
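(Roughly, the "squirrel-space" in that post is the memory backing store that Itanium's Register Stack Engine spills stacked registers into when a deep call chain runs the physical register file dry, refilling them as the calls return. Here's a toy model in C of just that spill/refill bookkeeping -- the sizes, names, and policy are invented for illustration and don't match the real RSE.)

    #include <stdio.h>

    #define PHYS_REGS    16      /* pretend physical stacked-register file */
    #define BACKING_MAX  256     /* the "squirrel-space" in memory */

    static int phys_in_use = 0;               /* stacked registers held on chip */
    static int backing_store[BACKING_MAX];    /* spilled register values */
    static int spilled = 0;

    /* A call allocates a frame of stacked registers; if the physical file
       is full, the oldest registers get spilled to the backing store. */
    static void alloc_frame(int nregs)
    {
        while (phys_in_use + nregs > PHYS_REGS) {
            backing_store[spilled++] = 0;     /* stand-in for a register value */
            phys_in_use--;
        }
        phys_in_use += nregs;
        printf("alloc %2d regs: %2d on chip, %2d spilled\n",
               nregs, phys_in_use, spilled);
    }

    /* A return frees the callee's frame; spilled registers belonging to
       the callers get refilled as room opens up. */
    static void free_frame(int nregs)
    {
        phys_in_use -= nregs;
        while (spilled > 0 && phys_in_use < PHYS_REGS) {
            (void)backing_store[--spilled];
            phys_in_use++;
        }
        printf("free  %2d regs: %2d on chip, %2d spilled\n",
               nregs, phys_in_use, spilled);
    }

    int main(void)
    {
        alloc_frame(6);
        alloc_frame(6);
        alloc_frame(6);   /* third frame forces spills to the backing store */
        free_frame(6);
        free_frame(6);
        free_frame(6);
        return 0;
    }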
mccarty.tim, I swear to Thomas Pynchon, if I fail to mow the lawn this weekend because TIS-100 turns out to be fiendishly addictive, there will be consequences.
posted by radicalawyer at 2:25 PM on August 15, 2015
It's like saying, "I normally toss my garbage on the sidewalk in front of the pet store, and every morning, when they open up, somebody sweeps up the garbage and tosses it into the trash. But the pet store isn't open on Sundays, so on Sundays, the garbage just sits there. How can I get the pet store to open on Sundays, too?"
I love Raymond Chen.
posted by JHarris at 3:02 PM on August 15, 2015 [5 favorites]
Also:
Bob goes to the beach very frequently.
Every time there is a shark attack, Bob is at the beach.
Conclusion: Bob causes shark attacks.
Blaming ngen for the kernel crash is like blaming Bob for the shark attacks.
Bonus chatter: Some of my colleagues came to different conclusions:
Conclusion: Bob should stop going to the beach.
Conclusion: Bob must be the shark.
posted by JHarris at 3:06 PM on August 15, 2015 [1 favorite]
The biggest lie of your CPU is that it's just one processor. There's basically piles and piles of independent units [...] all of which are doing things simultaneously, and on a nanosecond scale are providing proper inputs at proper times to the next element in the network.
There's a very vivid aimed-at-non-techies description of this in a chapter of Tracy Kidder's The Soul of a New Machine, which walks through the process of the Eclipse executing one instruction and how it fans out into the many subsystems of the CPU. Yes, Nova/Eclipse was many generations ago, but a lot of the fundamentals remain unchanged.
(It's an odd experience re-reading TSoaNM knowing that its subject is jessamyn's father; like slipping a different filter in front of the lens.)
posted by We had a deal, Kyle at 4:00 PM on August 15, 2015 [2 favorites]
(Yep. Obit MeTa; and also a MeFi post on Jessamyn's piece about dealing with the estate and the house.)
posted by We had a deal, Kyle at 6:29 PM on August 15, 2015 [1 favorite]
Joey Buttafoucault wrote: ARM's going to be a threat on more than mobile, and sooner than you think.
My copy of The Innovator's Dilemma has an endorsement on the cover from the Intel chairman, Andrew Grove. For a company whose executive leadership understands the problem it faces, they seem incapable of doing anything about it. Intel has tried for years to break into the mobile market with little to show for it.
You can already buy AArch64, and run Ubuntu on it. Maybe it'll never take off, but we may be getting some build systems in our datacenter for it this year. Which is more than I can say for Intel Smartphones.
jepler wrote: ARM is a threat to x86, but is ARM a threat to Intel? Not as long as Intel has money in the bank.
Well, Intel's balance sheet isn't exactly brimming with good news. When you look at the assets they can use to buy a license / licensee, they have $17 billion. And lo, back in June they bought Altera for $16 billion. So they're done spending for a while, and you won't be seeing your ARM powerhouse. You'll have to make do with actual Xeon laptops with embedded FPGAs, I guess. More likely software defined switches running Xeon plus FPGAs.
But assuming for the minute that Intel hadn't gone on that shopping spree, the problem Intel faces is declining PC sales. Customer purchases are shifting; laptops are the new desktops, and tablets / smartphones are the new laptops. And I guess smartwatches are the new smartphones (but even Apple didn't announce how well their watch sold). That shift in consumer preference is a problem for Intel, who, as you gleefully mentioned, seems incapable of delivering low thermal design power (TDP) chips. Even if they had straight up bought ARM Holdings (market cap: $20 billion), the engineers would not go far enough down the power curve to genuinely compete in mobile, and the accountants who supervise them would not let them anyway.
posted by pwnguin at 12:51 AM on August 16, 2015 [1 favorite]
Hm, Altera has a 64-bit Cortex-A53 in their Stratix 10. Of course, that's not a high-performance core (A57 and A72 are), but my point is that with the Altera purchase it looks like they're bringing an ARM license in. Of course, whether they decide to drop that ball and just put x86 cores in the next generation of FPGAs (or vice versa), who can say.
posted by jepler at 5:29 AM on August 16, 2015
ARM would be quite happy to license their IP to Intel. Indeed, Intel must have held ARM licenses back when they were selling the StrongARM chip design they got from DEC.
posted by pharm at 12:49 PM on August 16, 2015 [1 favorite]
The global data pointer stuff and the fallout from it were making my blood boil. This is similar to the architecture encouraged on the 68000 processors, specifically on the Mac platform, where you put a pointer to your globals in register A4. If you wanted to read a global, you would do something like:
move.l offset(A4), D0
which was all well and good as long as offset was less than 32K, because offsets were only 16 bits. So work was shoved onto the compiler writers so that this instruction was replaced with:
move.l -1, D0
and the location of that -1 was recorded in a table along with the offset. When your code was loaded, the globals were allocated, and the table of where all those -1s were located was used to patch the code (in reality, you would write not the -1 but the 32-bit offset, and the loader would add it to the start address of the globals and store the result back into the code).
It's a lesson that should be burned into every CPU designer's handbook: if you're going to allow offsets, make sure they can span the entire goddamn address range because somebody will need them for a legitimate reason and you're just screwing the compiler writers. Again.
Of course, if you're starting with a language like C/C++ where you can have all kinds of cockamamie side effects at any given time, you're already in hell as a compiler writer.
posted by plinth at 9:43 AM on August 17, 2015
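(A sketch of that load-time fixup scheme in C, following the description above: the code carries the 32-bit offsets where the placeholders go, a table records where they sit, and the loader adds the base address of the globals in place. The table format and names are invented for illustration; the real Mac segment loader and compiler runtimes differed in detail.)

    #include <stdio.h>
    #include <stdint.h>

    /* "Code" here is just an array of 32-bit words.  Words at the recorded
       indices hold an offset into the globals block and get patched, at
       load time, into absolute addresses. */
    static uint32_t fake_code[] = {
        0xAAAA0001,
        8,            /* offset of one global, to be patched in place */
        0xAAAA0002,
        40000,        /* offset well beyond a signed 16-bit displacement */
    };

    static const size_t fixups[] = { 1, 3 };   /* which words need patching */

    static void patch_globals(uint32_t *code, const size_t *fix, size_t n,
                              uint32_t globals_base)
    {
        for (size_t i = 0; i < n; i++)
            code[fix[i]] += globals_base;      /* offset -> absolute address */
    }

    int main(void)
    {
        uint32_t globals_base = 0x00100000;    /* pretend address of the globals */
        patch_globals(fake_code, fixups,
                      sizeof fixups / sizeof fixups[0], globals_base);
        for (size_t i = 0; i < sizeof fake_code / sizeof fake_code[0]; i++)
            printf("code[%zu] = 0x%08X\n", i, (unsigned)fake_code[i]);
        return 0;
    }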
This thread has been archived and is closed to new comments