Things CPU architects need to think about
December 21, 2011 12:11 PM

Things CPU architects need to think about. Bob Colwell gave this lecture in 2004, for the Stanford University Computer Systems Colloquium (EE380). Colwell was the chief architect of the Pentium Pro, Pentium II, Pentium III, and Pentium 4 processors. [About 90 minutes, Windows Media format]

This lecture covers many topics, and although it contains a lot of CPU architecture terminology, I think it'll make sense to any kind of computer geek.

Topics covered include:
  • Exponential growth in CPU performance has come with exponential growth in transistor count, power dissipation, complexity, design cost, and bugs. The market may not want to keep paying lots of money for performance once it's good enough, and those exponential trends aren't sustainable anyway. So what do we do next?
  • Faster chips don't come from a single big idea any more; they come from lots of little ideas put together. But these little ideas interact in odd ways that are difficult to predict and even more difficult to fix when performance problems occur.
  • Products have to be ready to produce as soon as the factory is ready to make them. This leads to design compromises and even corporate structure decisions to make sure new factories never sit there doing nothing.
  • How the Pentium FDIV bug and the hated "CPUID" feature happened, and the surprisingly weird (from Intel's point of view) public reaction to them.
  • How the IA-64 (Itanium) program went so wrong.
  • CPU architects need to think about the long term because nobody else will, but also have to design chips that will sell today.
A huge number of other videos are available from the EE380 Colloquia for Previous Quarters page.
posted by FishBike (29 comments total) 35 users marked this as a favorite
I thought it was neat to hear about some of the problems facing CPU architects in 2004, and how those have shaped some of what's happened in the industry since then.
posted by FishBike at 12:11 PM on December 21, 2011

link appears to be broken...
posted by casual observer at 12:19 PM on December 21, 2011

Does this link work any better for you?
posted by FishBike at 12:24 PM on December 21, 2011

Yes thanks.
posted by casual observer at 12:27 PM on December 21, 2011

I had a bit of trouble, but eventually got the stream to play properly using VLC on my Mac. It threw some weird errors but I clicked through them and it seemed to work okay after that.
posted by Kadin2048 at 12:44 PM on December 21, 2011

Reminds me of my years in the WAN ASSP / ASIC semiconductor space 2000 - 2007 ... especially the part about "plan to throw the first one away." The space was just starting to get pounded mercilessly by the FPGA guys when I left ... I wonder how companies like Intel are addressing that threat?
posted by ZenMasterThis at 12:48 PM on December 21, 2011

Long shot, I know, but does anyone know of a transcript for this talk? It sounds interesting but I can't watch the video.
posted by Mars Saxman at 1:13 PM on December 21, 2011

(I feel bad some folks are having trouble playing this. I've been searching for it in some more universal format over the past year or so and keep coming up with nothing. I haven't found a transcript either. I was able to play it with VLC and Windows Media Player through both Firefox and IE, so I'm not sure what the exact problem is.)
posted by FishBike at 1:26 PM on December 21, 2011

There are so many avenues for optimization now, though: purely functional languages like Haskell offer compilers an unparalleled amount of information about what the code actually does. The LLVM compiler infrastructure offers an incredibly flexible toolbox for building optimizations, including folding all the profiling information gained by a just-in-time compiler back into static compilation. Software engineers are being expected to know more about algorithm design and data structures today. etc.
posted by jeffburdges at 1:37 PM on December 21, 2011 [1 favorite]

This is extremely interesting! Excellent link!
posted by JHarris at 1:48 PM on December 21, 2011

I'm also at work and cannot watch (yet).

From the description, it sounds like he was pretty pessimistic about the future of chip design. (Which is interesting as '04 was also about the low point of Intel's competitiveness). Since then they've managed to stay on track with Moore's law.

I wonder if it's due to them avoiding the dangers identified back then or by bringing in a fresh batch of designers. It's my understanding that the Pentium IV and the Core micro-architectures were internal competitors with the Pentium IV on top but running out of gas in 2004. Anyone know more of the back story?
posted by PissOnYourParade at 5:34 PM on December 21, 2011

From the description, it sounds like he was pretty pessimistic about the future of chip design. (Which is interesting as '04 was also about the low point of Intel's competitiveness). Since then they've managed to stay on track with Moore's law.

My impression is more that he was pessimistic about Intel in particular and that's why he left. From what he says in the video, it sounds like they didn't get, at the time, that they couldn't just keep chasing after exponentially faster clock speeds and exponentially increasing transistor counts in a CPU core. It was hard to convince them of that, because it had been working pretty well up until then.

But they must have understood eventually, because I look at what we have today: clock speeds lower than or about the same as they were at the end of the Pentium 4, but much better IPC (instructions per clock). Huge caches (which, as he says, are easy to get right and low-power) that account for a lot of the continuing increase in transistor count. And multiple cores, with hyper-threading, to go for overall throughput instead of single-threaded performance.
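
As a rough sketch of the clock-versus-IPC trade-off described above: throughput is approximately clock rate times instructions per clock. The function name and all the numbers below are made up purely for illustration and do not describe any real chip.

```python
def instructions_per_second(clock_hz, ipc):
    """Approximate throughput as clock rate times instructions per clock."""
    return clock_hz * ipc


# Hypothetical chips: the figures are illustrative assumptions, not measurements.
high_clock_chip = instructions_per_second(clock_hz=3.0e9, ipc=1.0)
high_ipc_chip = instructions_per_second(clock_hz=2.0e9, ipc=2.0)

# The lower-clocked chip gets more work done per second because of its higher IPC.
print(high_clock_chip, high_ipc_chip)
```

This ignores caches, memory latency, and multiple cores, so it only shows why a lower clock speed by itself does not imply lower performance.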

This seems to be mostly what he was suggesting (or maybe predicting) in response to one of the questions at the end [1], asking what else we could be doing with such high transistor counts.

[1] The one where the asker notes that the trend for IPC/transistor has the exponential part in the denominator, which I thought was really funny but true.
posted by FishBike at 6:50 PM on December 21, 2011

Anyone know more of the back story?

The Pentium 4 was actually slower than the Pentium 3 at the same clock speed. It had a weird "marketecture" that emphasized clock speed over how fast it actually worked. They wanted the clock speed to be faster than the competing Athlon.

They ended up dumping that stupid design and adopting the core from the M mobile processor designed in Israel, which was closer to the old Pentium 3.
posted by bhnyc at 7:05 PM on December 21, 2011 [1 favorite]

This makes me so glad that I work for a small, family-owned company and it's been established that I have veto power over projects I think won't work.
posted by localroger at 8:10 PM on December 21, 2011 [1 favorite]

This looks very cool. I know that "what the CPU does with that linear string of instructions the compiler produced" is something that has gotten incredibly complicated over the last 10-15 years. I hope I'll be able to understand this.
posted by benito.strauss at 8:23 PM on December 21, 2011

They ended up dumping that stupid design and adopting the core from the M mobile processor designed in Israel, which was closer to the old Pentium 3.

They've kind of been tick-tocking on optimising for clock and then efficiency.

Meanwhile poor old AMD's Bulldozers appear to be suffering from a P4-size fail. I hope they survive it.
posted by rodgerd at 11:09 PM on December 21, 2011

x86 - the architecture not even a mother could love:

1:21:34: "... when we first did P6 none of us - with very few exceptions - none of us had ever done an x86 at all. And in fact many of us including me had studiously avoided the damn thing cuz it was ugly."

Amen to that.
posted by flabdablet at 1:05 AM on December 22, 2011 [2 favorites]

Oh for an orthogonal instruction set.....not that it matters these days anyway....

(pining for the 68k)
posted by Homemade Interossiter at 1:47 AM on December 22, 2011

That's really good, but I was disappointed that we didn't get the people story of why the Pentium Pro, which was his chip, sucked at 16-bit code when it came out. 16-bit code ran the world, and his design was weak in ways that were fixable, so there must have been a business or people reason for it.
posted by NortonDC at 7:19 AM on December 22, 2011

I think this nicely orthogonal architecture [ARM] is about the only one with a hope of making inroads into the x86 pig-in-lipstick's CPU dominance any time soon. There will be a version of Windows for it for the phone market, and that just might be enough to encourage some brave soul to try making it into a desktop box.

I don't know whether reduced CPU power consumption compared to an x86 with equivalent throughput would actually be much of a sales driver, though; I think you'd need a low-power GPU as well to make enough of a difference to make a difference, and given what GPUs are being asked to do now I don't see that happening.
posted by flabdablet at 8:22 AM on December 22, 2011

I also hope AMD survives Bulldozer. And on that note, without Athlon 64, the whole P4 clock-speed inflation would never have been upset to give us P-M and the Core line.

As he points out, no architecture ever survives server-only, which seems to be where AMD has been left.

I watched this whole video intently beginning to end. Great video and post, thanks.
posted by hellslinger at 9:56 AM on December 22, 2011

I'm hoping that the future holds some CPU/FPGA/GPU hybrids that will allow for more flexibility (and the ability to improve the design after shipping), with ASIC-like advantages for the things your computer always does and FPGA advantages for things you do once in a while. I'm kind of curious whether the pin count, cost and complexity of CPUs can ever be reduced by creating several wireless buses inside the case, but I suspect a standard very-high-speed optical bus (or buses) is a likelier possibility (what is going on with Light Peak/Thunderbolt anyway?). Not that I know anything about processor design. The bit about introducing delays was interesting (solution for race conditions / deadlocks?)
posted by BrotherCaine at 11:22 AM on December 22, 2011

The bit about introducing delays was interesting (solution for race conditions / deadlocks?)

It's kind of similar to race conditions and deadlocks, but not those exactly. What he was talking about was situations where the processor is still producing correct results, just more slowly than normal.

One example he gave (if I understand it right) is that when the P6 core is given some instructions to execute, it decides for itself exactly what order it's going to run them in and exactly when. Normally, this results in all the different parts of the chip working on stuff simultaneously.

But if just the wrong series of instructions comes along, it gets into one of these "slow patterns" where the chip schedules them to run in an order that results in them all spending a lot of time waiting for something another instruction is currently using. Kind of like a deadlock, except that eventually those resources will be freed, so it's not actually totally locked up. Once it starts doing that, it keeps on having these really bad wait times for each instruction, sometimes for thousands of clock cycles, before it clears up.

The other example he gave is that the Pentium 4 uses data from the level 1 cache before it has checked that it's actually the right data. The L1 cache is direct-mapped (I think), so for any particular data item, there's only one place in the cache that it can be stored (if it's there at all).

To use a decimal example, imagine we've got a cache big enough for 100 entries. We use the last two digits of the memory address to determine the cache location. So data from memory location 1234 will be in cache entry 34. But that could contain data from memory location 5634 instead, so there are "tags" for each cache entry to track where the data in there is from.

So if the Pentium 4 gets an instruction to do something with data from location 1234, it'll grab the data out of the L1 cache entry 34 and start working on it right away. Separately (while it is already working on the instruction), it checks the tag to see if that's the right data.

And then maybe, oops, finds out it's not--cache entry 34 contains the data from memory address 9934 instead, so the instruction it's already working on has the wrong data and needs to be thrown out. When that happens, it schedules the instruction to run again (and I guess schedules a fetch from memory to get the right data into the cache).
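
The decimal example above can be sketched as a toy model in Python. This is only an illustration of the 100-entry, last-two-digits scheme from this comment, not how the Pentium 4 is actually implemented; the class and method names are hypothetical.

```python
NUM_ENTRIES = 100  # cache size from the decimal example above


class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one slot."""

    def __init__(self):
        # Each slot holds (tag, data), or None when empty.
        self.slots = [None] * NUM_ENTRIES

    @staticmethod
    def split(address):
        # Index is the last two decimal digits; tag is the leading digits.
        return address // NUM_ENTRIES, address % NUM_ENTRIES

    def fill(self, address, data):
        tag, index = self.split(address)
        self.slots[index] = (tag, data)

    def speculative_load(self, address):
        """Hand back whatever is in the slot, plus the result of the tag check.

        In the description above the data is used first and the tag is
        checked separately; here both are returned so the caller can see
        when the early data turned out to be wrong.
        """
        tag, index = self.split(address)
        slot = self.slots[index]
        if slot is None:
            return None, False
        stored_tag, data = slot
        return data, stored_tag == tag


cache = DirectMappedCache()
cache.fill(9934, "data from 9934")

# A load from 1234 looks in entry 34 and gets the wrong data.
data, tag_ok = cache.speculative_load(1234)
print(data, tag_ok)  # wrong data, tag check fails: the instruction must be re-run

# After the right line is fetched, the retry succeeds.
cache.fill(1234, "data from 1234")
data, tag_ok = cache.speculative_load(1234)
print(data, tag_ok)
```

When the tag check comes back false, the work already done with that data has to be thrown away and the instruction run again, which is the re-scheduling described above.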

The second type of "slow pattern" he talks about in the video is that the chip ends up spending a lot of its time working on these "try again with the right data this time" re-scheduled instructions, and they compete for resources with new instructions coming in. It spends too much time re-trying the same instructions over and over and doesn't get a lot of useful work done, and this also can take hundreds or thousands of clock cycles to get better.
posted by FishBike at 6:58 PM on December 22, 2011

The idea of introducing delays at random is that it should be possible to do so in a way that doesn't cost throughput, because delays do occur kind of randomly anyway due to resource conflicts, and if the deliberate random delays are designed carefully, their usual effect would be to turn sequences like work1-work2-work3-delay into delay-work2-work3-work1. The point of doing this is that it might mix things up enough to make it less likely for the processor to get stuck in pathological patterns of interlocking resource conflicts.
posted by flabdablet at 7:11 PM on December 22, 2011

Does anybody know which networking company said "we don't need anything faster than 386"? One of the audience members said something that sounds like "but they're no longer in Utah?", presumably implying which company it was.
posted by TwoWordReview at 5:02 PM on December 23, 2011

Networking + Utah = Novell? Novell is actually back in Provo, UT now.
posted by BrotherCaine at 9:12 AM on December 24, 2011 [1 favorite]

I think this nicely orthogonal architecture [ARM] is about the only one with a hope of making inroads into the x86 pig-in-lipstick's CPU dominance any time soon.

ARM is going to own the Windows Phone 8 market, and the Windows 8 tablet market. I can't see them caring much about the moribund desktop market.
posted by Rat Spatula at 6:52 AM on December 26, 2011

the moribund desktop market

is not a call I'd be willing to make any time soon.

I do understand that the rising wave of digital cargo cultists natives is fiercely in love with connectivity that can be carried in pockets and packs, but it seems to me that the idea that tablets and smart phones are on the cusp of replacing desktop computers is just about as unlikely to be true as the older idea that computers will soon usher in the era of the paperless office and universal work-from-home.

And I understand that many see real physical keyboards as the shuffling zombie remains of the era of the mechanical typewriter.

And yes, I'm very nearly 50 years old and yes, that does make me set in my ways to some extent. But the flip side of being old and crusty is experience. I have seen a lot of technological fads come and go, and I have yet to see a popular UI with the staying power of the WIMP interface first popularized by the Mac and subsequently refined by Microsoft with Windows 95. I think very few of today's young 'uns have any idea at all of the enormous cultural inertia represented by the masses of people skilled in the use of that interface.

The simple fact is that for doing serious work at any length, a big screen or several big screens set at an ergonomic height and distance in front of you is objectively superior to a little handheld one, a 105-switch keyboard is a richer and more comfortable control than its simulation on a touch screen, and being able to use a hand-sized desktop puck to move a small, non-obscuring pointer about without needing to raise one's hand to the screen is easier, quicker and cleaner than stabbing at a touch screen with fat fingers.

Desktop computing is, to my way of thinking, in no way a moribund market. I am sure that very few homes that currently have desktop computers will abandon them for tablets, and that desktop computers will continue to account for the overwhelming majority of IT-related workplace activity, for at least the next ten years. But energy efficiency is going to get more important, and circuitry is going to get smaller; I fully expect to see hardware architectures that started out inside phones and tablets migrating to the desktop, and it will be interesting to see whether x86 can maintain its dominance in the face of that kind of upward drift.
posted by flabdablet at 3:19 AM on December 27, 2011

But the desktop market is unlikely to grow very much - nothing like the phone/tablet/slate market. And isn't the main point of the desktop horsepower? Why would I want a desktop with an energy-efficient CPU, versus just a docking station for my phone that provided me with desktop-style KVM? I already do all my coding and testing on a laptop with a quad-core CPU and a fat SSD, and I rarely use it as a laptop - I just dock it, or use Remote Desktop over a LAN. I don't think the desktop experience is necessarily disappearing, but the justification for the ATX-motherboard form-factor is starting to weaken.
posted by Rat Spatula at 12:19 AM on December 28, 2011

