things that seemed like universal laws to people at the time
September 24, 2013 12:48 AM

The Slow Winter [PDF]. A slightly true story about CPU design.
posted by aubilenon (46 comments total) 41 users marked this as a favorite

 
As a software writer and electrical engineer, this is a god damn tragic story.
posted by ryanrs at 1:33 AM on September 24, 2013 [2 favorites]


This is glorious, and particularly poignant for people who were born after all the Easy Stuff was already done, dammit.
posted by Dr Dracator at 1:40 AM on September 24, 2013 [2 favorites]


SUMMARY WITHOUT ALL THE SNARKGASM:
We believed that Moore's Law was inviolate for decades, but eventually the laws of physics themselves stopped us from making still smaller and faster chips. We tried various other things to continue to make chips better and better: more cores, fault-tolerant software, less power usage, but it either doesn't work very well or isn't as compelling a sales gimmick as MOAR MOAR FASTAR FASTAR. We have no solution at this time.
posted by JHarris at 2:31 AM on September 24, 2013 [6 favorites]


But something something quantum something something nanotube something something!
posted by cthuljew at 2:39 AM on September 24, 2013 [1 favorite]


here is the money quote:

Of course, lay people do not actually spend their time trying
to invert massive hash values while rendering nine copies of
the Avatar planet in 1080p. Lay people use their computers
for precisely ten things, none of which involve massive computational parallelism, and seven of which involve procuring
a vast menagerie of pornographic data and then curating that
data using a variety of fairly obvious management techniques,
like the creation of a folder called “Work Stuff,” which contains an inner folder called “More Work Stuff,” where “More
Work Stuff” contains a series of ostensible documentaries that
describe the economic interactions between people who don’t
have enough money to pay for pizza and people who aren’t too
bothered by that fact. Thus, when John said “imagine a world
in which you’re constantly executing millions of parallel tasks,”
it was equivalent to saying “imagine a world that you do not and
will never live in.”
posted by empath at 2:56 AM on September 24, 2013 [1 favorite]


I thought the moral of the story was "don't run Java."
posted by jdfan at 2:59 AM on September 24, 2013 [2 favorites]


Thus, when John said “imagine a world
in which you’re constantly executing millions of parallel tasks,”
it was equivalent to saying “imagine a world that you do not and
will never live in.”


...or "imagine you're a gamer playing something with a well-optimised engine".
posted by jaduncan at 3:03 AM on September 24, 2013 [5 favorites]


We believed that Moore's Law was inviolate for decades, but eventually the laws of physics themselves stopped us from making still smaller and faster chips.

I haven't plotted recent chips, but I think Moore's Law is still going strong. Transistors are getting smaller. FLASH storage is finally becoming a common thing on desktops. Individual cores aren't getting much faster, but Moore's Law was never about that.
posted by ryanrs at 3:17 AM on September 24, 2013
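
A rough way to sanity-check "still going strong" is to project the trend forward from a known early chip. The sketch below is a back-of-the-envelope estimate only. Its starting point (the Intel 4004 of 1971, with roughly 2,300 transistors) and its two-year doubling period are assumptions made here, not figures from the thread.

```python
# Assumptions (not from the thread): start at the Intel 4004 (1971, about
# 2,300 transistors) and double the transistor count every two years.
BASE_YEAR = 1971
BASE_TRANSISTORS = 2_300
DOUBLING_PERIOD_YEARS = 2


def projected_transistors(year: int) -> int:
    """Transistor count predicted by steady doubling since BASE_YEAR."""
    doublings = (year - BASE_YEAR) // DOUBLING_PERIOD_YEARS
    return BASE_TRANSISTORS * 2 ** doublings


print(f"{projected_transistors(2013):,}")  # 4,823,449,600
```

Under those assumptions, 21 doublings land in the low billions of transistors for 2013. A different starting chip or doubling period would shift the estimate considerably.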


"imagine you're a gamer playing something with a well-optimised engine"

So, the entirety of the Planetside 2 player base then. Only we'd settle for optimized, instead of well-optimized.
posted by Slackermagee at 3:23 AM on September 24, 2013


Imagine you're not a gamer, though. Quite a lot of people aren't.

Massively parallel is generally useful, for processing real-world data in real time into helpful models. That's how we do it in our brains. Whether we get there in processors before we run out of Moore's Law (2023? Something like that, anyway. Assuming not something something nanotubes something), is an exercise I guess we'll go through round... about... now.
posted by Devonian at 3:34 AM on September 24, 2013


Moore's Law

it's a conjecture, not a law
posted by thelonius at 3:35 AM on September 24, 2013 [1 favorite]


The solution is to run many instances of an older game on your new multicore computer.
posted by ryanrs at 3:35 AM on September 24, 2013 [4 favorites]


I haven't plotted recent chips, but I think Moore's Law is still going strong

The SoC in the XB1 or the PS4 has something like 5 billion transistors, which fits the graph pretty well.

Let me just say that again to emphasize how insane it is. 5 billion. That is more transistors than you could even count with a register of a consumer CPU from 12 years ago.
posted by aubilenon at 3:37 AM on September 24, 2013 [4 favorites]
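
The register comparison is easy to check with a couple of lines of arithmetic. This sketch assumes that "a consumer CPU from 12 years ago" means a 32-bit general-purpose register, which the comment implies but does not state.

```python
# Assumption: the consumer CPU in question has 32-bit registers.
REGISTER_BITS = 32
TRANSISTORS = 5_000_000_000  # "something like 5 billion", per the comment

# Largest value an unsigned register of that width can hold.
max_unsigned = 2 ** REGISTER_BITS - 1
# Number of bits needed to represent the transistor count.
bits_needed = TRANSISTORS.bit_length()

print(max_unsigned)  # 4294967295
print(bits_needed)   # 33
```

So a count of 5 billion overflows an unsigned 32-bit register by a little over 700 million and needs a 33rd bit.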


And as far as multiple cores go, code is getting more and more functional, stateless and asynchronous (between things like Node.js, Scala, NoSQL, blocks/GCD on OSX/iOS and such) so the idea that massive parallelism is only for esoteric computational applications is increasingly a straw man. Today's networked GUI apps can easily juggle several cores of activity, and new code is increasingly written so that resource-intensive tasks are decomposable to other cores, if not other servers. A lot of this is undoubtedly a reaction to the limits of Moore's Law, but it has also been the common wisdom for a while that it's a lot cheaper to buy 10 duck-sized horses small machines than one horse-sized duck machine that's 10 times as powerful. Not only that, but if requirements go up and down, one can scale, and can indeed rent resources from providers like AWS and Rackspace in a pinch.
posted by acb at 3:41 AM on September 24, 2013 [3 favorites]


Let me just say that again to emphasize how insane it is. 5 billion. That is more transistors than you could even count with a register of a consumer CPU from 12 years ago.

And you could put a 486 in 0.1% of the silicon of one of those. That's practically the margin of error.
posted by acb at 3:44 AM on September 24, 2013


I like that at the end of the day the show-stopper in processor design is the plain, old-fashioned engineering problem of how to get the heat out.

"One of the reasons for this stagnation is that current integrated circuits have reached physical limits of power density, generating as much heat as the chip package is able to dissipate, and consequently hardware designers have had to limit frequency increments. It is true that Intel has never sacrificed performance for power efficiency, but now physical consequences leave them with no option but to look carefully at power consumption."
posted by three blind mice at 4:00 AM on September 24, 2013 [1 favorite]


I'm not sure I understand the author. He seems to be saying that prospects are grim, like a long cold winter, because a) the low-hanging fruit has been picked clean, and b) because of fundamental constraints on computational power that have to do with entropy or whatever. But there is a big difference between the dying of an industry, and the maturity of a science or technology. And indeed from the point of view of society he points out the pervasive misutilization of computing (p15 last paragraph) and/or underutilization (final paragraph)—in these senses, we have barely gotten started with this stuff! The question is whether researchers in these times and conditions are able to formulate the socially and technologically relevant problems. As it is said, when the going gets tough…
posted by polymodus at 4:19 AM on September 24, 2013 [1 favorite]


Well, heat and the related issue of leakage current. All real transistors leak current when off (a theoretical transistor does not), but when you have very tiny transistors with infinitesimally thin insulator layers and pack a few billion of them onto the same die, then you get a lot of power wasted while nothing's happening. That's why modern processors have lots of separate power domains, with increasingly fine divisions of functionality being powered off in between uses. All fine and good, but you can't do very much with a circuit that's got no power applied.

Back in the 80s, the limit to processor design was 'the hairy smoking golfball', which is to say a sphere for maximum surface area, hairy for maximum pin-out, and smoking for maximum heat dissipation. Hasn't quite worked out that way, but it's still being approached.

One wild card is interconnect speed. Intel, IBM and others are unusually interested in silicon photonics, where you combine electrical and optical processing on a chip. That's not so much because you can do much actual processing in light, at least not very usefully (the components are massively larger than electrical transistors), but because you can build chips in ordinary (well, sorta) fab plants with optical interconnects that can move stuff over long distances at the same sort of speed as you get on-chip. Which has all sorts of implications for the locality of your processing and storage, and which implies there may be other solutions than trying to keep your processing within the sort of design limits implicit in a local processor. Especially so with the model of lots and lots of lightweight processes able to flit about while still communicating usefully with each other and with you.

However, I don't think there's a single clear model of what will happen next after the current Si roadmap runs out. Which nobody really doubts it will, due to atoms only really coming in integral numbers.
posted by Devonian at 4:28 AM on September 24, 2013 [6 favorites]


Edgar Allan Poe.
posted by mittens at 5:01 AM on September 24, 2013 [1 favorite]


sphere for maximum surface area

Not sure exactly what this means. Generally I think of spheres as minimizing surface area (e.g. surface tension shaping a water drop).
posted by ryanrs at 5:01 AM on September 24, 2013 [2 favorites]


One, I loved the ADHD writing style - Dave Barry does hardware design.

Two, once the easy goals have been met is where the rubber meets the road. Take, for instance, the engine in a modern family sedan. It gets 30mpg, puts out close to 300hp, very little combustion byproduct (smog) and will last for a quarter million miles or more with minor regular maintenance.

This wasn't only inconceivable during the "malaise" of the '70s-early-'90s, this was inconceivable in the muscle car heyday of the 60's. The low hanging fruit of big displacement culminated in the Caddy 502cui and MoPar Hemis, but it was refinement, not brute force breakthroughs, that brought us to where we are now - the most powerful, cleanest, most reliable, most fuel efficient engines that have ever been built - just in time to be replaced by electric vehicles over the course of the next decade or two. Direct injection, electronic ignition, new oil formulations, advances in metallurgy and computer automation... nibble nibble nibble at the problem, and you get a Toyota four-door that can outrun a '70 Boss 302.

So it will be with computer hardware. Advances will be incremental, nibbling away at the edges - there's a ton of stuff software needs to be doing to optimize itself, for one, and hardware can and will help.
posted by Slap*Happy at 5:01 AM on September 24, 2013 [5 favorites]


I will now smugly return to designing embedded gizmos around the Parallax Propeller.
posted by localroger at 5:45 AM on September 24, 2013 [2 favorites]


I lost it at the "touring with Aerosmith" line. If only.
posted by tommasz at 6:58 AM on September 24, 2013


I haven't plotted recent chips, but I think Moore's Law is still going strong.

As is Moore's second law -- the difficulty and expense of fabrication technology is outpacing the cost savings of shrinking dies. We hit a rough patch once the next generation technology costs 2x per area.
posted by eddydamascene at 7:14 AM on September 24, 2013
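
The "2x per area" threshold makes sense once cost is worked out per transistor instead of per wafer. The numbers in this sketch are made up for illustration, and the idea that a new process generation doubles transistor density is an assumption, not something the comment states.

```python
def cost_per_transistor(cost_per_mm2: float, transistors_per_mm2: float) -> float:
    """Manufacturing cost of one transistor for a given process."""
    return cost_per_mm2 / transistors_per_mm2


# Hypothetical figures: the new node doubles density (assumption) but also
# costs twice as much per unit of die area.
old_node = cost_per_transistor(cost_per_mm2=1.0, transistors_per_mm2=1_000_000)
new_node = cost_per_transistor(cost_per_mm2=2.0, transistors_per_mm2=2_000_000)

# Ratio of new cost to old cost; 1.0 means the shrink saved nothing.
print(new_node / old_node)
```

When the area cost rises as fast as the density, the cost per transistor stops falling, which is the rough patch the comment describes.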


imagine you're a gamer playing something with a well-optimised engine

Imagine the kinds of creativity in game design and art direction required to WOW gamers if we were "stuck" for 10 or 20 years with essentially the same level of hardware power.
posted by straight at 7:20 AM on September 24, 2013 [1 favorite]


Let me just say that again to emphasize how insane it is. 5 billion. That is more transistors than you could even count with a register of a consumer CPU from 12 years ago.

I'm sorry, but that's kind of a silly comparison. You could build machines with register size of 2 bits and still do significant computations of very large numbers.
posted by CheeseDigestsAll at 7:25 AM on September 24, 2013 [1 favorite]
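
The point about narrow registers can be shown with multi-word arithmetic: a large number is split into small "limbs" and processed one limb at a time with a carry. This is a minimal sketch with hypothetical helper names, using Python integers to stand in for 2-bit registers.

```python
from typing import List

LIMB_BITS = 2
LIMB_MASK = (1 << LIMB_BITS) - 1  # 0b11, the largest value a 2-bit register holds


def to_limbs(n: int) -> List[int]:
    """Split a non-negative integer into 2-bit limbs, least significant first."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs


def from_limbs(limbs: List[int]) -> int:
    """Reassemble an integer from its 2-bit limbs."""
    value = 0
    for limb in reversed(limbs):
        value = (value << LIMB_BITS) | limb
    return value


def add_limbs(a: List[int], b: List[int]) -> List[int]:
    """Ripple-carry addition, one 2-bit limb at a time."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        total = x + y + carry            # never more than 3 + 3 + 1 = 7
        result.append(total & LIMB_MASK) # low 2 bits stay in this limb
        carry = total >> LIMB_BITS       # carry out is 0 or 1
    if carry:
        result.append(carry)
    return result


total = add_limbs(to_limbs(5_000_000_000), to_limbs(4_294_967_295))
print(from_limbs(total))  # 9294967295
```

Every intermediate value fits in two bits plus a carry flag, so register width limits how much is handled per step, not how large the numbers can be.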


... Multiply vast, unfathomably dimensioned matrices in a desperate attempt to unlock eigenvectors whose desolate grandeur could only be imagined by Edgar All[a]n Poe.

I will never look at work the same way again. Unfathomably dimensioned matrices, eigenvectors with desolate grandeur.

[Neo voice:] Whoa.
posted by RedOrGreen at 7:37 AM on September 24, 2013 [1 favorite]


a dystopian bin-packing problem in which humans, carry-on luggage, and five dollar peanut bags compete for real estate while crying children materialize from the ether and make obscure demands in unintelligible, Wookie-like languages while you fantasize about who you won’t be helping when the oxygen masks descend

I don't know about processor design, but I do know I like this guy's style
posted by ook at 7:43 AM on September 24, 2013 [3 favorites]


acb: "And as far as multiple cores go, code is getting more and more functional, stateless and asynchronous (between things like Node.js, Scala, NoSQL, blocks/GCD on OSX/iOS and such) so the idea that massive parallelism is only for esoteric computational applications is increasingly a straw man."

Actually, despite being asynchronous and functional(ish), JavaScript runs in a single thread on a single core.

JavaScript's functional/asynchronous nature gives us some features that theoretically make it (somewhat) easy to fork and create subprocesses, but it's by no means built into the language. There's still a lot of debate about the "correct" way to do clustering in Node.

A lot of the parallelism hurdles that we've run up against stem from the fact that threads are inherently difficult to work with, and our existing languages/toolkits haven't done much to make that easier for us.

Hell, nearly 20 years later, most modern OSes still don't do multithreading as well as BeOS did in 1995. There are algorithms that don't scale well across cores, but an awful lot of existing code is still unnecessarily single-threaded.
posted by schmod at 8:07 AM on September 24, 2013 [1 favorite]
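
The distinction between asynchronous and parallel can be seen in any single-threaded event loop. The sketch below uses Python's asyncio as a stand-in for Node's event loop (it is not Node code, and the task names are hypothetical). Three tasks interleave their waiting, yet every one of them runs on the same thread.

```python
import asyncio
import threading
from typing import List, Set, Tuple


async def fake_io_task(name: str, delay: float, seen_threads: Set[int]) -> str:
    """Pretend to wait on I/O, recording which thread runs this task."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    seen_threads.add(threading.get_ident())
    return name


async def run_tasks() -> Tuple[List[str], Set[int]]:
    """Run three tasks concurrently on one event loop."""
    seen_threads: Set[int] = set()
    results = await asyncio.gather(
        fake_io_task("a", 0.02, seen_threads),
        fake_io_task("b", 0.01, seen_threads),
        fake_io_task("c", 0.0, seen_threads),
    )
    return list(results), seen_threads


def demo() -> Tuple[List[str], int]:
    """Return the task results and the number of distinct threads used."""
    results, seen_threads = asyncio.run(run_tasks())
    return results, len(seen_threads)
```

Calling `demo()` returns the three results in the order the tasks were listed, together with a thread count of 1. Overlapping the waits helps I/O-bound work, but CPU-bound work still needs separate processes or threads to use more than one core.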


Individual cores aren't getting much faster, but Moore's Law was never about that.

No, but it was a rather nice corollary to Moore's Law that allowed old FORTRAN matrix-inversion code written in the 1970s to run much, much faster today with hardly any modification. When people in the scientific circles I sometimes hang out in talk about the end of Moore's Law, that's what they mean--the free lunch is over. There are no more automatic gains to be had just by making individual cores faster. If you want more speed, you'll have to commit tons of resources towards rewriting and parallelizing your code (and debugging it, and testing it...) if it's even possible to parallelize your algorithms in the first place.
posted by RonButNotStupid at 8:21 AM on September 24, 2013 [4 favorites]
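
How much a rewrite can buy depends on how much of the program parallelizes at all. The usual way to estimate this is Amdahl's law, which nobody in the thread names. It is brought in here only as a standard rule of thumb, and the 90% figure below is a made-up example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical program in which 90% of the runtime parallelizes perfectly.
for cores in (2, 8, 64, 1024):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With 10% of the work stuck on one core, the speedup approaches 10x but can never pass it, however many cores are added.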


optical interconnects that can move stuff over long distances at the same sort of speed as you get on-chip

Electrical signals propagate at about 2/3 c. It's not at all clear to me that an extra 50% for any given propagation delay counts as "long distances".
posted by flabdablet at 9:42 AM on September 24, 2013
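
The "extra 50%" follows directly from the speed ratio. This sketch takes the 2/3 c figure from the comment at face value and assumes, as a best case, that the optical link carries signals at the full vacuum speed of light. The 30 cm distance is a hypothetical example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def delay_ns(distance_m: float, fraction_of_c: float) -> float:
    """Time for a signal to cover distance_m at a given fraction of c."""
    return distance_m / (fraction_of_c * SPEED_OF_LIGHT_M_PER_S) * 1e9


DISTANCE_M = 0.30  # hypothetical board-scale distance

electrical = delay_ns(DISTANCE_M, 2 / 3)  # roughly 1.5 ns
optical = delay_ns(DISTANCE_M, 1.0)       # roughly 1.0 ns, best case

print(round(electrical / optical, 2))  # 1.5
```

The ratio is 1.5 at any distance, so raw propagation speed alone offers a fixed 50% difference and nothing that grows with the length of the link.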


this was delightfully hilarious.
posted by quonsar II: smock fishpants and the temple of foon at 9:52 AM on September 24, 2013


The low hanging fruit of big displacement culminated in the Caddy 502cui and MoPar Hemis, but it was refinement, not brute force breakthroughs, that brought us to where we are now - the most powerful, cleanest, most reliable, most fuel efficient engines that have ever been built - just in time to be replaced by electric vehicles over the course of the next decade or two.

This exactly recapitulates what killed the great game Car Wars, which went all weird and awful when it bloated up to include airships, boats, and gas engines.
posted by wenestvedt at 11:22 AM on September 24, 2013


flabdablet: Imagine clock trees that need to remain synchronous, where by "synchronous" you mean "gets to places as far as 500 micrometers away within 0.05 nsec of each other." Then realize that with an electrical interconnect, at the places where it crosses, say, another interconnect on another metal layer, there will be an extra parasitic capacitance which will change the RC time delay as compared to the other interconnect route that doesn't cross the same metal layer at a similar location. Then realize that we're talking IC fabrication processes with five or more metal layers, sometimes, signals going everywhere.

Optical routing adds its own issues, and it's definitely not a panacea, and maybe it's going to be worth more off-chip than on. But it doesn't add geometrical design complexity with an eye out to keep routing delays the same, and that is a big, big wrinkle in what is already a very, very wrinkly problem in automated design. Just to give a sense, a well-known IC CAD software company charges almost as much for their parasitic inductance/capacitance extractor module as they charge for the base software.

I have a friend who's a theoretical physicist, and who sometimes complains about how all the low-hanging fruit in physics was picked years ago. I dabbled on the shores of IC design for my dissertation. I read the article with a bittersweet smile on my face.
posted by seyirci at 11:54 AM on September 24, 2013 [2 favorites]
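
The effect of one extra crossing on clock skew can be illustrated with the simplest possible model, a single lumped resistor and capacitor. Real extraction tools use far more detailed models. The resistance and capacitance values below are hypothetical, and only the 0.05 ns budget comes from the comment.

```python
def rc_delay_ns(resistance_ohm: float, capacitance_farad: float) -> float:
    """Time constant tau = R * C of a lumped RC model, in nanoseconds."""
    return resistance_ohm * capacitance_farad * 1e9


WIRE_RESISTANCE_OHM = 200.0      # hypothetical
WIRE_CAPACITANCE_F = 100e-15     # hypothetical, 100 fF
CROSSING_CAPACITANCE_F = 50e-15  # hypothetical parasitic added by one crossing
SKEW_BUDGET_NS = 0.05            # "within 0.05 nsec of each other"

clean_route = rc_delay_ns(WIRE_RESISTANCE_OHM, WIRE_CAPACITANCE_F)
crossed_route = rc_delay_ns(
    WIRE_RESISTANCE_OHM, WIRE_CAPACITANCE_F + CROSSING_CAPACITANCE_F
)
skew_ns = crossed_route - clean_route

print(round(clean_route, 3), round(crossed_route, 3), round(skew_ns, 3))
print(skew_ns < SKEW_BUDGET_NS)
```

With these made-up values a single crossing uses about a fifth of the skew budget, so a handful of unmatched crossings along one branch of the clock tree would be enough to exhaust it.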


The wild ride of being able to just make Fortran IV code run faster by waiting for a new computer is over... but there's lots to be done if you can parallelize it, and don't require excessive locality of data.

Now that this phase of things is ending, a ton of research is going to go into things like STM - Software Transactional Memory, which when incorporated into compilers, will allow everything to be threaded and distributed without having to re-write it.

Moore's law was about the complexity of chips, not speed. Once the software research is done, I think we'll start getting more speed again, but at less of an upward slope.
posted by MikeWarot at 12:20 PM on September 24, 2013 [1 favorite]


For a more concrete take on this idea, read Herb Sutter's The Free Lunch Is Over and the followup article Welcome to the Jungle.
posted by Rhomboid at 1:00 PM on September 24, 2013 [1 favorite]


I wonder what the author thinks of the clockless 144 core 18-bit FORTH processor that runs at peak 96 billion operations/sec at 650 milliwatts.
posted by RobotVoodooPower at 2:35 PM on September 24, 2013 [2 favorites]
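
Those headline numbers are easier to judge per core and per operation. The sketch below uses only the figures quoted in the comment and assumes that the peak throughput and the 650 mW power draw describe the same operating point, which the comment suggests but does not confirm.

```python
CORES = 144
PEAK_OPS_PER_SECOND = 96e9  # "peak 96 billion operations/sec"
POWER_WATTS = 0.650         # "650 milliwatts"

# Throughput contributed by each core at peak.
ops_per_core = PEAK_OPS_PER_SECOND / CORES
# Energy spent on a single operation, in picojoules.
picojoules_per_op = POWER_WATTS / PEAK_OPS_PER_SECOND * 1e12

print(round(ops_per_core / 1e6))    # about 667 million ops/sec per core
print(round(picojoules_per_op, 2))  # about 6.77 pJ per operation
```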


There was some discussion of the GA144 over in the Parallax Propeller forums, since it's another mavericky design. The initial consensus was that it's going to be very difficult to use for some very common tasks due to its weird word size, which makes a lot of sense for the FORTH implementation but not much else in the real world.
posted by localroger at 3:07 PM on September 24, 2013


I would think that the GA144 Forth environment would be a lot more to overcome than a weird word size... hmmm.
posted by MikeWarot at 7:03 PM on September 24, 2013


Clockless?! That's very interesting, how does it manage to operate without the standard cycle procedure? Things have certainly advanced since the days I took Computer Architecture!
posted by JHarris at 8:00 PM on September 24, 2013


Oooooh reading about that chip fascinates me. I wish I had the time AND energy AND money to play around with it. Unfortunately, all those signals are low.
posted by JHarris at 8:07 PM on September 24, 2013


how does it manage to operate without the standard cycle procedure?

Async processor designs still use something akin to a clock, in that each state-storing logic element will have an input whose sole job is to prevent the output changing until other inputs are guaranteed stable; but instead of being derived from a global clock source, those inputs are driven from Ready signals emerging from nearby logic. This is roughly the hardware equivalent of event-driven software; inter-element coordination ends up being defined by a network of interlocks rather than a global tick.

Traditional synchronous clocks are actually something of a kludge. They're essentially the hardware equivalent of inserting sleep(10) delays in multi-threaded software until it stops crashing instead of doing robust and provably correct inter-thread synchronisation with mutexes. Just as in the software case, they simplify designs and let you do more with less, but they're a bit fragile: every piece of logic needs to be able to do its thing within some maximum time defined by an essentially arbitrary clock rate, lest a clock edge happen before outputs are ready and things fall over. As seyirci hints, this can become hard to guarantee when the clock generator is relatively physically remote from the logic in question.

Because an async design's Ready signals are derived from logic that's physically very close to that whose states it validates, it's much easier to guarantee that any given logic output will always be valid before its associated Ready signal says it is. The physical downside is that most of the logic involved in generating Ready signals comes down to chains of delay-generating elements, and these can end up consuming a fair bit of real estate. The design downside is that async designs are essentially a fiendish cesspool of race conditions that's really, really hard to get right.

You still end up with something that looks a lot like the fetch/decode/execute cycle from a traditional clocked processor, but instead of that cycle happening at a speed defined by a clock source it happens as fast as internal propagation delays allow. Reducing an async design's temperature to reduce internal propagation delays will make the whole thing run faster, as opposed to simply letting it run faster via overclocking.

In the case of the GA144 discussed above, some of the Ready signals can come directly from external ports. The result is that if a port's not ready, that unreadiness can propagate all the way through the network and naturally suspend any processing logic that's waiting for data from that port. There's no need for a Sleep mode as such, because there isn't a clock to turn off; when there's nothing to do, it just naturally follows that nothing happens.
posted by flabdablet at 9:30 PM on September 24, 2013 [8 favorites]
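
The software half of that analogy can be sketched directly. Below, a hypothetical `Stage` stands in for a block of logic whose result takes some time to settle. One reader samples it after a fixed delay, the way a global clock would, and the other waits on the stage's own Ready signal. The class and function names are invented for illustration, and a thread plus `threading.Event` is only a loose model of the hardware.

```python
import threading
import time
from typing import Optional


class Stage:
    """A block of 'logic' whose output needs work_seconds to become valid."""

    def __init__(self, work_seconds: float) -> None:
        self.work_seconds = work_seconds
        self.output: Optional[int] = None
        self.ready = threading.Event()

    def start(self, value: int) -> threading.Thread:
        """Begin computing value + 1 in the background."""
        self.output = None
        self.ready.clear()
        worker = threading.Thread(target=self._compute, args=(value,))
        worker.start()
        return worker

    def _compute(self, value: int) -> None:
        time.sleep(self.work_seconds)  # stands in for propagation delay
        self.output = value + 1
        self.ready.set()               # Ready is raised only after the output is set


def read_with_fixed_clock(stage: Stage, value: int, clock_period: float) -> Optional[int]:
    """Clocked style: sample after a fixed period, whether or not the stage is done."""
    worker = stage.start(value)
    time.sleep(clock_period)   # the "clock edge" arrives on schedule
    sampled = stage.output     # None if the stage had not finished yet
    worker.join()
    return sampled


def read_with_handshake(stage: Stage, value: int) -> Optional[int]:
    """Async style: wait for the stage's own Ready signal before sampling."""
    worker = stage.start(value)
    stage.ready.wait()
    sampled = stage.output
    worker.join()
    return sampled
```

With a slow stage such as `Stage(work_seconds=0.2)`, `read_with_handshake(stage, 41)` returns 42 however long the work takes, while `read_with_fixed_clock(stage, 41, clock_period=0.001)` samples too early and gets `None`. Lengthening the clock period fixes the second case only by making every stage wait for the slowest one.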


Oh, that's incredibly fascinating. I wish I could play around with circuit and logic design again. I wired up a four-bit multiplier of my own design back at university, and then pretty much put it down and never went back to it. Sigh.
posted by JHarris at 11:16 PM on September 24, 2013


One reasonably sound approach to asynchronous design is to take an existing, well understood synchronous one and systematically asyncify it.
posted by flabdablet at 3:47 AM on September 25, 2013 [2 favorites]


Wikipedia's writeup is good reading too.
posted by flabdablet at 3:58 AM on September 25, 2013 [1 favorite]


Here's a fine-grained approach that incorporates Start and Done signals all the way down to the basic gate level.
posted by flabdablet at 4:08 AM on September 25, 2013 [1 favorite]



This thread has been archived and is closed to new comments