"Just now. We're at now now."" Go back to then.""When?""Now."
March 23, 2015 8:15 AM

 
So the future isn't now. Huh.

I know this is talking generally about distributed systems and the specific software as examples, but I can't help but think about some of the Golden Age science fiction I read where the "nows" and calculations were 'precise' down to the millisecond with human backups - yet that's still a very imprecise measure of the nows and wheres and whens and thens.
posted by tilde at 8:21 AM on March 23, 2015 [2 favorites]


Great, clear read on one of the core problems of distributed systems. I'm going to file this along with Waldo's A Note on Distributed Computing as ammunition for fresh-out-of-college engineers who are convinced they can easily build a reliable distributed system.
posted by Nelson at 8:26 AM on March 23, 2015 [1 favorite]


Now you tell me.
posted by dances_with_sneetches at 8:29 AM on March 23, 2015 [1 favorite]


Back around the turn of the millennium, I wrote the core of a distributed "web scale" (for "over a million queries a day!" values of) database. I've since worked on systems with microsecond-ish synchronization.

I didn't really understand race conditions and synchronization issues until I started to work on a system which manages workflows on mixed human and computer processes that happen over days, and where synchronization to an hour or two is the best we can hope for.

Queued these up for deeper reading and dissemination to cow-orkers... Who will sigh knowingly.
posted by straw at 8:32 AM on March 23, 2015 [3 favorites]


Reminds me of the ol' 500 mile email yarn
posted by thelonius at 8:38 AM on March 23, 2015 [23 favorites]


Change and identity over time is a hard problem.

Sometimes I think that terrible distributed system design is partially caused by something very much related to pareidolia (perhaps it has a name, is it a sort of paranoia or magical thinking?). The default for a confused human mind is to see combinations of events and properties as coherent entities, as "objects" if you will. And in the absence of precision, this illusion can lead to assuming or falsely inferring all sorts of impossible consistency, and impossible coordination of state.

I am more and more a fan of immutable systems, or at the very least systems where immutability is a default, and the ability to make things change is considered special, and surrounded by flashing lights and warning buzzers.
posted by idiopath at 8:38 AM on March 23, 2015 [10 favorites]


Everyone knows, New York Is Now!

though that was in 1968 so it may not still be now
posted by kokaku at 8:38 AM on March 23, 2015


Good piece.

I think the key understanding that most people shy away from is that at some point the software abstractions will break down, and the most important thing you can do is figure out how you want to fail when that happens.

Know your failure mode.
posted by PMdixon at 8:43 AM on March 23, 2015 [3 favorites]


Awww, I was gonna bring up general relativity but the article smacked me down right out of the gate.
posted by XMLicious at 8:47 AM on March 23, 2015 [3 favorites]


PMdixon: well "failure modes" is a very broad category, and our stack of abstractions is very deep. On a single computer, I would never double check that a variable was equal to itself (aside from some floating point silliness). On a distributed system, all sorts of things that are implicit common sense on a single computer become very shaky, if not flat out wrong or impossible.
posted by idiopath at 8:54 AM on March 23, 2015 [1 favorite]


This is why we use vector clocks! Well obviously they don't solve every problem in the world, but they are good enough for eventual consistency in key-value stores which is all I care about right now.
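
For anyone who hasn't met one, here is a minimal sketch of a vector clock in Python. The function names (`tick`, `merge`, `compare`) and the dict-of-counters representation are illustrative assumptions, not the API of any particular key-value store. The point it shows: two versions can be "concurrent", meaning neither happened before the other, which a single wall-clock timestamp cannot express.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock of a replica that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: a conflict the application must resolve
```

Two writes made independently on nodes "A" and "B" compare as "concurrent"; once a replica merges both and ticks again, the result compares as "after" each of them.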
posted by miyabo at 9:10 AM on March 23, 2015 [1 favorite]


idiopath: "I am more and more a fan of immutable systems, or at the very least systems where immutability is a default"

How much is Rich Hickey paying you, be honest now
posted by vanar sena at 9:16 AM on March 23, 2015 [3 favorites]


It reminds me of this attempt to create a virtual concept of time in which another machine can recover from a hardware fault by resuming from the point of failure as if no time has passed.
posted by Obscure Reference at 9:33 AM on March 23, 2015 [1 favorite]


Awww, I was gonna bring up general relativity but the article smacked me down right out of the gate.

Relativistic effects do matter a little bit to practical computing, specifically for GPS calculations. That linked Wikipedia article says special relativity slows the satellite clocks by about 7 μs/day and general relativity speeds them up by about 46 μs/day, or enough to introduce 10 km/day errors in positioning if it weren't corrected.
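
The arithmetic is easy to check. This is a rough sketch, assuming the two figures quoted above are magnitudes with opposite signs (the sign convention is an assumption here) and that an uncorrected clock offset turns into a ranging error of offset times the speed of light. The function name is made up for illustration.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact by definition)


def ranging_error_km(clock_offset_us: float) -> float:
    """Distance light covers during a clock offset given in microseconds."""
    return C_M_PER_S * clock_offset_us * 1e-6 / 1000.0


# Figures from the comment, per day. Treating them as opposite-signed is an assumption.
special_relativity_us = -7.0   # orbital speed makes the satellite clock run slow
general_relativity_us = 46.0   # weaker gravity makes the satellite clock run fast
net_us_per_day = special_relativity_us + general_relativity_us

error_km_per_day = ranging_error_km(net_us_per_day)
```

With those inputs the net drift is 39 μs/day, which works out to a little under 12 km of ranging error per day, consistent with the "10 km/day" order of magnitude.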

Not disputing the article's primary claim which is that of all your errors in timing on a computer, relativistic effects are the least of your worries. Also the GPS one is completely predictable and corrected, so it's not even really an error to GPS users. I just like that GPS is a practical application of relativity.
posted by Nelson at 9:44 AM on March 23, 2015 [6 favorites]


Interesting that it cites the Zookeeper Call Me Maybe results to aphyr, but thanks Kyle Kingsbury in the acknowledgements. (Kingsbury also gets a citation for his ACM Queue post "The Network is Reliable".) Identity management really is tricky.
posted by kenko at 9:46 AM on March 23, 2015 [1 favorite]


Just divide your time units into 10^-43rds of a second and figure out how to get information to travel faster than the speed of light. Easy.
posted by sexyrobot at 10:14 AM on March 23, 2015 [2 favorites]


Fascinating. We didn't even need standard time at all until we had trains.
posted by MrMoonPie at 10:18 AM on March 23, 2015 [1 favorite]


The great thing about Kyle Kingsbury AKA aphyr is that his twitter is a perfect mix of gay bdsm selfies and cutting edge distributed systems knowledge.

There is a very small group of people that love both of these things, and the rest of us put up with it because it's worth it.
posted by idiopath at 10:24 AM on March 23, 2015 [8 favorites]


The great thing about Kyle Kingsbury AKA aphyr is that his twitter is a perfect mix of gay bdsm selfies and cutting edge distributed systems knowledge.

Ohhhhh, so that's what Twitter is for.
posted by PMdixon at 10:37 AM on March 23, 2015 [4 favorites]


Now that's how you derail. [Heads to Twitter]
posted by sexyrobot at 10:41 AM on March 23, 2015 [4 favorites]


Spanner is pretty interesting in that one of its core design tenets is that you can't abstract everything away - sometimes the most basic building blocks, like time, have to get exposed all the way back to the top. The notion that you can abstract away everything and have this idealized storage system is discarded from the very beginning. There's no understanding the top of the stack without understanding the bottom.

And, over time, the stack only gets taller.
posted by GuyZero at 10:44 AM on March 23, 2015


PMdixon: well "failure modes" is a very broad category, and our stack of abstractions is very deep. On a single computer, I would never double check that a variable was equal to itself (aside from some floating point silliness). On a distributed system, all sorts of things that are implicit common sense on a single computer become very shaky, if not flat out wrong or impossible.

My experience/scars are from working environments where questions like "if we get a bad clock pull do we want the code to try to keep running (and scream) or die (and scream)" got answers of "well that can't happen", so I'm really talking at a pretty basic level here. Yeah, lots of conceivable failures are both actually once in the lifetime of the universe stuff and unmitigable, but lots are quite baked into and predictable from whichever architecture, and people have a shockingly hard time even understanding that you can think about those, let alone that you have choices to make.
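
To make the "keep running (and scream) or die (and scream)" choice concrete, here is a hypothetical sketch. Every name in it (`CheckedClock`, `OnBadClock`, `BadClockError`, the one-hour `max_jump_s` threshold) is invented for illustration, and "bad" is assumed to mean a reading that goes backwards or jumps implausibly far forwards. The only point is that the failure mode becomes an explicit, reviewable parameter instead of "well that can't happen".

```python
import enum
from typing import Callable, Optional


class OnBadClock(enum.Enum):
    KEEP_RUNNING = "keep running (and scream)"
    DIE = "die (and scream)"


class BadClockError(RuntimeError):
    """Raised when the policy is DIE and a reading looks implausible."""


class CheckedClock:
    """Wraps a clock source and applies an explicit policy to bad readings."""

    def __init__(self, read_clock: Callable[[], float], policy: OnBadClock,
                 max_jump_s: float = 3600.0,
                 scream: Callable[[str], None] = print) -> None:
        self._read_clock = read_clock
        self._policy = policy
        self._max_jump_s = max_jump_s  # assumed threshold for a suspicious jump
        self._scream = scream          # alerting hook; defaults to print
        self._last_good: Optional[float] = None

    def now(self) -> float:
        reading = self._read_clock()
        last = self._last_good
        bad = last is not None and (
            reading < last or reading - last > self._max_jump_s
        )
        if bad:
            self._scream(f"implausible clock reading {reading!r} after {last!r}")
            if self._policy is OnBadClock.DIE:
                raise BadClockError(f"refusing to continue after {reading!r}")
            return last  # keep running on the last plausible value
        self._last_good = reading
        return reading
```

Either policy screams; they differ only in whether the caller gets a stale value or an exception, and that is exactly the decision someone should make on purpose.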
posted by PMdixon at 10:49 AM on March 23, 2015


I'm glad that he acknowledges the trade-offs of perfection versus performance and how that's only now starting to show up in distributed systems research. My work is with distributed systems that are much smaller, both in count and in processing power, than what typically gets studied. Sometimes I understand what I should do but decide to implement a more fragile system anyway because that's what the engineering demands. Rare is the day when I don't wish for more code space, a bigger power budget, etc. I'd like to see more research on those trade-offs, because the trend is definitely for devices with the ability to talk to one another and thus coordinate.
posted by introp at 10:51 AM on March 23, 2015


You don't put a seatbelt on a bicycle, and you don't wear a helmet in your car. What it means to have reliability changes with scale and with the constraints of the domain.

The kind of network failure that is rare in a 2 node system is relatively frequent in a 1000 node system. Sometimes trading a 10x throughput boost for a 10% chance that you drop a message is totally worth it, sometimes a 10x drop in throughput in order to change your failure rate from 1 in 1000 to 1 in 10000 is worth it.
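
A back-of-the-envelope sketch of the scale effect, under loudly stated assumptions: every pair of nodes has its own link (a full mesh), each link fails independently with the same probability in some fixed window, and the 1-in-a-million figure is made up. Real networks share switches and fail in correlated ways, so treat this as intuition only.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of distinct node pairs, i.e. links in a full mesh."""
    return nodes * (nodes - 1) // 2


def p_any_failure(p_link: float, links: int) -> float:
    """Probability that at least one of `links` independent links fails."""
    return 1.0 - (1.0 - p_link) ** links


P_LINK = 1e-6  # hypothetical per-link failure probability per time window

p_two_nodes = p_any_failure(P_LINK, full_mesh_links(2))
p_thousand_nodes = p_any_failure(P_LINK, full_mesh_links(1000))
```

With these numbers the 2 node system sees a failure about once in a million windows, while the 1000 node system, with 499,500 pairs, sees at least one failure in roughly 39% of windows.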
posted by idiopath at 11:16 AM on March 23, 2015 [1 favorite]


The great thing about Kyle Kingsbury AKA aphyr is that his twitter is a perfect mix of gay bdsm selfies and cutting edge distributed systems knowledge.

You'll come for the gay bdsm selfies but you'll stay for the cutting edge distributed systems knowledge.

Or was that the other way around?
posted by ZenMasterThis at 11:17 AM on March 23, 2015


I think a seatbelt would actually diminish the safety of a bicycle.
posted by kenko at 11:25 AM on March 23, 2015 [1 favorite]


Or as my grandmother used to say to calm me down when I was a manic 5-year-old: "Now, now..."
posted by oneswellfoop at 11:31 AM on March 23, 2015


I think a seatbelt would actually diminish the safety of a bicycle.

I'd be happy to have an ejector seat, or a puffer-fish style way of instantly surrounding myself with airbags.
posted by justsomebodythatyouusedtoknow at 11:53 AM on March 23, 2015 [2 favorites]


I deal with this issue often enough that it's why comments of are order out of mine sometimes. The interesting thing is that, from a CS perspective it's true that you can't really universally solve this problem, and that means a certain class of systems are not feasible, however, from a business perspective, you can deal with it easily because it's just something else that can go wrong which you can create procedures for, which enables you to get a lot of the value out of assuming things make sense without pinning your entire system to the mathematical certainty that they do.
posted by feloniousmonk at 11:59 AM on March 23, 2015


Just divide your time units into 10^-43rds of a second and figure out how to get information to travel faster than the speed of light.

That would create more problems because at that resolution you would have to deal with general relativity effects—the server at the top of the rack not only would have clock differences from the one at the bottom due to temperature differences changing the operation of the hardware, but because it actually experiences a different amount of time passing, the way the GPS satellites do but on a smaller scale. Your head is younger than your feet! And the geology underneath a location affects the rate at which time passes there, insofar as it affects gravity.
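
For scale, a small sketch using the standard weak-field approximation, where two clocks separated by height h in uniform gravity g differ in rate by about g·h/c². The 2 m rack height is an assumed figure, and the function names are invented for illustration.

```python
G_M_PER_S2 = 9.80665       # standard gravity
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def fractional_rate_difference(height_m: float) -> float:
    """Approximate fractional clock-rate difference across a height: g*h/c^2."""
    return G_M_PER_S2 * height_m / C_M_PER_S ** 2


def drift_seconds(height_m: float, elapsed_s: float) -> float:
    """How far the upper clock gets ahead of the lower one over `elapsed_s`."""
    return fractional_rate_difference(height_m) * elapsed_s


RACK_HEIGHT_M = 2.0  # assumed rack height
drift_per_day = drift_seconds(RACK_HEIGHT_M, 86_400.0)
```

That comes to a fractional difference of about 2.2e-16, or roughly 19 picoseconds per day between the top and bottom of the rack: irrelevant at millisecond resolution, enormous at 10^-43 s resolution.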
posted by XMLicious at 12:31 PM on March 23, 2015


> I think a seatbelt would actually diminish the safety of a bicycle.

I'd be happy to have an ejector seat, or a puffer-fish style way of instantly surrounding myself with airbags.

This is a thing, btw.
posted by XMLicious at 12:38 PM on March 23, 2015


Your head is younger than your feet!

Actually I stand on my head 12 hours a day specifically to even out the aging process.
posted by kenko at 1:04 PM on March 23, 2015 [2 favorites]


Well then you have youthful bowels and naughty bits.

There's a reason for humanity to establish a presence in outer space: so that we can guarantee evenly-aging bodies.
posted by XMLicious at 1:37 PM on March 23, 2015


Now I'm trying to translate agile into bdsm terms, or possibly vice versa.
posted by PMdixon at 2:36 PM on March 23, 2015 [1 favorite]


From the article:

11.8 inches long, the maximum distance that electricity can travel in one nanosecond

Twaddle - should be: 30cm long, the maximum distance that light can travel in a nanosecond.

Something the same length as the distance electricity can travel in a nanosecond would be too small to be visible. Electrical signals can travel significantly faster, but that depends on the velocity factor of the medium.
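
The two figures are the same length in different units, which a few lines of arithmetic confirm. The velocity factor parameter is included to show how the distance shrinks in a real medium; the 0.66 value is an assumed, typical-looking number for coaxial cable, not a figure from the article.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact by definition)
METRES_PER_INCH = 0.0254   # exact by definition


def distance_m(seconds: float, velocity_factor: float = 1.0) -> float:
    """Distance a signal covers in `seconds` at `velocity_factor` times c."""
    return C_M_PER_S * velocity_factor * seconds


ONE_NANOSECOND = 1e-9

vacuum_cm = distance_m(ONE_NANOSECOND) * 100.0
vacuum_inches = distance_m(ONE_NANOSECOND) / METRES_PER_INCH
coax_inches = distance_m(ONE_NANOSECOND, velocity_factor=0.66) / METRES_PER_INCH
```

In vacuum that is about 29.98 cm, or 11.8 inches; with the assumed velocity factor it drops to a bit under 8 inches.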
posted by HiroProtagonist at 7:41 PM on March 23, 2015


Needs more Lamport. I'm not sure what Sheehy is bringing to the discussion that hasn't already been covered well in college level distributed computing classes. Article has no algorithms, proposes no new clock models, and has no math regarding the scalability of various coordination systems.

It's not bad or wrong, it's just not clear why the ACM is publishing it. I don't see anything a practitioner gets out of the article.
posted by Mad_Carew at 7:00 AM on March 24, 2015 [1 favorite]


Something the same length as the distance electricity can travel in a nanosecond would be too small to be visible. Electrical signals can travel significantly faster, but that depends on the velocity factor of the medium.


I disagree, I think that when people use the term 'electricity' they are using it to mean electrical signals, rather than to discuss the motion of individual electrons (i.e. 'turn on the electricity' isn't parsed the same as 'arouse these leptons'). As that's the case, 11.8 inches is the maximum distance electricity can travel in a nanosecond.
posted by Ned G at 7:12 AM on March 24, 2015


Mad_Carew: ACM has seen its membership and subscription base dwindling. It has been making an active attempt to reach out to a new generation of programmers that are less formal, and less academic, than their former audience. It's going to be a trade off for sure, but better to meet the node.js and mongodb crowd where they are at (including accessible descriptions of the mistakes they are making, and hints of prior research they could benefit from) than to be forgotten while the mistakes get made regardless.
posted by idiopath at 7:49 AM on March 24, 2015 [1 favorite]


ACM Queue is their popular magazine. It isn't intended to be a research journal. It's like IEEE Spectrum, but it's way more boring and nobody reads it.

All the other ACM journals are for peer-reviewed research though.
posted by miyabo at 7:59 AM on March 24, 2015


Of course there's a place for a high level summary article about the impossibility of time synchronization. Even for "practitioners" who have had "college level distributed computing classes". The practice in industry for distributed systems is just terrible.

We've just gone through 5+ years of distributed NoSQL systems that have been deployed in production code. Turns out that the entire industry is astonished to learn none of these systems have found a loophole in the CAP theorem. The more comical part is the way some of those system designers themselves didn't understand CAP. Literally every single distributed datastore has had basic, embarrassing problems where the data doesn't commit correctly, or everything falls apart when the network partitions, or else the distributed datastore has totally shitty performance under production network conditions. And everyone has the nerve to act surprised!

Regular programmers just don't understand how hard it is to build reliable distributed systems. Papers like this that explain why, at a high level, are invaluable.
posted by Nelson at 8:30 AM on March 24, 2015 [4 favorites]


Thumbs up for tags and title.
posted by Chrysostom at 9:56 AM on March 24, 2015


"11.8 inches is the maximum distance electricity can travel in a nanosecond."

This statement is not incorrect, but you could of course substitute anything else in place of 'electricity'.

My objection was that the writer purported to be quoting Hopper, but she was specifically referring to light, not electric signals, so the 'quote' was incorrect.
posted by HiroProtagonist at 6:06 PM on March 24, 2015


HiroProtagonist: she said electricity.
posted by JustinSheehy at 1:34 PM on March 27, 2015




This thread has been archived and is closed to new comments