Cloud Consitency
September 5, 2012 6:52 PM

Netflix has open-sourced tools it uses for load balancing and failure management with Amazon Web Services. They plan to release more tools in the future. They are on GitHub.
posted by juiceCake (12 comments total) 25 users marked this as a favorite
 
Any post about Netflix's technology would be remiss to not mention the Chaos Monkey, which the second link mentions: an application that randomly tears through their infrastructure and pulls down entire instances (servers) on a whim. More details on their mischievous monkeys here.
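
For anyone who wants the idea in concrete form, here is a minimal sketch of the "randomly pull down instances" behavior described above. It is not Netflix's code and makes no AWS calls: the names `pick_victims` and `chaos_run`, the `terminate` callback, and the per-instance probability are all assumptions made for illustration.

```python
import random


def pick_victims(instances, probability, rng=None):
    """Return the instances a chaos run would terminate.

    Each instance is chosen independently with the given probability.
    """
    rng = rng or random.Random()
    return [name for name in instances if rng.random() < probability]


def chaos_run(groups, probability, terminate, rng=None):
    """Terminate randomly chosen instances in each group.

    `groups` maps a (hypothetical) group name to a list of instance names.
    `terminate` is a caller-supplied callback; a real tool would call the
    cloud provider's API there. Returns a dict of what was terminated.
    """
    killed = {}
    for group, instances in groups.items():
        victims = pick_victims(instances, probability, rng)
        for name in victims:
            terminate(name)
        killed[group] = victims
    return killed


if __name__ == "__main__":
    # Hypothetical inventory; the callback only prints instead of killing anything.
    inventory = {"web": ["web-1", "web-2", "web-3"], "db": ["db-1", "db-2"]}
    chaos_run(inventory, 0.25, lambda name: print("terminating", name))
```

The point of keeping `terminate` as a callback is that the selection logic can be exercised without touching real servers; whether the surviving instances actually pick up the load is the part the exercise is meant to reveal.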
posted by disillusioned at 7:55 PM on September 5, 2012 [4 favorites]


Having just had a developer resign, costing us an immeasurable amount of institutional knowledge, I wonder if the corporate world needs an HR version of the Chaos Monkey...

Randomly and abruptly notify employees that they will be given a three week paid vacation, mandatory and effective immediately. No contact may be made with coworkers unless absolutely critical (and if so, requiring a thorough follow-up explaining why such a single point of failure existed).

Ideally, this would keep folks coordinating and documenting their work, to make sure that no one person getting hit by a bus would bring a team (or an entire company) to its knees.

A well-run company has redundancy strategies for its hardware, it should have them for people as well.
posted by Riki tiki at 8:32 PM on September 5, 2012 [28 favorites]


> Randomly and abruptly notify employees that they will be given a three week paid vacation, mandatory and effective immediately. No contact may be made with coworkers unless absolutely critical (and if so, requiring a thorough follow-up explaining why such a single point of failure existed).

I swear I heard a news story about a company that does exactly this, but the name escapes me...
posted by BungaDunga at 9:04 PM on September 5, 2012


> I wonder if the corporate world needs an HR version of the Chaos Monkey

Actually, I know places that have policies like this. Every now and then, a random number of "You're Dead" post-it notes get stuck on people and servers, with the remaining staff tasked to bring systems back up. As a sole sysadmin who works with people that have at least some technical knowledge, I keep meaning to "post-it note" myself one of these days.
posted by Nonsteroidal Anti-Inflammatory Drug at 9:05 PM on September 5, 2012 [1 favorite]


> As a sole sysadmin who works with people that have at least some technical knowledge, I keep meaning to "post-it note" myself one of these days.

I felt like the other side of this thought experiment is how long can you go without showing up to work before people notice you aren't there. If you've been automating the systems and using a management framework, it could be a couple of days or weeks if you're good.
posted by mrzarquon at 9:09 PM on September 5, 2012 [1 favorite]


> I felt like the other side of this thought experiment is how long can you go without showing up to work before people notice you aren't there. If you've been automating the systems and using a management framework, it could be a couple of days or weeks if you're good.

Sounds like you need a job at Initech or Intertrode.
posted by special-k at 9:56 PM on September 5, 2012 [1 favorite]


> A well-run company has redundancy strategies for its hardware, it should have them for people as well.

They're called layoffs and involve making the survivors work harder.
posted by Blazecock Pileon at 9:56 PM on September 5, 2012 [5 favorites]


The chaos monkey got the 's' in Consistency!

> mrzarquon: I felt like the other side of this thought experiment is how long can you go without showing up to work before people notice you aren't there. If you've been automating the systems and using a management framework, it could be a couple of days or weeks if you're good.

That's the problem. A great sysadmin automates themselves out of a job... but the corporate world doesn't understand that they should reward and encourage this behavior.

The result is most people end up embracing very slow, manual processes and poor documentation, because well, it keeps them employed. In most IT jobs, if you have to show up every day you're doing something wrong- but most people do it, because it's expected, just like when grandpa would show up to work every day in a suit and tie. They strive to look busy, or the wrong kind of people stay employed (mediocrity only recognizing and promoting other mediocrities), because there's no incentive to innovate solutions.

After all, even if you could, what is your motivation to solve in software problems that replace the work of 20 people- you certainly don't get even a reasonable fraction of that back added to your own salary for being so clever! Employers would rather you be a replaceable cog anyway, which is why periodically some PHB will think "We should have all our developers using the same language!" as if that was the important part.

Or as Blazecock notes, they just lay people off and you end up doing the work of several people anyway- I can't find it now, but there was a much-linked set of graphs and charts a few months ago that talked about the "12 (or 25?) graphs that explain the economy" and among them were graphs showing that IT/technology has had far and away the fastest growth in productivity... but stagnant wages.

Anyway, getting back to the OP, I think these kinds of open source tools are ultimately fairly useless: the kind of people who adopt them well didn't really need them- i.e., they only use the tool because they figured out exactly what they needed and then realized someone had solved the problem, but could have and would have written it themselves if it didn't exist. For most people, they just hear "cloud solutions" and don't think any deeper than that, so most people won't even understand the value of something like Chaos Monkey. They just got their boss to sign off on using AWS because "the cloud" is a magical buzzword that means someone else has somehow fixed all problems of high availability and fault tolerance for you.
posted by hincandenza at 12:19 AM on September 6, 2012


Can this software be used on any cloud systems other than Amazon? If not, it being open sourced is kind of pointless. Why do you need open source software to use closed services?

Also, Amazon actually has a ton of built-in features for EC2 that do, well, I don't know about this exactly, but lots of redundancy, pooling, etc. I don't know all about it, but they have stuff like Elastic Beanstalk, which (supposedly) lets you just push an application into the cloud and have it run without needing to manage and maintain various instances. They also have a couple of database products that handle all instances for you.

I wonder if they're just open-sourcing this because it's becoming obsolete and no longer a competitive advantage.

> The result is most people end up embracing very slow, manual processes and poor documentation, because well, it keeps them employed. In most IT jobs, if you have to show up every day you're doing something wrong- but most people do it, because it's expected, just like when grandpa would show up to work every day in a suit and tie. They strive to look busy, or the wrong kind of people stay employed (mediocrity only recognizing and promoting other mediocrities), because there's no incentive to innovate solutions.

I've always thought it would be an interesting thing to take IT people, take some non-tech industry, and create a new company with the IT people in charge, where the software forms the central core rather than a collection of bolt-ons that eventually takes over under the 'leadership' of people who don't know tech, don't care, etc.

There are actually some people and companies that do this. In fact, Netflix is kind of an example: there is nothing intrinsically 'computer-y' about renting movies, but at Netflix they essentially wrote software to handle everything, and built the company around that.

I've heard of some other investors who tried this model and had some success (like with used car sales, or something)

Of course, everyone sees their particular job as being the most important, but with IT it really does seem like companies end up being run by software, usually just a cobbled-together mess of hacks that accumulates over the years.
posted by delmoi at 3:43 AM on September 6, 2012


> They're called layoffs and involve making the survivors work harder.

Or just making people so miserable that they quit. Today is the last day of one of my teammates who's been there ten years and got fed up and quit in the middle of a product release.
posted by octothorpe at 4:42 AM on September 6, 2012


> A great sysadmin automates themselves out of a job... but the corporate world doesn't understand that they should reward and encourage this behavior.

The "great sysadmins automate themselves out of a job" trope gets me, because I think it is the good sysadmin that ends up doing that. The great sysadmin is able to communicate the benefits of what they have accomplished to the rest of the company so they appreciate the value of it. If 90% of the day-to-day tasks are automated, it means you are free to work on implementing new features, or responding to requests for change faster, so there is a value to the rest of the company.

Instead of spending their day patching an infinite backlog of systems or dealing with outages that could be prevented, great sysadmins are building more tools with the extra time they have. But this also means they have to interact with the rest of the company in the non stereotypical basement troll fashion that people assume all sysadmins are like.

Don't get me wrong, a great sysadmin is hard to find, there are plenty of good ones, and infinitely more bad ones who cover up for their lack of abilities by running around and putting out fires.

> Can this software be used on any cloud systems other then amazon? If not, it being open sourced is kind of pointless. Why do you need open source software to use closed services?

People write open source software for closed systems all the time. And AWS isn't some strange edge-case service that only Netflix is using. I am sure there are plenty of other AWS-heavy customers who would like to use these tools, or even just see how Netflix decided to implement their testing procedures.

And instead of trying to start a separate IT consulting business or sell these tools (both of which distract them from what they are doing internally), they can just push the code up to GitHub and let others use it instead.

Then there is the incentive to clean up one's code before you show it to others: just because it is good enough for production doesn't mean it is good enough for GitHub. That extra documentation and code cleanup to get these apps into a presentable state may be worth whatever competitive edge is lost by sharing this code with other people. (Not to mention they haven't open-sourced their secret-sauce suggestion / recommendation systems, or really any of the applications that are actually running on top of these EC2 instances that make Netflix a web streaming company instead of an e-retailer.)
posted by mrzarquon at 7:57 AM on September 6, 2012 [1 favorite]


The code may not allow a competitor to their business to come up on them (legacy and new deals with content owners are probably much harder to get than writing a recommendation system - after all they didn't even implement the system they paid a million dollar prize for)...

But this code may help a new service scale up to Instagram-like growth without reinventing the wheel all over again.
posted by stratastar at 11:01 AM on September 6, 2012

