Huge, Creaky Applications
August 8, 2012 10:11 PM

The underlying problem here is that most software is not very good. James Kwak writes in The Atlantic about the economic risks of bad software. Angry mob comments.
posted by xenophile (82 comments total) 21 users marked this as a favorite

Actually it's worse than that. We have user interfaces that are actively user hostile, we have legacy systems where business logic and data are so intertwined that maintenance is more a matter of prayer than programming, we have poorly documented and nonstandard links between key systems, and we have a cultural disconnect between IT and the wider world.

Aside from that, everything's good.
posted by fallingbadgers at 10:29 PM on August 8, 2012 [20 favorites]

Is this the right thread to remind everyone how much of the software underlying much of everything is written in COBOL? Because I think it is.
posted by hippybear at 10:35 PM on August 8, 2012 [5 favorites]

I can list the occasional unlikely-but-catastrophic event related to *any* field. In the same time frame as a couple of stock trading firms had software glitches, how many airplanes crashed? How many nuclear reactors irradiated Japan? How many police shot unarmed civilians? How many people died due to poor building codes in earthquake-prone areas? How many women died in childbirth because we're a species that was engineered so poorly as for this to be commonplace?

I don't know what this article is trying to claim. Nothing is perfect. Not software, but not anything else, either. At least software improves quickly. No, seriously, how often does your computer crash compared to 10 years ago? You can type "1 usd in euros" in google and *it will tell you the answer*. I have a little device that fits in my pocket that can talk to anyone in the world, take dictation, or give me turn-by-turn directions from wherever it is to anywhere in the world.

But software sucks.
posted by tylerkaraszewski at 10:38 PM on August 8, 2012 [9 favorites]

This is a pretty good article about why I can't sleep at night.
posted by outlaw of averages at 10:40 PM on August 8, 2012 [2 favorites]

I work in a multinational manufacturing company, and I think our proprietary accounting/forecasting/inventory/costing software was written by drunken monkeys huffing on crack pipes.

No, sorry - that's not fair to monkeys.
posted by Mary Ellen Carter at 10:40 PM on August 8, 2012 [4 favorites]

When I was in college radio and taking a couple computer classes, I made a geeky/obscure joke PSA for "The Committee to Find a Cure for COBOL"... that was 1977, just over 35 years ago. I see they still haven't succeeded yet.
posted by oneswellfoop at 10:40 PM on August 8, 2012 [4 favorites]

The underlying problem here is that most software is not very good.

Yes, isn't it wonderful?

Signed,

A software tester.
posted by MartinWisse at 10:48 PM on August 8, 2012 [5 favorites]

Why can't the IT industry deliver large, faultless projects quickly as in other industries?
posted by blue_beetle at 10:49 PM on August 8, 2012 [6 favorites]

There is industrial/scientific/military software that works in levels of complexity and security that we cannot fathom. It's no small potatoes to write good JPL-standard C code that will land Curiosity on Mars. Of course, when these fail, they fail spectacularly.
posted by Apocryphon at 10:54 PM on August 8, 2012 [3 favorites]

Snark out of the way and having now actually read the article, it's of course yet another awful Atlantic article where you take a specific problem and generalise it to whatever point you want to prove, not at all intended to be an asscovering exercise to absolve the people responsible for the problem you started with from their responsibility.

That is, the problems with the trading software Kwak mentions are not because software's bad, okay, but because the failure modes of software designed to handle millions of transactions per minute are always going to be awful. It works as designed, it's just that the geniuses who designed it and especially the geniuses who wanted to use this sort of software in the first place never thought through the consequences if they got it wrong.

To generalise from this specific case to the point that therefore all software is bad or error prone (while not entirely incorrect) is missing the point. With most software, failure modes are much less spectacular: if Word crashes and eats your thesis it's bad news for you, but it can't crash the world economy. Even with banking software, most banking software glitches will be awkward, perhaps painful, but not catastrophic (unless you're a customer of a certain British bank I won't mention).

What Kwak is doing is a neat bit of blame shifting, from the specific people and companies that worked on the trading software that went bad, to the industry as a whole for not being good enough, knowing full well that this will solve nothing.
posted by MartinWisse at 11:00 PM on August 8, 2012 [6 favorites]

I have a pet theory that all the good programmers/ui/software people went into the game industry, attracted by the culture and game programming having generally more challenging problems.

Even the shittiest game ever, ET for the Atari, is held as a programming feat, since it was done by one guy in the span of a few weeks.
posted by hellojed at 11:00 PM on August 8, 2012 [1 favorite]

Why can't the IT industry deliver large, faultless projects quickly as in other industries?

Ironically, what that Reddit article misses is that the A380 itself is a hugely complex software project that worked quite well, if not quite faultless.
posted by MartinWisse at 11:02 PM on August 8, 2012 [2 favorites]

I am just frustrated that one of my projects has to handle "documents". These documents are now in the range of 250+ MB of uncompressed XML. When we started, this was unimaginably large for what we were setting out to do; we would have struggled to put together test documents half that size. Oh, by the way, we are processing 4-5 documents of this size a minute. We generate exports that are too large for Excel to open. So they got a nice ranch house or bungalow and now they want to put a grand piano in every room and are annoyed that it is too cramped to throw parties every night.

So, what caused this problem? The success of the software itself. Working with smaller documents became so easy that generating larger documents was finally within reach. When it was dozens of people in India working with XmlSpy none of this was possible. Analysing a small document took days so everyone kept everything small. The project itself paved the way for its own undoing (that may be a bit dramatic, I will just lobby for more and more servers).
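
For documents in that size range, streaming the parse instead of loading the whole tree is the usual first move. A minimal sketch using `iterparse` from Python's standard `xml.etree.ElementTree`; the `record` tag name and the `count_records` helper are assumptions for illustration, not anything from the project described above:

```python
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count <tag> elements while parsing incrementally.

    `source` is a filename or a binary file object. The tag name is hypothetical.
    """
    count = 0
    # "end" events fire once an element and all of its children are fully parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Free this element's children and text. The now-empty element
            # itself stays attached to its parent, so memory still grows a little.
            elem.clear()
    return count
```

This only trims memory; it does nothing for the 4-5 documents a minute, which is still a "more servers" problem.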
posted by Ad hominem at 11:17 PM on August 8, 2012 [7 favorites]

Even the shittiest game ever, ET for the Atari, is held as a programming feat...

The Atari 2600 had 8KB of extended memory in two 4KB banks, and 128 bytes of main system memory. And yet ET manages to cram an entire lifetime's worth of cursing, confusion, and frustration into those limited specs by eliciting them from the user. That's pretty amazing.
posted by 1adam12 at 11:19 PM on August 8, 2012 [8 favorites]

Well I consider E.T to be one of the most crushing disappointments of my life, so that was quite a programming feat.
posted by Ad hominem at 11:21 PM on August 8, 2012 [3 favorites]

I have a pet theory that all the good programmers/ui/software people went into the game industry, attracted by the culture and game programming having generally more challenging problems.

Even if this is true, I can say very confidently that a lot of other programmers went into the games industry as well.
posted by aubilenon at 11:25 PM on August 8, 2012 [9 favorites]

I have a pet theory that all the good programmers/ui/software people went into the game industry, attracted by the culture and game programming having generally more challenging problems.

That would be surprising. The game industry has a notoriously unpleasant culture. I'm sure they get good programmers (certainly there are some classic examples of excellent programmers in the games industry, but I don't know if they arrive now), but I'd be surprised if they managed to attract most of the good programmers.

I think it's simpler than that. From my end the problem simply looks like there are far more programming jobs than good programmers, and the people who make the decisions in most cases rarely have visibility of the consequences of hiring bad ones and not helping them improve.

(It's also the case that good programmers can and do write bad software. I've yet to understand how to fix that)
posted by DRMacIver at 11:25 PM on August 8, 2012 [2 favorites]

On further thought I'm not sure that's simpler. But I do think it's true. :-)
posted by DRMacIver at 11:46 PM on August 8, 2012 [1 favorite]

Not sure about the article, but that chart is a thing of beauty.
posted by vidur at 11:50 PM on August 8, 2012 [3 favorites]

That would be surprising. The game industry has a notoriously unpleasant culture.

Not surprising at all. It has that unpleasant culture precisely because so many people want to work on games, that not only will they accept longer hours, lower money, etc, but even then there are still so many people wanting in that they distort conditions towards ugly.

There is no shortage of programmers who, having made millions at places like microsoft, quit so that they can work on games for a pale fraction of that.

Some of them go on to even bigger success, but really, no-one gets into games for the money - that would be like enrolling in computer science for the hot women.
posted by anonymisc at 11:52 PM on August 8, 2012 [3 favorites]

I just had two weeks of wrestling with Android so you can count me on board with the "most software sucks" camp...

So buying a brand new Nexus was supposed to usher me into the future of Android, except it didn't (my Nexus is stuck at ICS 4.04 but that's another story).

The first thing I did when I booted up the phone was check out the Gallery feature, and it immediately started syncing my photos from my Google+ account. 1 GB of syncing later, it declared it had successfully cached the entirety of my online photos in the phone, enabling me to access them offline.

Except it didn't. Whenever I tried to access them on 3G, it would start re-downloading all those photos again. The caching system apparently doesn't work: it would randomly forget and lose entire albums and redownload them.

Ok fine. Maybe the caching feature is broken: I confirmed this with other users online, who reported that caching mp3s did not work either. That's ok, I'll disable it. Now comes the 2nd problem - the Gallery app refused to let go of the 1GB of cached broken photos it had downloaded. Nothing worked: clearing user data, clearing cache, it was as if I had lost that 1GB of storage permanently.

Ok fine. I checked with other users online and apparently this is just how Android works. I'll just uninstall the app entirely, that should solve it, right? And here comes the 3rd problem - Android refuses to allow you to uninstall anything related to the Google+ app.

So within a day of owning a brand new phone, with the latest and greatest version of Android, I've already run into 3 incomprehensible design / programming decisions. Why ship software with features that don't work? Why prevent users from clearing app cache? Why prevent users from uninstalling broken software?

My own pet theory is that all the crappy programmers work in non-critical areas like smartphones and games, while all the good ones design the software that keeps airplanes in the sky.
posted by xdvesper at 11:52 PM on August 8, 2012 [3 favorites]

err, I should have said "for the abundance of women", what I wrote could be misinterpreted. Sorry.
posted by anonymisc at 11:55 PM on August 8, 2012 [2 favorites]

3 incomprehensible design / programming decisions

These were almost certainly business decisions and NOT design/programming decisions.
posted by wemayfreeze at 12:17 AM on August 9, 2012 [5 favorites]

xdvesper: I've seen plenty of good programmers outside the critical areas. I've seen far more crappy ones, mind you, but I consider it very unlikely that the reason why avionics software works better than your smart phone is (only) because their programmers are that much better.

When you need safety critical software you design processes which prevent you from writing crashy software that will kill people even if your programmers are bad. This is expensive.

When on the other hand the worst consequence of crashy or buggy software is that your users whine about you on the internet and still buy your software anyway, somehow you don't spend quite as much money on processes to ensure your software is bug free and instead rush it out the door anyway despite your programmers telling you it will take them another 3 months to iron out the problems.
posted by DRMacIver at 12:23 AM on August 9, 2012

My feeling is that programming is one of the most difficult intellectual challenges there is, that's why there's so much crappy software--by its nature it is a hard thing to write software.

We haven't been able to come up with a more powerful description of how to do things than a computer program (Turing machine). I mean that in a universal, formal sense. From the preface of a famous book about programming:

...the emergence of what might best be called procedural epistemology -- the study of the structure of knowledge from an imperative point of view, as opposed to the more declarative point of view taken by classical mathematical subjects. Mathematics provides a framework for dealing precisely with notions of "what is." Computation provides a framework for dealing precisely with notions of "how to."

That is way beyond building bridges.

In addition, computer programs must be utterly precise. Nothing can be hand-waved away.

The combination of power and the complete precision required is what makes me think that writing software is one of the most difficult things people will attempt, ever. And whenever we come up with good ways of writing reliable software of complexity N, the next thing we always do is write crappy broken software of complexity N+1. And there is no limit to the complexity we can try for.
posted by jjwiseman at 12:25 AM on August 9, 2012 [3 favorites]

Related post about Knight Capital.
posted by homunculus at 12:28 AM on August 9, 2012

Most software isn't very good. In my own (scientific) field, I've run across a number of widely used programs that on occasion produce plausible but incorrect results. And largely, no one cares. So long as it mostly works, people will keep going because the software is just a way to get a job done and chasing up errors and bugs doesn't do anything for you.
posted by outlier at 12:31 AM on August 9, 2012 [1 favorite]

I work in a medium-sized IT department. Our CIO is rewarded and encouraged for providing services to as many people in our organization as possible. Of course, this leaves only the bare minimum of time for testing and hardening our systems. I used to be frustrated and confused at his indifference to stabilizing our systems; now I'm just frustrated.

The reason software sucks, then, is because institutional cultures do not find the costs of stable software to be bearable. If our CEO felt that our internal applications were too flaky, she might then encourage our CIO to make them better, and insulate him from the incessant demands of various departments. She does not do this.

To a software developer, working within all this poorly-tested, constantly expanded code is discomforting. But I can see it from both sides. The costs of properly testing software are real and measurable. When you weigh that against the risks of breakdown, suddenly cheap crappy software doesn't look so bad. Should your developers spend 3 extra months writing unit tests, or should your IT folks just do data recovery and restart things when they crash?

Granted, Knight Capital appears to have misjudged their risks and they are suffering the consequences.
posted by yath at 12:31 AM on August 9, 2012 [4 favorites]

I can tell you how stuff like an unclearable cache happens.

1) Software goes to testing and someone says "Man this is slow. How come it is slow, it should be fast, like super fast."
2) PM says to lead, "we have had some reports that viewing photos is slow, how do we make it faster?" Lead says, "we are downloading everything from the Internet, over 3G, we should cache it so it is local"
3) PM says "awesome, I am going to assign this to X offshore resource so he may ping you"
4) Lead gets IM from some dude in India at 4 am " Hello dear , I am doing I am doing implementation of story 5520, what is best way to save to android file system"
5) After first thinking, why is this person calling me dear, the lead replies "use the XYZ API"
6) New build goes into testing 10 am eastern and it is fast as hell, Lead realizes that offshore devs are all calling him dear because they think his name is female.
7) Software ships, user wants to clear cache and reports that he can't. They realize simultaneously that they never even thought about clearing the cache.
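
The gap in step 7 is small once somebody actually writes the story for it. A minimal sketch, assuming a hypothetical on-disk `PhotoCache` whose keys are plain file names; none of this is the real Android API, and every name here is made up for illustration:

```python
import os


class PhotoCache:
    """Hypothetical file-per-key cache; keys are assumed to be safe file names."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.directory, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # cache miss: the caller has to re-download

    def size_bytes(self):
        return sum(os.path.getsize(self._path(n)) for n in os.listdir(self.directory))

    def clear(self):
        """The operation nobody thought about: delete everything, return bytes freed."""
        freed = 0
        for name in os.listdir(self.directory):
            path = self._path(name)
            freed += os.path.getsize(path)
            os.remove(path)
        return freed
```

Having `clear()` return the number of bytes freed gives testing something to assert on, which is roughly how a missing feature like this gets noticed before it ships.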
posted by Ad hominem at 12:31 AM on August 9, 2012 [26 favorites]

"In that thread there were an awful lot of 'computer science PHDs' who focus on 'formal methods' and 'zero defect practices' like that shit is even desireable for the 99% of software and people with "17 years experience as a programmer" who claim they could write perfectly bug free programs given enough resources. I'm pretty sure they couldn't for any problem with a reasonable complexity."

We keep having this argument. And it's not that anyone is saying that all software should be built to that standard, it's that almost none of the software that truly should be built to that standard is built to that standard because building things to that standard, while possible, is pretty much unpossible in the programming world as it exists today. And the reason why this is the case is that 90% of the people working in the field are absolutely wedded to the romantic notion that producing software is a craft, or an art, and not engineering with well-understood engineering tools and practices and tolerances.

Software engineering should be engineering, but it's a long, long, long way from being engineering. It's partly because of the culture, as I just asserted, but of course it's also because managers and businesses don't want to pay for engineering methods and discipline.

Talk to someone with a professional background in ME or EE in product design and manufacturing who transitioned to software development and tell me that they don't laugh at the shoddy tools, techniques, and practices that characterize software development. As I mentioned in a previous thread, simple revision control has gone from rare to the norm only in the last twenty years, long after it was available and long after code became large and complex enough to regularly require it. It took this long to become standard practice because programmers didn't like it and managers didn't want to devote resources to it. This is the story over and over and over.

Only the most vital software needs to be developed via formal methods or otherwise to some very highly characterized standard. But the fact that the vital software mostly isn't developed this way, and it does lose 400 million dollars in a day or shut down the newly opened Denver airport for a week, tells you that the non-vital software is developed to even lesser standards. And because it's not vital, we tolerate that our PCs crash and our cable TV DVRs crash and our smartphones crash because we've come to accept that software is inherently unreliable. But it's not inherently unreliable, we're just not willing to pay for reliable software and programming culture doesn't want to develop reliable software because it's less fun to do it that way.
posted by Ivan Fyodorovich at 12:34 AM on August 9, 2012 [19 favorites]

Inviting Disaster: Lessons From the Edge of Technology
posted by jcruelty at 12:38 AM on August 9, 2012

the Gallery app refused to let go of the 1GB of cached broken photos it had downloaded. Nothing worked: clearing user data, clearing cache, it was as if I had lost that 1GB of storage permanently.

The gallery app is just a viewer of stuff on the storage; it doesn't store the files directly. The google+ app is the one that pulls your photos from google+ (via the account sync page) - the gallery cache is just the local thumbnails it creates for the gallery view. Note picasa sync is separate from google photos - they haven't quite managed to fully merge the two services yet; my google+ photos are all in picasa and still governed by picasa web albums sync.

Clearing the data+cache on the google+ app rather than the gallery did the trick for me when I just tested it on my galaxy note running ICS, after turning sync off.

Uninstalling the app won't help, that just removes the apk system file - it leaves the data behind in case you want it later. You can't uninstall the gallery app as it's part of the ROM firmware, i.e. built in rather than stored as user software. You can remove it manually, but need root to do so.

But yes, going by this bug report it's pretty damn screwed up right now on ICS.
posted by ArkhanJG at 12:43 AM on August 9, 2012

I agree that the 1% of software that is responsible for lives, or 400 million dollars should be as stable as we can possibly make it.

That being said, I think the resources dedicated to testing a sophisticated application could easily exceed the resources dedicated to writing the software in the first place.

In ME you have a spring, you have a cog, you source those springs and cogs and buy them from a supplier. The supplier has specs in the parts, you know the best way to fasten them, you assemble them in a way that does not exceed the specs.

In programming, you have no cogs or springs, you may have an API that does some of what you want. There are no specs, the API could be doing just about anything internally. Everything else you just have to roll your own. There are abstractions 10 layers deep, one day you learn that your string handling library needs to allocate memory in contiguous chunks so it fails at random under high load. The next day the client changes his mind and you rewrite half your code. We do stuff every day that has never been done before, by definition. If the app already existed we would use that instead of writing one. Maybe in 10 years stuff like Agile won't be around and we will dedicate 10 testers for every developer and we will have cogs and springs, with known specs, we can work with. So in short, I probably agree in principle.
posted by Ad hominem at 12:54 AM on August 9, 2012 [1 favorite]

Software is not like engineering or architecture. Even with all the licensing and inspections buildings collapse and kill people.

Software can be built using engineering principles though. Margins of safety, extensive modelling, exhaustive testing of the design, and very precise design specifications drawn up by highly trained and regulated people. That all costs time and money though, and requires management not jerking around like water in a hot pan to the tune of constantly changing specifications. When mistakes are made - ala Tacoma Narrows - studying those mistakes so they're not repeated in future is taken very seriously.

As you say, most software has no need to be built to that standard. Building engineers, like most proper engineers, have to be properly qualified and are licenced by their relevant trade body and held liable if they screw up, because when they screw up, the consequences are dire.
The guys who wrote Angry Birds, not so much.

The problem is, a lot of software that should be built to that standard, isn't. Just because it's not directly related to human life-safety (ala medical devices, airplane flight systems, engine management control etc etc) doesn't mean it should be the slapdash, untested, rushed into production, written by minimum wage outsourcers in as short a time frame as possible then modified left right and centre on the fly before being dumped into some poor sysadmin's lap that most software is.

Take financial software, for example. I know a guy who works with FIX protocol apps, and frankly, it's scary how slapdash the whole business is. Applications responsible for millions of dollars of transactions get patched by this guy to work round a problem, with no oversight, no hard specs, management that just want it working as soon as possible...

It makes the trained engineer in me twitch every time I think about it.

The latest NatWest banking disaster in the UK, for example. Six days of the bank unable to process payments. People not getting paid, so not being able to even buy anything with cash. Shopping orders not going through. Holidays getting cancelled because payment couldn't be taken.

It came down to a software upgrade being applied to their creaky production batch processing system, without being properly tested or having a backout plan. When they did finally backout the patch, they destroyed the transaction queue. When they restored the transaction queue from backup, they ended up with duplicates.

NatWest had fired half the experienced guys responsible for CA7 batch scheduling, and replaced some of them with an outsourced team from India, and apparently there'd been reports of problems for two years prior to the screwup which management ignored. NatWest deny that had anything to do with it.
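
The duplicates-after-restore failure is the kind of thing an idempotent replay is meant to guard against. A minimal sketch, assuming every payment carries a unique ID; the `Payment` type and `replay` function are hypothetical and say nothing about how the bank's batch system actually works:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    payment_id: str    # assumed to be unique per payment
    account: str
    amount_pence: int


def replay(queue, already_applied, balances):
    """Apply each payment at most once, even if the restored queue repeats entries.

    Returns the number of entries skipped as duplicates.
    """
    skipped = 0
    for p in queue:
        if p.payment_id in already_applied:
            skipped += 1   # seen before (e.g. restored from backup): ignore it
            continue
        balances[p.account] = balances.get(p.account, 0) + p.amount_pence
        already_applied.add(p.payment_id)
    return skipped
```

Replaying the same queue twice leaves the balances unchanged the second time, which is the property you want before anyone touches a backup of a live queue.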
posted by ArkhanJG at 1:06 AM on August 9, 2012 [12 favorites]

Well yeah, there is certainly just being plain stupid. I have to file a testing and rollback plan for every production release, there are plenty of developers who would say that is an unconscionable amount of red tape. My favorite is DR planning and drills where I need to anticipate having been killed in a terrorist attack. But really it is a spectrum of how much you can rely on your customers as unpaid testing, for Angry Birds or MetaFilter, maybe a lot. For banking, not so much.
posted by Ad hominem at 1:23 AM on August 9, 2012 [1 favorite]

Let me tell you one of the things that happens in the real world.

Have you ever had a slide or figure you've made get used by someone else in a way that totally misrepresents/misunderstands the original purpose of the figure? This also happens with software.

I work for a Fortune 500 company. I needed to solve a particular problem, and, at the time, internally-developed software was deemed bad. As this was a new problem, and a small one (at the time), there was no money in hiring someone to develop the software, and no internal resources to do it (because we should buy only off the shelf).

So, I did what any reasonably capable and smart person should do, I wrote the software myself and solved the problem. It wasn't good code, in fact, it was crap, but it solved the problem, and I did what I needed to with the limited time I had available to develop it. It took me about two weeks to write in all.

Fast forward nearly a decade.

Now, that same software is still used to solve the original problem, but the use has expanded far, far beyond the original conceived use. People are using it to solve unanticipated problems, and complain that "it's not written properly", and that it's "bad software". However, there's still no financial gain in paying anyone to rewrite it properly, so it will likely get used for another decade or more.

Should that be the case? No. Are all the decisions rational, business-focused, and smart at the time they are made? Yes. This is a microcosm of what happens day in and day out in every company in the world. Scarce few companies are willing to make what seem like 'bad business decisions' at the time for a long term strategic gain, because the low-level management will get punished for it, and the upper levels are responsible for 'strategy' and are blissfully unaware of 'small' problems like this.

It's a mistake to conflate bad commercial software with bad internal software, because it's often the case that it's the latter that makes the world go round - as was the case with Knight, and it's far, far scarier when it's bad.
posted by grajohnt at 1:27 AM on August 9, 2012 [13 favorites]

That being said, I think the resources dedicated to testing a sophisticated application could easily exceed the resources dedicated to writing the software in the first place.

Quite possibly; even likely. This is not at all unusual in engineering.

I recently wrote an email-driven job-ticket system in a couple of weeks for my 4-man IT team, because we had no funds to buy one, and all the open-source ones I've tried over the years suck at email-based tickets. Since I'm in the process of re-learning to code (I was taught Pascal and C along with various VLSI languages as part of my engineering degree) I did it in JavaScript, because that's what I'm learning.

It works well enough so far, though I've had to fix some bugs in it and I've still got more features to add. I've been doing my best to work with test-driven development as I'm doing it, and writing the tests - and the manual testing where I've not managed to figure out how to write a unit test that accurately covers the full data-flow from front-end to API to server to database - has easily taken me more time than actually writing the code itself, quite possibly twice as much. And this is a podunk itty-bitty bit of software I whipped up to fill a need for a tiny team that will likely never be used outside our office, though I have open-sourced it on github (given how many MIT libs I've used it'd be rude not to).

What worries me is far too much software is written just like that, but not even bothering with unit testing, proper documentation (which my code also lacks), source control, code review etc. Then you add in management pressure to get stuff done as fast as possible for as little money as possible, specs that change constantly... but ends up being used for actually important stuff.

The banking culture is characterised exactly by the type of bad management that leads to avoidable software problems - fast, cheap, outsource the risks if things go bad, so it's not at all surprising that we get random events, that so far, haven't caused the complete collapse of the finance system. Yet. With all the automated trading tools these days, running with virtually full autonomy... I do think it's a matter of when, not if, we're going to see a truly catastrophic banking failure. And the way it's all interlinked, that even sovereign governments are struggling from lack of confidence in their ability to service the debt, I think it's going to make the current 2007 crash look like a cake walk.
posted by ArkhanJG at 1:50 AM on August 9, 2012

Aaaand - interesting report published in the UK today by the trade association for the UK IT sector.

"This infrastructure is the foundation upon which the entire financial system is built and it has been neglected for far too long," he said. "For the past four years, government and regulators have been trying to treat the wounds exposed by the financial crisis with stick plasters. The regulators, and in particular the Financial Policy Committee and the forthcoming Prudential Regulatory Authority, must take the lead on this now - it's not going to sort itself out."

'Banks are willing to spend money on cutting-edge technology that facilitates high frequency trading or reduces the time it takes to process a transaction in the capital markets - where every cut millisecond means more profit - but not on modernising the infrastructure that allows them to deliver better customer services'

Wilson said that without prompt action, the effectiveness of forthcoming reforms both to banking and financial services regulation would be "severely limited". He added that global efforts to increase transparency by standardising data across the financial system would be "undermined" by poor infrastructure.

Article; report (pdf).
posted by ArkhanJG at 1:57 AM on August 9, 2012 [1 favorite]

Well take your standard analogy, a bridge. Very stable. Now shift the assumptions, oh say 1 percent, like the ground it is attached to doesn't move. Now how good was that bridge engineering during a big earthquake?

The software is often very close to right, but the assumptions, requirements, often change by 50% or more.
posted by sammyo at 4:09 AM on August 9, 2012 [2 favorites]

To follow on from some of the previous comments by analogy: Chile, which assumes earthquakes (strict building codes), had a major quake, and there were cracks but mostly everything stood up fine.

Haiti assumed no quakes ever and had a medium large quake.

Now Chile never considered tsunamis and did lose a small town.
posted by sammyo at 4:17 AM on August 9, 2012 [1 favorite]

There are solutions to these problems, but they are neither easy nor cheap. You need to start with very good, very motivated developers. You need to have development processes that are oriented toward quality, not some arbitrary measure of output. You need to have a culture where people can review each other's work often and honestly. You need to have comprehensive testing processes -- with a large dose of automation -- to make sure that the thousands of pieces of code that make up a complex application are all working properly, all the time, on all the hardware you need to support.

It disturbs me that there's no mention of user experience designers or information architects. There are people whose entire job is devoted to making sure things make logical sense, are easy to do, and do everything required. The only people who have ever heard of this job are the people who do this job. Even at the places they work.

Although I guess I'm now inviting all the developers in this thread to say I'm part of the problem.
posted by bleep at 4:18 AM on August 9, 2012 [4 favorites]

Oh hippybear, do remember that COBOL is at least "self documenting" ;-} ;-}
posted by sammyo at 4:19 AM on August 9, 2012

For 99% of software, quality doesn't matter. The 1% that does is stuff that will cost serious money or lives. In the physical world, even an outhouse can collapse and kill someone so all sorts of regulations, inspections and licensing make sense. Who gives a shit if Angry Birds crashes? Software is not like engineering or architecture.

Yes, I agree. But that doesn't mean building software can't be a far more disciplined profession. And it does matter if Angry Birds crashes. All other things considered equal, if one company can build software more reliably than another, then users will use that software. This has direct economic impact.

Software can be built using engineering principles though. Margins of safety, extensive modelling, exhaustive testing of the design, and very precise design specifications drawn up by highly trained and regulated people. That all costs time and money though, and requires management not jerking around like water in a hot pan to the tune of constantly changing specifications. When mistakes are made - ala Tacoma Narrows - studying those mistakes so they're not repeated in future is taken very seriously.

It's good to reflect on mistakes made, and try to ensure they aren't made again. But I couldn't disagree more with the rest of what you're saying. In fact, one major mistake that some software developers discovered years ago (but which the vast majority of us--and more important our managers--seem to keep forgetting) is that writing big, unchanging specifications is fundamentally a fool's errand, for most applications. Perhaps having regulation for medical and financial applications would be a good thing. But for web and other application development it would be a hindrance. This doesn't mean that we shouldn't have high quality development though! So there has to be a different approach for most software.

We keep having this argument. And it's not that anyone is saying that all software should be built to that standard, it's that almost none of the software that truly should be built to that standard is built to that standard because building things to that standard, while possible, is pretty much unpossible in the programming world as it exists today. And the reason why this is the case is that 90% of the people working in the field are absolutely wedded to the romantic notion that producing software is a craft, or an art, and not engineering with well-understood engineering tools and practices and tolerances.

No, the reason is (bad) economics. The perception remains that it is cheaper to push something out the door that appears to work, and deal with problems later on, than to hire truly disciplined software development professionals and adhere to practices which produce good software. And it is obvious that up-front costs are easier to understand than possible costs down the line for businesses.

It doesn't help that most managers have no idea what the difference between a good developer and a bad one is. But this is not because 90% of developers think they are "artists," it's because businesses don't have any incentive to change this state of affairs.

Moreover, software development is not, cannot be and should not be considered engineering in the same way that building bridges, buildings, or circuits is. Fundamentally, software is malleable and pervasive in a way that no physical engineering discipline has adequately prepared us for. And while I agree that viewing software "engineering" as an art form is appropriate really only for...artists (which I may even disagree with), there certainly should be a good deal of craftsmanship involved in building software.

I completely agree that there needs to be a disciplined approach to software development. I personally believe that agile development practices, in particular those introduced by the so-called "extreme programming" proponents, are the way forward. They embrace the basic fact about software, which is that expecting rigid specifications to be written and not change is fundamentally foolish and furthermore, doesn't reflect the true power of software. And these practices insist developers take a high level of responsibility for their software, while at the same time insulating against failure as best as possible.

Ironically, most software written not using agile practices ends up becoming far more rigid, far closer to a badly designed building, than that which uses practices such as test-driven-development, pair programming, short sprints and continuous integration.

The problem is not that software engineering doesn't adhere to other engineering disciplines' practices; the problem is that we tried that, it has failed, and we haven't figured out the right way to do things in a way that has become standard yet. Moreover businesses don't insist developers adhere to any consistent discipline, for economic reasons. This discipline is still very young.

It disturbs me that there's no mention of user experience designers or information architects. There are people whose entire job is devoted to making sure things make logical sense, are easy to do, and do everything required. The only people who have ever heard of this job are the people who do this job. Even at the places they work.

Although I guess I'm now inviting all the developers in this thread to say I'm part of the problem.

For the record, I agree with you completely. Designers and UX folks need to be tightly integrated into the process, from the start. And developers need to grow their discipline to accommodate design, user experience and information architecture from day one. This is one area where we are especially deficient. In fact, I believe that developers should consider themselves, fundamentally, user experience designers. Software is for users.
posted by dubitable at 4:43 AM on August 9, 2012 [4 favorites]

Whoops, that first quote was meant to link here. SHOULD HAVE TESTED IT
posted by dubitable at 4:48 AM on August 9, 2012 [5 favorites]

I have a pet theory that all the good programmers/ui/software people went into the game industry, attracted by the culture and game programming having generally more challenging problems.

This is true iff you have decided that 'good' means 'able to extract the most from available hardware resources.' I have crazy respect for the guys who wired up the first generation of 3D perspective in gaming, or who figured out how to beat the raster on the old Atari consoles. However, having done a fair bit of ROM spelunking as a younger lad, I can attest to you that if any one of those guys worked on my team at the office, he would have been found with a note on his forehead saying 'design patterns, motherfucker.' Held on by a roofer's nail.
posted by Mayor West at 5:25 AM on August 9, 2012

"Neither proofs nor tests can, in practice, provide complete assurance that your programs will not fail." -- John Goodenough [eponysterical] & Susan Gerhart, Toward a Theory of Test Data Selection, 1975.

If you take a good course in software testing, you will be given several cites to well-done studies that support the statement above. My belief is that if you claim zero defects, you're not looking hard enough, or you don't know how to look. People want to blame something: the tools, the processes, the developers, the testers; they don't want to look at the fact that software is inherently error-prone.
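A small illustration of the test data selection point, offered as a sketch rather than anything from the Goodenough and Gerhart paper: the function below is deliberately wrong, the hand-picked test set is an assumption made up for this example, and the defect only shows up on an input nobody thought to select.

```python
def is_leap_year(year: int) -> bool:
    """Deliberately incomplete: ignores the century rules (1900, 2100, ...)."""
    return year % 4 == 0


# A plausible-looking, hand-picked test set (hypothetical, for illustration).
selected_cases = {2004: True, 2008: True, 2011: False, 2012: True}

# Every selected case passes, so the suite reports no defects.
all_pass = all(
    is_leap_year(year) == expected for year, expected in selected_cases.items()
)

# An input outside the selected set exposes the bug:
# 1900 was not a leap year, yet the function says it was.
wrong_answer_for_1900 = is_leap_year(1900)

print("selected tests pass:", all_pass)
print("is_leap_year(1900):", wrong_answer_for_1900)
```

A green test run here says something about the four years that were chosen and nothing about the ones that were not.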

Of course, the hurry-up mentality together with zero or little accountability for software failures doesn't help. If you're in IT, you probably know what I'm talking about. If you can sue a doctor or a lawyer for malpractice, you should be able to mount a claim against a software company for an easily-discoverable bug that damaged you. The "easily-discoverable" part would be a matter of fact, but that's why God made expert witnesses.

On a related note, there's a trend away from hiring trained, experienced testers these days; now you see ads for "developers in test" or a tester who is also expected to be an automator (a/k/a a full-fledged developer). Not saying that a developer can't also be a good tester, but testing is a specific mind- and skillset that requires some training as well as some qualities that can't be taught. It's amazing how, more and more, companies want to give short shrift to testing skills in favor of hiring someone who can program, then test.
posted by Currer Belfry at 5:30 AM on August 9, 2012 [2 favorites]

Working on a large project is often "You are in a maze of twisty little passages, all alike".

Poorly defined and rapidly shifting requirements, complete isolation of developers from users and very low or no resources committed to testing and especially usability are just some of the problems.

When I worked for a pharmaceutical company, we spent massive resources on testing and validating manufacturing process control code. Even then, sometimes things crashed.

I've played a lot of Angry Birds. It has never crashed for me.
posted by double block and bleed at 5:41 AM on August 9, 2012 [2 favorites]

The perception remains that it is cheaper to push something out the door that appears to work, and deal with problems later on, than to hire truly disciplined software development professionals and adhere to practices which produce good software.

For software that has a limited lifespan, it usually is cheaper to do this. Cheap, shitty Angry Birds would be a lot more profitable than super-stable, never-crashes Angry Birds.

The problem is that customers gravitate toward software that works, so that a crashy Angry Birds, which wouldn't do any actual damage, would still badly impact sales of Angry Birds 2, and if 2 was also crappy, then no matter how much people loved the first, there probably wouldn't be a 3.

But it's absolutely cheaper to shovel shit out the door today, as long as you don't have to maintain the code. If you can abandon it, then quality is purely an expense.
posted by Malor at 5:46 AM on August 9, 2012 [1 favorite]

Also, one difference between programming and engineering is that if an ME specifies a grade 8 bolt, he gets a grade 8 bolt that conforms to a strict standard. When I use an API, there is no standard.

The engineer knows the capabilities and weaknesses of his bolt. I have no idea what the hell is going on inside that API.
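One partial way to narrow that gap is a written contract plus a conformance check that any implementation has to pass, loosely like a bolt being tested against its grade. The sketch below is only that: the "spec" (output ordered, same items as the input, input left untouched) and the name `conforms_to_sort_spec` are hypothetical, and a finite set of samples can only spot-check an implementation, not certify it.

```python
from typing import Callable, List


def conforms_to_sort_spec(sort_fn: Callable[[List[int]], List[int]]) -> bool:
    """Spot-check a sorting function against a small, hypothetical written spec.

    Spec: the result is in non-decreasing order, contains exactly the input's
    items, and the caller's list is not modified.
    """
    samples = [[], [1], [3, 1, 2], [2, 2, 1], [5, -1, 5, 0]]
    for sample in samples:
        original = list(sample)  # keep a copy to detect in-place mutation
        result = sort_fn(sample)
        ordered = all(a <= b for a, b in zip(result, result[1:]))
        same_items = sorted(result) == sorted(original)
        untouched = sample == original
        if not (ordered and same_items and untouched):
            return False
    return True


def reversing(xs: List[int]) -> List[int]:
    """A non-conforming 'sort' used to show the check rejecting something."""
    return xs[::-1]


print(conforms_to_sort_spec(sorted))     # built-in sorted meets the spec
print(conforms_to_sort_spec(reversing))  # rejected: output is not ordered
```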
posted by double block and bleed at 5:48 AM on August 9, 2012 [2 favorites]

I'm reminded of Gödel, Escher, Bach, which argues (among other things) that perfection is unattainable in computational environments. The harder one tries to guard against a possible mistake, the more clearly defined the fatal flaw becomes.

With that said, there are better and worse ways to go about software development. Ivan Fyodorovich's comment above expressed it best: software developers could learn a lot from well-established engineering practices in other fields.

Or, of course, you could write your software at 3am while staying awake with sugar, caffeine, and loud music. I'm sure that's a great habit to teach young developers.
posted by mark7570 at 6:03 AM on August 9, 2012

The author's problem is that he's assuming that high-risk aggressive automatic stock trading programs are "critical systems". The people who write them don't think so because they see it merely as a fun problem to tackle. The people who run them don't think so because who gives a shit if the software pukes a million dollars in the market, there's a billion more where that came from.

If there is a problem here, it's just the same problem as always: traders treat the market as a game, and low-capital investors lose money because a Master of the Universe fucks up just one of the many million dollar trades he or she has to complete that day, and doesn't care.
posted by sixohsix at 6:04 AM on August 9, 2012

Another thing I seem to see all the time - you've got the genius level guys writing the engine, whatever that means in the respective context, very few errors, heavy on smart algorithms, true masterpieces. And then they assign the part that interacts with the user to some mediocre programmer they shouldn't have hired in the first place, but we have to give him something to do, right?
posted by dhoe at 6:06 AM on August 9, 2012

I love the way this guy contrasts the process of building software against the process of building the Airbus A380, apparently completely ignoring the fact that not only is the A380 itself a software project to an unprecedented extent but that it was engine build quality, not software, that caused its most notorious failure.
posted by flabdablet at 6:35 AM on August 9, 2012

So, I have this very strong belief that the world is held together by duct tape. All our institutions that we believe in so much (or don't, as the case is in some people), all the various procedures and everything we do, is a hack to hold together and push out things, but nothing is ever really secure and safe, it's just sorta taped all together, and even the most tightly maintained organization has its own cracks and foibles. Anything that has to do with humans is going to have this issue.

It's part of the laws of the universe. Entropy, the breakdown of information and systems. And our attempts to stop that only breed more ways for the systems to leak and break.
posted by symbioid at 6:41 AM on August 9, 2012 [2 favorites]

I'm reminded of Gödel, Escher, Bach, which argues (among other things) that perfection is unattainable in computational environments. The harder one tries to guard against a possible mistake, the more clearly defined the fatal flaw becomes.

I think that's only a cultural construct. Computers couldn't be simpler: on or off. It's people where perfection is unattainable. And the people who work in information systems seem inordinately motivated by things other than getting the machines to do their job as well as possible.

So, I have this very strong belief that the world is held together by duct tape. All our institutions that we believe in so much (or don't, as the case is in some people), all the various procedures and everything we do, is a hack to hold together and push out things, but nothing is ever really secure and safe, it's just sorta taped all together, and even the most tightly maintained organization has its own cracks and foibles. Anything that has to do with humans is going to have this issue.

This is true. You aim for the best and rarely hit the mark dead on. But in aiming for perfection, you manage to get pretty close to good enough. But when you are only aiming for good enough, you are just as likely to hit perfect as you are to miss the target completely.
posted by gjc at 6:52 AM on August 9, 2012 [1 favorite]

So, I have this very strong belief that the world is held together by duct tape.

The world is not held together by duct tape. It is held together by you.
posted by notyou at 6:55 AM on August 9, 2012 [4 favorites]

Perhaps we see a trend here?

- Our software isn't very good.
- Our governments are not very good.
- Our methods of wealth distributions are not very good.
- Our patterns of urbanization are not very good.

Maybe we aren't very good at doing big things in concert with each other?

Dude. Software is /hard/. I don't mean the coding part. As many undergrads we hire point out, everything is super-easy and everyone else sucks at it and all we have to do is do the /correct/ thing and blah blah blah blah.

What I mean is that, for anything beyond a boutique app (i.e., anything that is actually used on a scale where it affects all of us in some manner) maintaining a large software product is a very, very hard thing to do. There is an insane amount of management around the work of maintaining, deploying, supporting, updating, training any large software offering, and very little of that work is localized on the initial development.

It almost doesn't matter if the "original" code was any good or not -- it is the subsequent changes and changes in direction where the critical stuff happens, and none of that can be done by a midnight hacker in his or her bedroom.

A big software project is more akin to building a pyramid, except that you started out building a canal and the direction changed such that your audience decided they wanted a pyramid instead, but one with canal-like properties. And then they want it on wheels.

And the dirty little secret? No one really cares that the result is a steaming mass -- most bizniz folks know that they are getting a super-awesome deal, even with the maintenance costs. Because we already have segments that stress correctness and provability and rules about regressions. And those segments /cost/ you. A lot.

Take your maintenance costs and double them. Then add 10%. Then throw that estimate away and start writing blank cheques. And then realize you will never be able to update a driver, or an OS, or get updates more regular than every 2-5 years.

Because this is what the costs are right now.

We might be able to make small changes to bring correctness and usability back to the general enterprise space (and we /are/ actually doing that right now) -- this is why everyone is always so obsessed about the principles and tools around coding, and not really all that interested in the code itself. We've pretty much solved most of the problems around the code. It is the management and maintenance stuff where the cutting edge lives.

In short, it ain't the coders, or some companies we like to shit on all the time. It is all of us. If we want everything, we are going to have to pay.
posted by clvrmnky at 6:59 AM on August 9, 2012 [6 favorites]

Ad hominem: "Well I consider E.T to be one of the most crushing disappointments of my life, so that was quite a programming feat."

I see you've never watched The Phantom Menace.
posted by Chrysostom at 7:11 AM on August 9, 2012 [1 favorite]

Anecdote: Buddy of mine used to code for a pretty sizable financial services company. Processed a ton of loan and lease contracts and transactions for mortgage companies, automobile manufacturers, etc. Whole system was a huge legacy COBOL system on an IBM mainframe using flat files as storage.

Problem was that they were losing new potential business, because of those flat files. Companies started asking what kind of database they used, and they didn't have one. So they started using DB2.

But of course they've got god knows how many programs written to work with fixed-format records in text files, and they didn't want to overhaul the software to change something as fundamental as how data is accessed and updated. Too expensive, too risky.

So what do they do? Well, they start using DB2. And every table in the database? Has two fields per record. The first field: the name of the table. The second field: the original fixed-length record. They were essentially using a relational database like a slightly-smarter text file.
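For anyone who hasn't seen this pattern, here is roughly what it looks like, as a hedged sketch. Everything in it is an assumption for illustration: SQLite stands in for DB2, and the table name, field names, widths and sample rows are invented. The point is that the database cannot filter on a field buried inside the record, so the "query" ends up as application code that unpacks every row.

```python
import sqlite3

# Hypothetical fixed-width layout: (field name, start offset, length).
LAYOUT = [("contract_id", 0, 8), ("customer", 8, 20), ("balance_cents", 28, 10)]


def pack(contract_id: str, customer: str, balance_cents: int) -> str:
    """Build one fixed-length record, the way the old flat files stored it."""
    return f"{contract_id:<8}{customer:<20}{balance_cents:010d}"


def unpack(record: str) -> dict:
    """Slice a fixed-length record back into named fields."""
    fields = {
        name: record[start:start + length].strip()
        for name, start, length in LAYOUT
    }
    fields["balance_cents"] = int(fields["balance_cents"])
    return fields


conn = sqlite3.connect(":memory:")  # SQLite used here only as a stand-in
# Two columns per table: the table's own name, and the opaque record.
conn.execute("CREATE TABLE leases (table_name TEXT, record TEXT)")
conn.executemany(
    "INSERT INTO leases VALUES (?, ?)",
    [
        ("leases", pack("L0000001", "Alice Example", 125000)),
        ("leases", pack("L0000002", "Bob Example", 9900)),
    ],
)

# "Which leases have a balance over $1,000?" cannot be a simple WHERE clause;
# every record has to be fetched and unpacked first.
rows = [unpack(record) for (record,) in conn.execute("SELECT record FROM leases")]
large = [row["contract_id"] for row in rows if row["balance_cents"] > 100000]
print(large)
```

With real columns the same question would be a one-line WHERE clause, which is why the eight-week quote sounds so absurd to the customer.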

But now they could say they were using a modern database. And when a customer called up and requested they run a query on a database, they'd be told "Sure, we can do that, it'll take eight weeks and cost $XX,000." What? It's a freaking SQL query. "Well, let me tell you a story..."

This was years ago, so it's possible they've changed, but I doubt it.
posted by middleclasstool at 7:11 AM on August 9, 2012 [3 favorites]

Maybe we aren't very good at doing big things in concert with each other?

The answer to that is better systems engineering. In the projects I've been involved in (software-related and not), the failures are almost never because someone did not write code properly. Failures happen because requirements are not defined well enough.

Poor requirements generate poor product because they are vague and can't be tested. As an example, here's something I've seen time and time again in the aviation world:

"The aircraft shall operate seamlessly across the modern worldwide airspace system."

What the hell does that mean? Seamlessly? Maybe that means the radios all automatically tune themselves every time you enter new airspace. Maybe it just means that the wings won't fall off when you're flying over the Atlantic. Hell, what does "operate" mean in this context? Give a requirement like this to a programmer and he or she will interpret it as best they can, but I guarantee you they won't implement it the way it was intended. And once the product is completed, how do you test it?
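By contrast, a requirement narrowed to something like "on entering a new sector, the radio shall be tuned to that sector's published frequency" can be checked mechanically. The sketch below is purely illustrative: the sector names, frequencies, and the `Radio` and `on_sector_entry` names are all made up and are not taken from any real avionics system.

```python
from dataclasses import dataclass


@dataclass
class Radio:
    frequency_mhz: float


# Hypothetical sector-to-frequency table; values are invented for illustration.
SECTOR_FREQUENCIES = {"SECTOR_A": 118.1, "SECTOR_B": 124.35}


def on_sector_entry(radio: Radio, sector: str) -> None:
    """Retune the radio to the frequency published for the sector entered."""
    radio.frequency_mhz = SECTOR_FREQUENCIES[sector]


def check_retune_requirement() -> bool:
    """The narrowed requirement expressed as a pass/fail check."""
    radio = Radio(frequency_mhz=SECTOR_FREQUENCIES["SECTOR_A"])
    on_sector_entry(radio, "SECTOR_B")
    return radio.frequency_mhz == SECTOR_FREQUENCIES["SECTOR_B"]


print(check_retune_requirement())
```

Nobody can write a check like that for "seamlessly", which is the whole problem with it.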

Partly this is program management meddling in engineering, and partly it's programmers meddling with the systems engineers. I was at a previous job as a "systems engineer", responsible for defining, you know, systems. How does this work, how does it integrate into the greater system, what are the failovers. We had software engineers who would look at the requirements, decide "I know a better way to do this", and then go off and code something completely different. Management's failure was to decide that it was more expensive to change the code than to change the requirements, so my job turned into "write down in words what the software engineer just bunged together".

To make good product, you really need a strong top-down approach that has managers (they are good for something!) actually ensuring requirements are being adhered to and not throwing them out willy-nilly because they cost too much. Overdefining a product happens, but in my experience it's far more likely that the unsexy but foundational aspects of a project will be undermined or thrown out completely because it's not what the user sees and pays for.

Anyway, that's my rant. Off to go listen to my program managers tell me why we "don't really need to test that, it's too expensive!" again.
posted by backseatpilot at 7:55 AM on August 9, 2012 [2 favorites]

The old adage stands true: Computers make very fast, very accurate mistakes.
posted by maryr at 8:00 AM on August 9, 2012

I am always reminded of something my father would tell me if we were throwing a ball/frisbee/whatever and I ended up chucking it way off in the wrong direction: "It went exactly where you aimed it."
posted by backseatpilot at 8:08 AM on August 9, 2012

ArkhanJG: "I recently wrote an email-driven job-ticket system in a couple of weeks for my 4 man IT team, because we had no funds to buy one, and all the open-source ones I've tried over the years suck at email-based tickets."

Please explain to me the suckage within RequestTracker.
posted by pwnguin at 8:24 AM on August 9, 2012

backseatpilot: "I am always reminded of something my father would tell me if we were throwing a ball/frisbee/whatever and I ended up chucking it way off in the wrong direction: "It went exactly where you aimed it.""

If you are the sole developer on a program or project that you have complete control over then, yes, you have only yourself to blame when it fucks up.

If you are working on a large project with many others in a business environment, then right after you throw a perfect knuckleball at the batter, the ball suddenly veers left and becomes a football in mid-flight just before the umpire announces that you are not playing baseball anymore but disc golf.
posted by double block and bleed at 9:53 AM on August 9, 2012 [3 favorites]

I agree that using agile methods is a big part of the way forward for software development. But that means not just the "how to build it" parts like automated testing, code standards, test-driven development, etc. Even more important are the philosophical underpinnings of agile. If you aren't familiar with the "agile manifesto" in the link, here are a couple of the items:

"Business people and developers must work together daily throughout the project." ... and yet selecting a smart, articulate actual user of the software to fly out and work with the team during development practically never happens. My wife uses some just god-awful health care software every day, and the people who selected it don't overlap with the users at all.

"Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely." ... and yet I constantly hear about places that "do agile" and have 60-hour weeks as the norm.

"Continuous attention to technical excellence and good design enhances agility." ... and yet getting architecture/refactoring stories prioritized in the backlog is usually a joke.

"The most efficient and effective method of conveying information to and within a development team is face-to-face conversation." ... and yet when I'm job-hunting, people want me to be "experienced in leading distributed teams", which usually means 12-hour delays in getting answers, language/communication issues, and trying to keep the morale of the local developers up even though I know that as soon as they get things documented better their jobs are going to Serbia.

Oh, and nth-ing the lack of attention to the need for UI designers. Lots of places hire a bunch of (for example) .NET programmers and have them code everything from the page logic to the business and data layers, but also the CSS/graphics, page flow and item placement, and so on. My experience is that the number of people who can code the C# stuff who would also say they are "artsy" and people-focused is really, really small. The few times I've had a budget for a graphic-y, design-y person who just sits there making icons and wireframes and thinking about how the app feels to use ... pure heaven.
posted by freecellwizard at 10:08 AM on August 9, 2012 [3 favorites]

People have been writing software since the 1950s, and arguably before that. That's over half a century ago, the equivalent of the Wright Brothers to supersonic jet aircraft. The "This is a young discipline" excuse for bad software has expired. Techniques for proper software engineering aren't a big mystery (detailed requirements, version control, code reviews, rigorous testing, quality-oriented culture), it's just that organizations are generally unwilling to put it into practice.
posted by LastOfHisKind at 10:59 AM on August 9, 2012 [1 favorite]

"People have been writing software since the 1950s, and arguably before that. That's over half a century ago, the equivalent of the Wright Brothers to supersonic jet aircraft. The "This is a young discipline" excuse for bad software has expired. Techniques for proper software engineering aren't a big mystery (detailed requirements, version control, code reviews, rigorous testing, quality-oriented culture), it's just that organizations are generally unwilling to put it into practice."

Yes. I agree and disagree with various comments above, but to some degree I think most of us are in agreement. Ironically, I strongly agree with some things that dubitable wrote, and feel like I want to agree with him generally, but the one thing we most disagree about is the one thing that I think is the absolute fundamental truth of this discussion. He wrote:

"Moreover, software development is not, cannot be and should not be considered engineering in the same way that building bridges, buildings, or circuits is. Fundamentally, software is malleable and pervasive in a way that no physical engineering discipline has adequately prepared us for. And while I agree that viewing software 'engineering' as an art form is appropriate really only for...artists (which I may even disagree with), there certainly should be a good deal of craftsmanship involved in building software."

I disagree with this as strongly as possible. Because what I think has happened is that the rapid historical development and deployment of computing technology has meant that software development internalized both horizontally and vertically a craft mentality and we've lived with the complexity and ambiguity that has resulted from this for so long that we've wrongly come to believe that this is due to some qualitative nature of software itself and not a historical accident.

People have rightly contrasted an ME's specification and use of a "#8 bolt" with an API. But an API could be as well-characterized as a #8 bolt. It's just that they're not because a) we've accepted that no such standardization exists or perhaps is even possible; and b) many individual agents prefer it this way because they like the flexibility of a lack of standardization (and they don't recognize the costs of this lack because those costs are widely distributed through the entire industry). People have talked about how software specs change and that's some qualitatively necessary feature. But it's not. You don't think that when cathedrals were built during medieval times that specs changed? They did. And it caused many of those cathedrals to fail. All the institutional and market forces that push software development in this messy, poorly-characterized direction exist for other engineering endeavors. It's just that with them, there's a long, long history where craft evolved to engineering and institutions and cultures changed as they learned how to build technology that was reasonably well-understood and reliable. They learned that it's worth it to be limited to standardized parts and it's worth it to be limited to industry-wide best practices and standard techniques and toolkits and it's worth it to spend a lot of time on deeply defining and analyzing requirements and it's worth it to rarely redefine them on-the-fly and it's worth it to have rigorous certification regimens for engineers and strictly defined sub-specialties and carefully designed manufacturing processes and engineering specialties devoted to manufacturing processes.

It's not at all the case that all the physical technology of the modern world — the gadgets and bridges and houses and roads and cables and vehicles, the factories and manufacturers and materials and suppliers and workers and managers, the customers and end-users — is any less complex and unpredictable by its nature than is software. Quite the reverse — real physical materials are far, far more complex and difficult to characterize, to standardize, than are logical constructs. Every argument about the complexity and ambiguity of software is much more true for the physical. And for the majority of human history, we lived with technology that was just about as reliable and predictable as software is today. Roads washed out, houses and bridges collapsed, farm implements failed, carts broke down.

I've argued in a thread recently here that while it's true that people did natural philosophy, and learned true things about the world, prior to modern science, it's nevertheless the case that modern science represents a true qualitative change in the practice of natural philosophy. In many ways pre-science natural philosophy is similar to science and it did learn things and understand things. But science is qualitatively special because of the whole of the changes in methods and institutions and culture. And that collectively makes it far, far more powerful and successful than natural philosophy ever was before science.

This is exactly comparable to modern engineering. Modern engineering is the whole of its methods and institutions and culture. It allows something that is extremely powerful that wasn't possible before. It allows us to build bridges that almost never collapse. And, yeah, that doesn't mean that they won't collapse due to unforeseen environmental circumstances. But that was never the majority of the reasons bridges collapsed. Most of the times they collapsed, it was because one material was substituted for another. Or the quality of the material was less than it was thought to be. Or the quality of the build was less than it was thought to be. The characteristics of the whole of what went into the building of the bridge — the design, the materials, the methods used to build it, the workers who built it — were much less well understood and controlled than they are now.

It's absolutely wrong to say that software couldn't be made this way. It can, and it should. But that will require a vast change in the culture of software development. Horizontal and vertical — it will require not just changes in practices in a particular shop, but the equivalent of a developed industrial base that supplies #8 bolts and uses #8 bolts.
posted by Ivan Fyodorovich at 2:22 PM on August 9, 2012 [6 favorites]

This* is one of the few cases I know of where a large software project had zero bugs.

There are real, effective methods available for writing good, dependable code. They are expensive. Good luck getting a typical company that is focused only on their quarterly stock price to pay for them. Anything that doesn't contribute to the bottom line or mitigate a serious risk to it is seen as an unnecessary expense. Fat that begs to be cut.

Case in point: A company that designs a building that collapses has a greater liability problem than a software company that kills an equivalent number of people, if only because the jury can more easily understand the concrete idea of a structural engineer specifying the wrong kind of steel girders than the abstract idea of a programmer depending on a hash to return values in the same order as they did when he tested it. I just fixed just such a problem in someone else's code when a minor version change in Ruby changed its hash storage algorithm.

Writing software should be more like an engineering discipline than it is, but engineers don't have to worry about the physical properties of their materials changing in unexpected ways. Bolts may break, concrete may crack but steel doesn't turn into aluminum. The guy who wrote that code was a dope, but there are many other ways that sort of thing could happen to even the best programmers.
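For readers who haven't hit this kind of bug, here is a minimal sketch of the general pattern in Python. It is not the Ruby code described above, which isn't shown in this thread; the field names and function names are hypothetical, chosen only for illustration. The fragile version relies on the iteration order of an unordered collection, which is not part of its contract; the stable version states the order explicitly.

```python
# Hypothetical field names, for illustration only.
FIELDS = {"name", "email", "joined"}  # a set: iteration order is not guaranteed


def row_fragile(record: dict) -> list:
    """Emit values in whatever order the set happens to iterate.

    This can look fine in testing and still come out in a different order
    on another run or interpreter version, because nothing promises the order.
    """
    return [record[field] for field in FIELDS]


def row_stable(record: dict) -> list:
    """Emit values in an explicitly chosen order (alphabetical by field name)."""
    return [record[field] for field in sorted(FIELDS)]


record = {"name": "Ada", "email": "ada@example.org", "joined": "2012"}
print(row_stable(record))   # always email, joined, name
print(row_fragile(record))  # same values, order not guaranteed
```

The design point is only that the ordering the caller depends on is written down in the code rather than inherited from an implementation detail.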

* - PDF link to scientific article, somewhat poorly translated from French.
posted by double block and bleed at 3:22 PM on August 9, 2012

"Writing software should be more like an engineering discipline than it is, but engineers don't have to worry about the physical properties of their materials changing in unexpected ways. Bolts may break, concrete may crack but steel doesn't turn into aluminum."

Except that it does outside of engineering/industrial/manufacturing culture. Not being able to rely upon the quality of parts from suppliers, the quality of materials from suppliers, the quality of manufacturing from suppliers — that's all part of what a mature engineering/industrial/manufacturing culture eliminates and which a crafting culture allows.
posted by Ivan Fyodorovich at 3:29 PM on August 9, 2012 [1 favorite]

I mean, seriously, yours was a bad example because inferior material substitution has always been a problem — bolts break and concrete cracks often because the physical properties of the materials are not what they were supposed to be because the materials are not what they're supposed to be. And there's a huge complex system designed to prevent this from happening specifically because there's lots of incentives for various agents in the supply chain to substitute inferior materials. Most of these controls don't exist in software development.

And you're right — a very large part of why this is the case is that no one wants to pay for it. In many cases, these things exist in engineering because of cooperation between government and industry. It's a tragedy of the commons situation where there need to be external incentives to do these things because, without them, it's competitively counterproductive for individual agents to do so. But everyone benefits with reliable, standardized materials and parts, industry-wide inspection regimens, established contract and tort law, and all the rest. This is all stuff that industrial development has learned over the last three hundred years and software development has mostly ignored.
posted by Ivan Fyodorovich at 3:42 PM on August 9, 2012

"6) New build goes into testing 10 am eastern and it is fast as hell, Lead realizes that offshore devs are all calling him dear because they think his name is female."

I am sad to report that it works both ways. I was once on an east coast team which had some QA contractors on the west coast. I was assigned one whose name was Surya. I had recently decided that there seemed to be a near universal rule that first names that ended in -a or -ya were female (eg Anya, Maria, Ankita), so it was pretty safe to assume that it applied universally. You can see where this is going. There was a chat room for devs and QA to talk about features and bugs, etc. The lead QA guy was named Mike. One day the chat went something like this:

Mike: So, what other things should Surya be trying with (new feature X).
Me: I think she should focus on (thing Y that just got deployed to staging).
Mike: Uh, Surya's a guy.
Me: ... [attempting to recover gracefully and totally failing.]
Mike: If it helps, I'm a guy too.
posted by A dead Quaker at 6:09 PM on August 9, 2012 [1 favorite]

"Poorly defined and rapidly shifting requirements, complete isolation of developers from users and very low or no resources committed to testing and especially usability are just some of the problems."

Oh. I thought it was just us who worked that way.

The layer we really need to fix is middle managers who have little or no knowledge of IT (or anything else, apparently) who think they're qualified to manage software projects or implementations. I see this breeding a culture of fear in our organization, because the new folks are smart enough to realize they're going to be tasked with cleaning up the mess left by the current group of idiots, and that upper management is going to hold the new people responsible for daring to try to replace outmoded and broken systems.

The only bright spot is that they're trying to implement engineering-style protocols for project management. If that means they're going to stop appointing incompetent project managers, I'm all for it, but I'm not holding my breath.
posted by sneebler at 6:54 PM on August 9, 2012

"On a related note, there's a trend away from hiring trained, experienced testers these days; now you see ads for 'developers in test' or a tester who is also expected to be an automator (a/k/a a full-fledged developer)."

Oooh, that never ends well. A programmer with a testing manual is as scary as one with a screwdriver.
posted by MartinWisse at 2:43 AM on August 10, 2012

I've spent almost my entire working life working in or supporting manufacturing. I know that shoddy bolts made in unknownCountry can and do break. I guess the point that I'm apparently not stating very well is that engineers have better regulation of their profession than programmers do. Software is built on layers of frameworks that can change the behavior of dependent code in the future, long after the programmer is finished with the project. Poorly made bolts can break, but the gravitational constant doesn't change.

State law pretty much everywhere requires that the structural engineer who signs off on a building design must be a Professional Engineer licensed by that state after gaining significant experience and passing a rigorous test. If one of them is negligent, he or she could lose their license as well as be liable in a civil or even criminal case. I have several friends and former coworkers who are professional engineers in Chemical Engineering. None of them take putting their stamp and signature on a project lightly.

Few states have equivalent requirements for software engineers. I wrote code that directly controlled (and probably still does control) critical steps in the manufacture of a pharmaceutical taken by millions of people. Even though my work was rigorously reviewed by many other people in that highly regulated industry, I did it all without so much as a bachelor's degree, much less any kind of professional certification. That simply wouldn't have flown if I had designed a bridge. I sleep better now that I do web development in a non-critical industry.

Someone upthread said that since software engineering is over fifty years old, that nullifies the excuse that the field is still too young. I don't agree with this. Engineering evolved and standardized over the course of centuries. Buildings and bridges still fail once in a while. Middle managers have much less power to dictate to a real engineer. The engineer can say "no, that's unsafe/unreliable/illegal" and refuse to do it. A programmer has much less of a body of knowledge or regulations to cite to support such a refusal. Is it any surprise then that software is even more prone to failure?

As a semi-lurker, I'm really excited to engage in a discussion with one of my favorite MeFites. I miss your 0.25 Treaty of Westphalia length comments from days of yore when we both had different names :)
posted by double block and bleed at 6:17 AM on August 10, 2012 [1 favorite]

"Someone upthread said that since software engineering is over fifty years old, that nullifies the excuse that the field is still too young. I don't agree with this. Engineering evolved and standardized over the course of centuries. Buildings and bridges still fail once in a while. Middle managers have much less power to dictate to a real engineer. The engineer can say 'no, that's unsafe/unreliable/illegal' and refuse to do it. A programmer has much less of a body of knowledge or regulations to cite to support such a refusal. Is it any surprise then that software is even more prone to failure?"

I both agree and disagree. It's certainly a valid point that engineering developed all these practices and its culture over, I don't know, three times longer a period of time? But that's the high end of the estimate and it's not that much longer. And more to the point, it serves as an example of how things should be done, it's a template for implementing the needed changes in software. And the nail in the coffin: almost none of these changes are being made, nor is there much of an impetus for them.

That's what's damning.

And the explanation I have for this is that it's not just short-sighted management. That's the bigger part of it, sure. But it's also that there's almost no support within the programming culture for these changes, either. And that's for numerous reasons, from self-interest (state law requiring licensing when a large portion of programmers don't even have formal educations in the field?) to the inertia of cultural conservatism and just general resistance to a high degree of institutional/cultural structure. Worse, the customers don't understand that there's any possibility for improvement — people just accept that software is unreliable compared to other technology. There's people who work very hard, for whatever reasons, at convincing the public that This Is Just How Things Are because Software is Unique. It's a huge collusion of interests that don't want the changes or don't know they're possible.

But that will change, sooner or later. As someone upthread speculated, it may not change until there's a rash of very high-profile, very disastrous software failures that destroy lives and actually kill many people. That will happen if things don't change because one thing that the "99% of software doesn't need to be that reliable" argument fails to take into consideration is that the increasing ubiquity of software means that it's massively interrelated in a systems sense, and this includes through hardware strata, too. It's becoming increasingly possible for failure cascades to begin in trivial components and spread into mission-critical components.

Or even in a more roundabout sense — a user-interface failure, such as a burned-out bulb that indicates stowed airliner landing gear, can preoccupy the flight crew enough to distract them from their slow loss of altitude into an eventual crash in the Everglades. That's a famous crew-resources management failure more than an engineering failure, of course; but it illustrates a dynamic that will increasingly dominate as we rely upon our technology to orient us within the context of understanding which systems are important and which are much less important. When everything is mediated by a highly complicated user-interface, then apparently trivial UI failures can escalate to mission-critical failures. That's even when the only intermediation is the user. But of course there's increasing direct interaction between software systems.

Only 1% of software is directly mission-critical. But much more is indirectly critical in ways we likely won't anticipate. And while that 1% will always need much more attention, the rest will need much more attention than it's getting now.
posted by Ivan Fyodorovich at 4:04 PM on August 10, 2012

"I disagree with this as strongly as possible. Because what I think has happened is that the rapid historical development and deployment of computing technology has meant that software development internalized both horizontally and vertically a craft mentality and we've lived with the complexity and ambiguity that has resulted from this for so long that we've wrongly come to believe that this is due to some qualitative nature of software itself and not a historical accident."

Ivan Fyodorovich, I think we are far more in agreement than disagreement. I think at its core, you are arguing for a rigor and discipline that software sorely needs. I think perhaps when reading my comment, it seemed as though I was arguing that software development should not be an engineering discipline, but that's not what I intended to say.

I think mainly what I'm trying to point out is that software's strength is in its flexibility, and in the flexible ways that humans can interact with it. These two facets provoke unique challenges when trying to approach software as engineering. And I think we, as software developers, haven't been able to agree yet on the right way to handle these things in a disciplined fashion. But it will come.

Perhaps a point on which we may not ever agree is that specifications, written long in advance and fixed into place, are unworkable in software, and don't sufficiently leverage software's strength: its flexibility. In my experience as a software developer, I have had far more success using the strategies laid out in the agile software manifesto when compared to using waterfall methods, which it seems you are espousing.

Using the techniques that have come out of agile software development has helped me precisely because they have allowed me to build software more rigorously, safely, and in a disciplined fashion, while allowing for rapidly changing specifications. The constant communication with the humans who will use the software means that the software is far "closer to spec" (by spec, I mean reflecting the true needs of the users) than it would be with a long up-front design process leading to developers locked away in their caves for months on end, finally to burst out with the finished widget: "Wait, this wasn't what you wanted? Well, shit..." And by providing software which is closer to what the user really needs, we eliminate the disappointment and frustration of a badly done project, and we eliminate all the bugs that come out of shoe-horning features into software at the last minute--this is what you often get in software when a project fails: rather than a collapsing cathedral, you get a Frankenstein application which gives the illusion of working but is fundamentally flawed...but which people keep using.

So I'm not arguing that software is different because I want to be a code artist. And I think we are misunderstanding each other when we use the word craft, or perhaps craftsmanship: to me it means a disciplined, consistent approach to doing one's job. Whether there is an aesthetic aspect is arguable; but it is certainly the case that well-written code has fewer bugs and is easier to understand and change, and this has a direct impact on efficiency and usability.

Related to this, there is another aspect of software that is quite different than other engineering disciplines: the individuals designing the system are the ones building the final product as well. How do we approach this with rigor and consistency? There is most certainly, at this stage in the history of software development, a place and a need for craftsmanship. And because separating the designing from the implementing would be a rather inefficient (if not impossible) enterprise in software development, I don't see this changing. However, I most certainly see a set of rigorous practices put in place based on metrics, which all developers will be expected to adhere to, once we've consistently codified the aspects of software development that matter (in my mind this may largely involve test coverage, both unit and integration, but perhaps could incorporate automated profiling measurements as well).

I wouldn't be opposed to something codified into a standard where developers would be expected to pass a certain set of requirements to be certified to do their jobs at a certain level. Frankly, I'm all for it.

To address something else you brought up in a previous comment in this thread, I agree that we need our form of the "grade 8 bolt" in the software world. It's obvious to anyone who has been programming professionally and takes their profession seriously that the mess of language paradigms, APIs, libraries and protocols we have today is badly in need of sorting out, with a significant chunk that should probably just be taken out back and shot pre-emptively. And the influence of market forces is in many ways a terrible hindrance to producing the kind of consistent, safe, reliable toolset which you are proposing. Many developers want the kind of thing you're talking about but are working at cross-purposes with business interests to get consistency across platforms. Just look at browser implementations for possibly the most recent heinous example of this--and that mess has been going on for fifteen years or so.

So all of this is just to say that we haven't figured it out yet. I disagree with your assessment that we've had plenty of time and should have gotten our shit together at this point, merely by mimicking the example that previous engineering disciplines have provided us. There are significant differences between what software development is and what other engineering disciplines are, for a variety of reasons, and these differences need to be properly understood and extricated from the corrupting influence of market forces, among other things. We are in a maelstrom phase of sorting the wheat from the chaff while we formalize processes. And while we, as software developers, try to find our way through this jungle, out of necessity those of us who give a shit adopt craftsman-like approaches to building software--perhaps because it's all we have to latch onto that can produce reasonably good software, or perhaps, because we are the ones both designing and building our bridges and airplanes, both engineering and craftsmanship are required.

But it's not because we eschew responsibility, or want to reject discipline in favor of being "creatives." I have a visceral reaction of disgust at that idea, one that comes from working with developers who thought they could get away with producing crap and hiding behind some kind of "coder-as-artist" persona. And I suspect this is a similar feeling to yours.
posted by dubitable at 8:31 PM on August 10, 2012

"It's obvious to anyone who has been programming professionally and takes their profession seriously that the mess of language paradigms, APIs, libraries and protocols we have today is badly in need of sorting out, with a significant chunk that should probably just be taken out back and shot pre-emptively."

As somebody who no longer programs professionally, it's obvious to me that every single language paradigm, API, library and protocol in use today exists because it met some need for somebody somewhere. It also seems obvious to me that the resulting huge mess is now more like an ecology than anything else in nature, and that deliberately driving certain parts of it to extinction will cost unimaginable amounts in replacement code and not necessarily improve the overall picture.

Sure, there are a lot of #8 bolts out there that are cross-threaded by design. That doesn't stop a skilled development team producing robust results by using them, nor evolving toward metric in a disciplined and gradual fashion.

I think those people arguing most vociferously for software engineering to work more like "real" engineering have a rather rose-colored view of "real" engineering.

It also seems to me that creating a conceptual opposition between engineering and craftsmanship is the same kind of wonky thinking that has resulted in ISO 9001 being so egregiously mis-applied in so many organizations. It really doesn't matter how astoundingly flawless an organization's quality policy is; put implementing it in the hands of people who don't care about what they do, and the results will be crappy. In software engineering, as in engineering generally, the quality of your people matters much more than anything else.
posted by flabdablet at 10:47 PM on August 11, 2012

"In software engineering, as in engineering generally, the quality of your people matters much more than anything else."

That's flat wrong. I mean, it's completely, egregiously wrong. People are what people are. There are no more and no fewer talented and conscientious people making stuff today than there were four hundred years ago. But things are (with some classes of exceptions) better built and more reliable today than they were then. That's because institutions and practices changed, not people.

If producing quality, reliable software means chasing after the small minority of the more talented programmers, then the majority of software will always be low-quality and unreliable. If you want the majority of software to be high-quality and reliable, then you can only change the institutions and practices involved in producing software so that mediocre programmers will suffice. This is exactly the difference between crafting and engineering — the value of the former lies primarily in the particular context of the craftsperson and the item they make; the value of the latter lies primarily in the institutional, collaborative context in which the item is made.

"I think those people arguing most vociferously for software engineering to work more like 'real' engineering have a rather rose-colored view of 'real' engineering."

No. Despite the dubious nature of doing this, let's separate out the layered supply/manufacturing chain from the design chain in the context of mechanical engineering.

So, fire up a copy of Autodesk Inventor (and, for that matter, think about it in the context of its larger suites, like Product Design Suite and Factory Design Suite). Now start looking at the stuff. The databases of standard parts. The performance specs on all those parts, how they are determined and how those specs are maintained and by whom and how they are disseminated to designers. How engineers are trained in how this stuff, as materials and parts, works and works together and how they should and shouldn't be used together. (I don't mean theory; the engineers aren't trained as either physicists or materials scientists. Practically no one reads Knuth anyway; but, regardless, there's an empty wasteland between that territory and the "ecology" of innumerable and ultra-specific software components and tools.) How the parts are standardized and even components are standardized and even flexible solutions to common problems, from parts through components to implementation, are standardized.

Now consider the supply/manufacturing side of this and what it implies. That those parts are almost universally and easily available. That their quality of manufacturing is consistent. That their shipping and delivery is largely reliable. That assembly and testing is reliable.

The stuff sitting around you in your home and office is far more complex than is the software you're using, though a lot of its complexity is hidden in the creation and refining of its materials, the interworkability of its standardized parts — the larger portion of its complexity is hidden in the industrial base which produced it.

But insofar as software is high engineering — and people keep claiming that it is, and I agree that it's becoming that and we want it to be that because it will be incredibly valuable when it is that — then it cannot be as high-quality and as reliable as the rest of our stuff until it is the product of an equivalent industrial base. That base is cultural more than it is anything else; it's the reification of science and engineering. It is institutions and practices, not individual dedication and talent.
posted by Ivan Fyodorovich at 5:29 PM on August 12, 2012 [1 favorite]

"That's flat wrong. I mean, it's completely, egregiously wrong. People are what people are."

Sure. And some of us are astoundingly good at what we do, and some of us completely suck at it because we don't care, and most of us are somewhere in between.

"There are no more and no fewer talented and conscientious people making stuff today than there were four hundred years ago."

I never claimed otherwise.

"But things are (with some classes of exceptions) better built and more reliable today than they were then. That's because institutions and practices changed, not people."

I don't think it's reasonable to make a conceptual separation between "institutions and practices" and the people who comprise the institutions and do the practices.

The point I'm trying to get across is that engineering, as a discipline, is not magically immune to being done badly by people who think they're better at it than they really are. I have yet to see any set of institutions or practices that remain robust in the face of dedicated and hardworking incompetence.

I don't believe there's any significant difference in the quality distribution of software vs. "the rest of our stuff"; 90% of everything is crud. And this is because the whole point of Gustav Eiffel or Seymour Cray or Dennis Ritchie or Jony Ive is that not everybody is them. Most of us do our best, which is OK but not that great. Some of us, to the detriment of the rest, are Bjarne Stroustrup.

"insofar as software is high engineering — and people keep claiming that it is, and I agree that it's becoming that and we want it to be that because it will be incredibly valuable when it is that — then it cannot be as high-quality and as reliable as the rest of our stuff until it is the product of an equivalent industrial base."

The main thing that leads me to believe that software is to some extent irreducibly non-engineerable is the nature of the systems that the engineered product is required to interact with. Physical engineering makes things that interact with physical reality, and physical reality has a certain consistency as well as becoming inexorably better understood over time. Software engineering makes things that interact with human processes, and especially in the realm of the "huge, creaky application" those processes are subject to arbitrary change in all kinds of unpredictable ways.

It seems to me that the main thing driving the obvious improvement in physical engineering over time has been improvement in the underlying materials science and the consequent improvement in tools and techniques. But software simply has no parallel to materials science; it's processes all the way down.
posted by flabdablet at 12:24 AM on August 13, 2012

I figure the main problem mapping traditional engineering over to software is the notion of tolerance. When you buy an M8 ISO standardized bolt, you're not buying a bolt with a diameter of 7.8 units; you're buying something between 7.760 and 7.972. Your design needs to be able to cope with a 2 percent margin of error, or perhaps 4 percent, if your bolt is going into a nut of similar tolerance.
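The arithmetic behind that margin can be sketched in a few lines of Python. The two limits are the figures quoted above; treating them as millimetres and using 8.0 as the nominal diameter are assumptions taken from the "M8" designation, and the function names are made up for illustration.

```python
# Limits quoted in the comment above; units assumed to be millimetres.
MIN_DIAMETER = 7.760
MAX_DIAMETER = 7.972
NOMINAL_DIAMETER = 8.0  # assumption: "M8" read as a nominal 8 mm


def within_tolerance(measured: float) -> bool:
    """A part is acceptable anywhere inside the band, not at one exact value."""
    return MIN_DIAMETER <= measured <= MAX_DIAMETER


def band_percent() -> float:
    """Width of the acceptable band as a percentage of the nominal size."""
    return (MAX_DIAMETER - MIN_DIAMETER) / NOMINAL_DIAMETER * 100


print(within_tolerance(7.8))      # inside the band
print(within_tolerance(8.0))      # the nominal size itself is outside it
print(round(band_percent(), 2))   # roughly two to three percent
```

Under these assumptions the band works out to a little over 2.6 percent of nominal, which is in the same ballpark as the rough figure given above.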

Physical systems can cope with being off by one micron or atom. The most infamous kind of error among programmers is known as the "off by one" error. Even the least wrong you can be (without being right) is insufficient. Viewed in this light, it's a comparative miracle any software works ever.
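A minimal Python sketch of the classic off-by-one (the "fencepost" case), with hypothetical function names: counting the days in an inclusive range. There is no tolerance band here; the result one short of the right answer is simply wrong.

```python
def days_inclusive_buggy(first_day: int, last_day: int) -> int:
    """Off by one: counts the gaps between days, not the days themselves."""
    return last_day - first_day


def days_inclusive(first_day: int, last_day: int) -> int:
    """Counts both endpoints, so day 1 through day 7 is seven days."""
    return last_day - first_day + 1


print(days_inclusive_buggy(1, 7))  # 6
print(days_inclusive(1, 7))        # 7
```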
posted by pwnguin at 6:09 PM on August 13, 2012 [1 favorite]

Software engineering does have an analogue to margins of error. But like all things software, it works until it doesn't.
posted by flabdablet at 9:00 PM on August 13, 2012


This thread has been archived and is closed to new comments
