A normal failure
April 23, 2019 6:45 AM

How the Boeing 737 Max Disaster Looks to a Software Developer (Gregory Travis, IEEE Spectrum)

Boeing produced a dynamically unstable airframe, the 737 Max. That is big strike No. 1. Boeing then tried to mask the 737’s dynamic instability with a software system. Big strike No. 2. Finally, the software relied on systems known for their propensity to fail (angle-of-attack indicators) and did not appear to include even rudimentary provisions to cross-check the outputs of the angle-of-attack sensor against other sensors, or even the other angle-of-attack sensor. Big strike No. 3. None of the above should have passed muster.
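
For readers wondering what a "rudimentary provision to cross-check" might look like, here is a minimal sketch in Python. It is not Boeing's code or anything taken from the article. The function names, the 5-degree disagreement limit, and the 15-degree trigger are all hypothetical, chosen only to illustrate the idea: if the two angle-of-attack readings disagree by more than some tolerance, the automation declines to act and leaves the decision to the pilots.

```python
from typing import Optional

# Hypothetical tolerance, in degrees, chosen only for illustration.
MAX_DISAGREEMENT_DEG = 5.0


def cross_checked_aoa(left_deg: float, right_deg: float,
                      max_disagreement_deg: float = MAX_DISAGREEMENT_DEG) -> Optional[float]:
    """Return the mean of two angle-of-attack readings, or None if they disagree."""
    if abs(left_deg - right_deg) > max_disagreement_deg:
        # The sensors disagree: neither reading can be trusted on its own.
        return None
    return (left_deg + right_deg) / 2.0


def automation_may_act(left_deg: float, right_deg: float,
                       trigger_deg: float = 15.0) -> bool:
    """Allow automatic intervention only when both sensors agree AND the
    agreed value exceeds the (hypothetical) trigger angle."""
    aoa = cross_checked_aoa(left_deg, right_deg)
    return aoa is not None and aoa > trigger_deg


if __name__ == "__main__":
    print(automation_may_act(16.0, 17.0))  # sensors agree on a high angle
    print(automation_may_act(4.0, 40.0))   # one sensor has failed high
```

The design choice in this sketch is to fail passive: a disagreement disables the automation rather than picking one sensor to believe.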
posted by Johnny Wallflower (167 comments total) 75 users marked this as a favorite
 
In this case, the comments are very much worth reading.
posted by Johnny Wallflower at 6:45 AM on April 23 [6 favorites]


The title, I confess, makes me break out in hives. The article might be rich and nuanced, but the vague implication that someone from the not-at-all-prone-to-failure software industry wants to provide insight into an industry that's (previously) been held up as the standard to which software should aspire, well…
posted by Going To Maine at 7:01 AM on April 23 [21 favorites]


not-at-all-prone-to-failure software industry wants to provide insight into an industry that's (previously) been held up as the standard to which software should aspire, well…

Have no fear. His basic message is that the MAX failure is a result of software engineering culture slipping its way unimpeded into aircraft engineering.
posted by Tell Me No Lies at 7:11 AM on April 23 [90 favorites]


the vague implication that someone from the not-at-all-prone-to-failure software industry wants to provide insight into an industry that's (previously) been held up as the standard to which software should aspire, well…

He's also a pilot! (Although not commercial.) The article is incredibly well-written, although I couldn't finish it because that would probably mean I would never fly again.
posted by schwinggg! at 7:16 AM on April 23 [5 favorites]


He's also a pilot! (Although not commercial.)

Indeed, a considerable part of his point is that he installed an autopilot system on his Cessna 172, which performs instrumentation cross-checking that the 737MAX apparently didn't and, more importantly, required any pilot of the aircraft to specifically train for the difference, including how to manually disable the system. (And on top of that, he notes, if the autopilot on his Cessna does something dangerous, he can force the controls the other way due to manual linkages, where the hydraulically-operated control surfaces of the 737 were stronger than the pilots that tried to pull the plane up.)

I have no way of knowing, but I suspected the "analysis" of the second crash that led to the FAA grounding the 737MAX fleet, at last, was a preliminary listen-through of the cockpit voice recorder, in which it was clear that the pilots were again fighting to pull the plane up from a dive the automated control system -- which operates even when the autopilot is disabled! -- put the plane in. And it probably sounded just like the previous crash.
posted by Gelatin at 7:24 AM on April 23 [11 favorites]


The author also talks about how his Cessna's autopilot required more certification and more documentation to install than the Max's MCAS, and is safer by design.
posted by RobotVoodooPower at 7:25 AM on April 23 [8 favorites]


Unexpected item in flying area!
posted by thelonius at 7:40 AM on April 23 [15 favorites]


A day or two after the Ethiopian crash, I was taking a cab after a celebratory dinner here in Washington DC and this topic came up in conversation with the cab driver. We had a friendly debate speculating as to the cause of the crash. He took the position that it was a software problem with Boeing almost completely at fault whereas I opined that perhaps the pilots should have been faster to disengage the autopilot. As the author of the article here notes, his personal aircraft has "instructions on how to detect when the system malfunctions and how to disable the system, immediately. Disabling the system means pulling the autopilot circuit breaker." Indeed, when I bought my plane, which has an autopilot, the first thing I was trained in regarding the autopilot was how to turn it off (either of two switches or, failing that, the circuit breaker) in case it did something I didn't like. It's very regrettable that many pilots transitioning to the 737MAX apparently were not provided sufficient training in this regard.

Anyway the cab driver asked me if I was a pilot. Yes in fact I was. "But are you a commercial pilot?" "I have a commercial rating but I don't fly commercially." Then the driver informed me that he once flew the Mig 23 for Ethiopia. Only in DC! He was impressed and slightly concerned that I had plans to fly Mrs. exogenous and myself up to Boston the next day and kept referring to the reassurance provided by an ejection seat.
posted by exogenous at 7:42 AM on April 23 [102 favorites]


Software engineering is a nice idea.
posted by ZeusHumms at 7:51 AM on April 23 [18 favorites]


Of course they knew how to turn off the auto-pilot. However, as the article stated "MCAS is implemented in the flight management computer, even at times when the autopilot is turned off."
posted by Tell Me No Lies at 7:51 AM on April 23 [22 favorites]


Then the driver informed me that he once flew the Mig 23 for Ethiopia. Only in DC!

Right? This is one huge loss caused by ridesharing. I remember hearing somewhere that a significant percentage of the Afghani poetry community were DC cabbies.
posted by schwinggg! at 7:55 AM on April 23 [4 favorites]


Meanwhile, the 787 Dreamliner is in trouble, with whistleblowers pointing out dangerous lapses in quality control. These affect mostly jets assembled at Boeing's new South Carolina plant, a site which was chosen because of the union-free low-cost labour force, and despite the lack of an existing aeronautical engineering skill base. To preserve the benefits of this, management reportedly banned the workers there from liaising with the (unionised) workers in Washington state, lest they contract the contagion of unionism.
posted by acb at 7:55 AM on April 23 [60 favorites]


However, as the article stated "MCAS is implemented in the flight management computer, even at times when the autopilot is turned off."

And of course the pilots should be trained to turn it off immediately if it misbehaves.
posted by exogenous at 7:57 AM on April 23 [1 favorite]


The airframe, the hardware, should get it right the first time and not need a lot of added bells and whistles to fly predictably. This has been an aviation canon from the day the Wright brothers first flew at Kitty Hawk.
The author simplified this a bit: The Wright brothers designed their planes with dynamic instability because they believed that they needed it in order to provide full maneuverability. It took almost a decade for other aircraft designers to figure out how to get an airplane which was dynamically stable while also having the maneuverability of the Wright Flyers. (Source: Anderson's "Introduction to Flight".)
posted by clawsoon at 7:58 AM on April 23 [12 favorites]


Going to Maine: ...the vague implication that someone from the not-at-all-prone-to-failure software industry wants to provide insight into an industry that's (previously) been held up as the standard to which software should aspire, well…

This comment is fascinating to me — I’m a software person and had assumed that everybody reading the news coverage had also understood that it was faulty software that ultimately caused these crashes. Is this less common knowledge than I had thought?

The people who wrote the code for the original MCAS system were obviously terribly far out of their league and did not know it. How can they implement a software fix, much less give us any comfort that the rest of the flight management software is reliable?

Precisely. I had been baffled by the news coverage after the first crash -- Boeing promised a software update, and all would be good? Why hasn't mainstream coverage been making this author's concluding point much earlier and more loudly?

It is likely that MCAS, originally added in the spirit of increasing safety, has now killed more people than it could have ever saved. It doesn’t need to be “fixed” with more complexity, more software. It needs to be removed altogether.
posted by Metasyntactic at 8:06 AM on April 23 [7 favorites]


It is likely that MCAS, originally added in the spirit of increasing safety, has now killed more people than it could have ever saved. It doesn’t need to be “fixed” with more complexity, more software. It needs to be removed altogether.

The problem is, from Boeing's perspective, that without MCAS, the super-large engines, designed to meet customer fuel efficiency demands, affect the 737MAX's aerodynamics in a way the FAA wouldn't accept. The engines are so large they act as a lifting body of their own, and the entire system tends to point the nose up too much on power-up, beyond the point the FAA would consider acceptable.
posted by Gelatin at 8:11 AM on April 23 [11 favorites]


Great article, thanks for posting. Clear explanations of the issues involved.
posted by clawsoon at 8:14 AM on April 23 [2 favorites]


I read some (seemingly informed) speculation that even with the MCAS successfully turned off, the pilots might have needed an unreasonable amount of strength to fight the control surfaces into the right position manually. So even the manual override might have been wholly insufficient.
posted by BungaDunga at 8:25 AM on April 23 [1 favorite]


ZeusHumms: "Software engineering is a nice idea."

I have a nicely framed degree that says that I have a master's degree in software engineering but after twenty years in the industry, I can pretty safely assert that no such discipline actually exists.
posted by octothorpe at 8:28 AM on April 23 [54 favorites]


The problem is, from Boeing's perspective, that without MCAS, the super-large engines, designed to meet customer fuel efficiency demands, affect the 737MAX's aerodynamics in a way the FAA wouldn't accept. The engines are so large they act as a lifting body of their own, and the entire system tends to point the nose up too much on power-up, beyond the point the FAA would consider acceptable.

The solution here is pretty simple: Don't put those massive engines on the 737. Design a new plane instead. It will cost more to do that, but it's not as though Boeing is a cash-strapped startup. The idea that high costs somehow justify forcing poor innocent Boeing to sell a half-assed fix to an airframe that they know is dangerously unstable is bullshit.
posted by Uncle Ira at 8:28 AM on April 23 [45 favorites]


>> However, as the article stated "MCAS is implemented
>> in the flight management computer, even at times when
>> the autopilot is turned off."
>
> And of course the pilots should be trained to turn it off
> immediately if it misbehaves.


Fair enough, but from the article at least that seems to mean you would need to shut down the entire flight management computer. And maybe that’s reasonable; I don’t know enough to say.
posted by Tell Me No Lies at 8:38 AM on April 23


If I remember correctly, it's not the high costs of designing a new plane, it's that Boeing was able to sell the 737MAX to customers without those customers being required to do more extensive and expensive pilot training requiring a simulator. If Boeing had made changes that required such training, the customers said they would go buy an Airbus instead. I'm not saying that was the right thing to do; it's just that the choice wasn't between the high cost of a new design and this deadly compromise, but between figuring out a way to make it work like that and not having very many purchases for the plane at all.
posted by foxfirefey at 8:40 AM on April 23 [6 favorites]


There's a programmer out there who, because they didn't think about system redundancy in a life-safety situation, should be charged with 350+ counts of involuntary manslaughter.
posted by seanmpuckett at 8:42 AM on April 23 [9 favorites]


I also think traffic engineers should be charged when people die on the roads they design, but I'm funny that way.
posted by seanmpuckett at 8:42 AM on April 23 [6 favorites]


A quick comment on the author's comparison to the Challenger disaster. He writes:
I cannot get the parallels between the 737 Max and the space shuttle Challenger out of my head. The Challenger accident, another textbook case study in normal failure, came about not because people didn’t follow the rules but because they did. In the Challenger case, the rules said that they had to have prelaunch conferences to ascertain flight readiness. It didn’t say that a significant input to those conferences couldn’t be the political considerations of delaying a launch. The inputs were weighed, the process was followed, and a majority consensus was to launch. And seven people died.
This is only true according to a robotic, nonsensical interpretation of the rules. Challenger never should have launched; the solid rocket boosters were only qualified for flight down to 40°F; the overnight lows before launch were about 20° cooler than that. Even if the SRBs were capable of functioning in colder temperatures, they weren't certified to do so. Engineers at Thiokol raised this point; there was pushback that 40° was the cutoff for propellant mean bulk temperature. A Thiokol employee raised the point that this was an asinine interpretation of the rules:
You could expose that large Solid Rocket Motor to extremely low temperatures — I don't care if it's 100 below zero for several hours — with that massive amount of propellant, which is a great insulator, and not change that propellant mean bulk temperature but only a few degrees, and I don't think the spec really meant that.
A good comparison/contrast to the 737 Max disasters is a different Challenger mission, STS 51-F in 1985. Failed sensors sent bad readings that caused the automatic shutdown of one of the orbiter's main engines. Additional failed sensor readings put another main engine within seconds of a shutdown. However, an observant and fast-acting worker in mission control made the call to override the sensor limits and an abort to orbit was called. (I.e., the crew of Challenger aborted the originally planned mission profile and launched into a lower orbit on the remaining two engines.)

Had there been no way to override the failed sensors, a second main engine would have gone down and the shuttle crew would have attempted landing in Spain on a high-speed transatlantic abort. Because humans could and did override the flawed sensors, high-risk automated behavior (unnecessary engine shutdown) was avoided.
posted by compartment at 8:45 AM on April 23 [15 favorites]


A friend sent me this piece yesterday, and it's both illuminating and disturbing.

Reading through it, and from talking to people who are closer to the aviation world than I am, the largest issue is that Boeing, like all too many companies, is, essentially, infected with the programming "we can just patch this lethal bug" mindset. While it's probably possible to fix lots of problems with better code, the better question to ask before the code is written is "how is this code going to interact with people."

Fundamentally, this is the same problem that applies in autonomous driving: what happens when the car (or the plane) doesn't have good information, and the operator can't intervene. It's a problem that should terrify anyone thinking about it, and it certainly didn't scare Boeing enough.
posted by Making You Bored For Science at 8:48 AM on April 23 [12 favorites]


It is likely that MCAS, originally added in the spirit of increasing safety, has now killed more people than it could have ever saved. It doesn’t need to be “fixed” with more complexity, more software. It needs to be removed altogether.

This is incorrect. The MCAS was added to make the 737MAX behave like a 737NG to avoid the recertification process for plane, pilots and airlines, because this was what the airlines were demanding. The plane flies perfectly fine without it, it just doesn’t fly like a 737.
posted by jmauro at 8:50 AM on April 23 [10 favorites]


Blah, blah, blah, let's not incentivize corporate leaders to focus on short-term profits at the expense of long-term viability. It's time to bring back civic responsibility as part of the corporate charter.
posted by Mental Wimp at 8:53 AM on April 23 [6 favorites]


and, more importantly, required any pilot of the aircraft to specifically train for the difference, including how to manually disable the system. (And on top of that, he notes, if the autopilot on his Cessna does something dangerous,

one of my neighbors is a recently retired commercial pilot who, up until two years ago, owned his own airline, private clients mainly, flying state of the art jets all over the world. His immediate response to the 737 "problem" (without having access to anything but what was in the news), was two words: pilot error.

"But doesn't the similarity in the situations suggest something else?" I said.

At which he softened a bit (he's a pretty hard guy) and said, "Make that training failure. We've got way too many people in the skies these days who are more software experts than pilots. They're fine until the software malfunctions, and then they lack experience. They panic." And then he went on to tell me how hard it had become at the end of his career to find young pilots who were fully up to the job. "I finally just assumed they'd be ballast for the first few years, until I'd made them fly enough hours unassisted (by auto-pilot)."

In the end though, he said, "It's still way safer up there than it used to be."
posted by philip-random at 8:55 AM on April 23 [10 favorites]


one of my neighbors is a recently retired commercial pilot who, up until two years ago, owned his own airline, private clients mainly, flying state of the art jets all over the world.

Is your neighbor Professor Xavier and does his airline transport X-Men on highly modified SR-71 jump jets
posted by compartment at 9:06 AM on April 23 [13 favorites]


The solution here is pretty simple: Don't put those massive engines on the 737

As a professional engineer who has to do the whole licensing and ethics rigmarole in order to practice, I'm as condescending towards the fast and loose "software engineering" field as anyone. That said, there's no reason why MCAS couldn't work. There are plenty of examples of aircraft, spacecraft, and other machines that could not function without sophisticated software working constantly to keep an unstable system in control. It just takes a ton of Quality Assurance, testing, and redundancy - all of which appear to be lacking in the 737-max program.

I think a big angle to this story is that when it comes to flight control software, the FAA appears to be just as clueless as the programmers.
posted by Popular Ethics at 9:10 AM on April 23 [17 favorites]


Fantastic article--thanks for posting!
posted by sperose at 9:10 AM on April 23


There's a programmer out there who, because they didn't think about system redundancy in a life-safety situation, should be charged with 350+ counts of involuntary manslaughter.

Putting the blame on a single programmer won't get you the results you want. Problems like this are systemic failures, and it's not actually possible for any one person to think through all the ramifications of their design.

Wanna put the management who created the systemic issues in the first place in the dock? Right there with you.
posted by asterix at 9:11 AM on April 23 [71 favorites]



Is your neighbor Professor Xavier and does his airline transport X-Men on highly modified SR-71 jump jets

no, but if even half of his stories are true, he was pretty much on that level. One of his clients required a weekly return flight NYC from Singapore. Only ever one passenger. $60,000/hr.
posted by philip-random at 9:22 AM on April 23 [5 favorites]


This letter (PDF warning) articulates better than I could why the "bad apple" theory doesn't work in situations like this. (Note that one of the writers is a pilot.)
posted by asterix at 9:23 AM on April 23 [2 favorites]


Putting the blame on a single programmer won't get you the results you want. Problems like this are systemic failures,

in my limited experience, it usually starts with some guy in sales who has no idea what the software team can/can't do, so he just assures the client, no problem, then hopes he's right. And if he's wrong, well, in a properly sloppy organization, then it's somebody else's problem.
posted by philip-random at 9:26 AM on April 23 [14 favorites]


I've been following the discussion of these crashes on airliners.net. It's kind of disturbing that a sizeable minority of pilots are determined to put the crashes down to pilot error even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work.

Personally, I want to travel in an airplane that the worst pilot can fly safely.
posted by night_train at 9:28 AM on April 23 [20 favorites]


This is a great article, and I'm really having trouble coming to grips with this bit:

In a pinch, a human pilot could just look out the windshield to confirm visually and directly that, no, the aircraft is not pitched up dangerously. That’s the ultimate check and should go directly to the pilot’s ultimate sovereignty. Unfortunately, the current implementation of MCAS denies that sovereignty. It denies the pilots the ability to respond to what’s before their own eyes.

I can understand the argument that yes, actually, the software does know better, shut up. But how on earth is this possibly a "minor change", not requiring re-certification and/or re-training of the pilots?
posted by RedOrGreen at 9:31 AM on April 23 [4 favorites]


there's a programmer out there who, because they didn't think about system redundancy in a life-safety situation

Maybe. But, more likely is that there is a team who were at the mercy of "project damagers" and upper management who ultimately green-lit a bad hardware design, and then attempted to solve it with software. Do you really think that those decision-makers would be any better at managing software development processes?

Next - I would also bet that there were time-critical deadlines the team had to meet, imposed not by the realities of building software, but by the release/marketing schedule for the product.

Why isn't management culture ever the culprit? Have to always throw the lowest person on the team under the bus, right?
posted by jkaczor at 9:36 AM on April 23 [26 favorites]


I can understand the argument that yes, actually, the software does know better, shut up. But how on earth is this possibly a "minor change", not requiring re-certification and/or re-training of the pilots?

There would be little perceived change in flying the plane because the MCAS software was going to work perfectly. As all software does.
posted by Tell Me No Lies at 9:39 AM on April 23 [6 favorites]


I can't think of any self-help book I'd recommend more in 21st Century America than Charles Perrow's Normal Accidents, for making sense of one's place and responsibilities in the complex, tightly-coupled systems that abound in our society. It is so much easier to operate in a failing bureaucracy when you can stop thinking in terms of individual blame, or at least be able to ask yourself in the midst of an ongoing disaster "is individual attribution and/or blame meaningful here?"
posted by lefty lucky cat at 9:41 AM on April 23 [12 favorites]


The author's conclusion, it seems to me, does not fly. Removing the MCAS leaves a plane that behaves catastrophically at large angles of attack and that would not be allowed without some remediation. At the same time there are already nearly 400 of these planes in the world and it's hard to believe they will be scrapped. What do you suppose happens next?
posted by sjswitzer at 9:41 AM on April 23 [1 favorite]


I've been following the discussion of these crashes on airliners.net. It's kind of disturbing that a sizeable minority of pilots are determined to put the crashes down to pilot error even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work. Personally, I want to travel in an airplane that the worst pilot can fly safely.

Quite right. I am really surprised to hear that reaction from pilots! It reminds me of how my own industry (Nuclear Power) had to go through a culture shift (and is still shifting) from "we will only train / accept the best so that they can handle whatever situation" to "we will not create error-prone systems that put people in the situation where a mistake will kill people". But I still see people jump to blame the operator after accidents (just talk to someone about other drivers on the road). Ego is a pretty strong vaccine against fear I guess.
posted by Popular Ethics at 9:42 AM on April 23 [23 favorites]


Contemplating printing a banner of "we will not create error-prone systems that put people in the situation where a mistake will kill people" to wave at every cheerleader for vehicular autonomy who doesn't understand that this is not acceptable. Yes, I'm looking at you, Musk, with your promises yesterday of "we'll be driving autonomously in cities by next year."
posted by Making You Bored For Science at 9:46 AM on April 23 [11 favorites]


Elon Musk said that the company will someday allow drivers to select aggressive modes of its Autopilot driver assistance system that have a "slight chance of a fender bender."
Sorry, Making You Bored....
posted by the antecedent of that pronoun at 9:50 AM on April 23 [3 favorites]


Yes, I'm looking at you, Musk, with your promises yesterday of "we'll be driving autonomously in cities by next year."

Elon Musk says Tesla will allow aggressive Autopilot mode with ‘slight chance of a fender bender’
posted by clawsoon at 9:50 AM on April 23 [1 favorite]


This is a great article, and I'm really having trouble coming to grips with this bit:

In a pinch, a human pilot could just look out the windshield to confirm visually and directly that, no, the aircraft is not pitched up dangerously.


Actually this is the line that discouraged me about the article. A whole bunch of instruments and systems are in planes because in many conditions looking out the window is worthless and dangerous. Deciding if a plane that size is approaching a dangerous stall is probably not physically or perceptually intuitive. Not having options for a pilot to "just fly the plane" is also a big problem. It seems like there was a long chain of decisions, many of which may have been excellent in isolation, both within the company and the airline community, that led to an unstable condition for which software was not a sufficient solution. Answer: super smart AI (well no, I got nuth'n)
posted by sammyo at 9:51 AM on April 23 [2 favorites]


About three days after this came out I was on a 737 that had a very turbulent approach into Mérida, Mexico. We're talking lots of rattling and lots of sharp altitude-losing turns. I don't know planes but I know a lot about software and specifically how bugs propagate across similar platforms (although I didn't know about the MCAS); I'm sure a lot of other people on board weren't sure of the difference between the 737 and the 737 MAX at all. In any case the feeling of planes augering in while the pilots sat helplessly was in the air.

Mérida is a very Catholic town, but I haven't seen that many people cross themselves simultaneously since I was at a Papal mass.


(it was later suggested to me that all the crossing could have a gyroscopic effect that would help stabilize the plane. Never thought of that.)
posted by Tell Me No Lies at 9:52 AM on April 23 [7 favorites]


My BIL is a software engineer who works on avionics displays. While he didn’t work on the MAX program, he has worked on other Boeing projects. As a result, he said he overheard several “very loud” conference calls back in December/January when Boeing knew they had a problem but before the EA crash, so they were trying to rush a “secret update” out to the fleet. While his company didn’t design or manufacture the MCAS, as the display provider “Boeing’s mistake is our problem”.

He expressed frustration that these crashes are due to bad I/O, which is an issue that the aircraft industry solved 40 years ago when computers started playing larger roles on the flight deck. These crashes were avoidable on several levels.
posted by Big Al 8000 at 9:53 AM on April 23 [10 favorites]


In related news.
posted by Big Al 8000 at 9:55 AM on April 23 [2 favorites]


The problem is, from Boeing's perspective, that without MCAS, the super-large engines, designed to meet customer fuel efficiency demands, affect the 737MAX's aerodynamics in a way the FAA wouldn't accept. The engines are so large they act as a lifting body of their own, and the entire system tends to point the nose up too much on power-up, beyond the point the FAA would consider acceptable.

Even worse, it has a tendency to nose-up more as you get closer to a stall which is not acceptable behaviour for a commercial airliner.

I've been following the discussion of these crashes on airliners.net. It's kind of disturbing that a sizeable minority of pilots are determined to put the crashes down to pilot error even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work.

That was the attitude pilots had when checklists were introduced. In most safety sensitive industries, you will be laughed out of the room if you invoke operator error. It's a grossly inadequate response.

We have a junction near our house where, I was informed recently, "drivers keep having accidents". Those aren't random accidents; that's a poorly designed junction where one of the approaches encourages high speed and is hard to see from the others and one of the most common traffic patterns requires weaving (a lane shift left to right in the same place as others have to shift lane right to left). The only reason no-one has died is that "side to side" accidents have good survivability. Traffic engineers know how to measure the riskiness of a junction and this junction needs to be re-designed. It drives me up the fucking wall when people say that "people just need to drive more carefully". I don't doubt that they do, but unless you've got a way of making them, let's re-design that junction just to be sure, yeah?
posted by atrazine at 9:57 AM on April 23 [36 favorites]


And air travel is still the safest it's ever been; you're much safer in a plane than crossing the road.

Contemplating printing a banner of "we will not create error-prone systems that put people in the situation where a mistake will kill people"

Exactly the problem with automation: perception. SDCs right now would save tens of thousands of lives and uncountable accidents but one exceptional situation will make headlines and terrify the masses.
posted by sammyo at 9:58 AM on April 23 [1 favorite]


As for self-driving cars, the very fact that so many pilots, as noted above, are determined to attribute the Boeing 737 Max crashes to pilot error is the real reason self-driving cars will be a hard sell... most people cannot tolerate the idea of a human fatality caused by a system accident, even if that system causes far fewer deaths than our current system of all-human drivers. Vehicle fatalities with human drivers are horrible, but blame is easily placed on the driver, or categories of drivers like drunk, tired, or inexperienced. It allows us to maintain the illusion that we are safe because we drive our own cars and we're better drivers. Take the driver out of the equation, and suddenly people are faced with vehicle fatalities in a system over which they have no control. One reason people hate flying... it's far safer objectively, but since you're not the pilot, you can't tell yourself the plane won't crash because you're an awesome pilot. You're at the mercy of a complex system and you know it.
posted by lefty lucky cat at 10:03 AM on April 23 [18 favorites]


... most people cannot tolerate the idea of a human fatality caused by a system accident, even if that system causes far fewer deaths than our current system of all-human drivers.

but the insurance biz isn't "most people"
posted by philip-random at 10:07 AM on April 23


the pilots might have needed an unreasonable amount of strength to fight the control surfaces 

Now I want to see a new kind of reassurance from the flight crew. "This is your captain speaking. As you can see from the windows on the left side of the plane I'm currently on the tarmac bench-pressing a motorcycle."

Apparently anything less should be considered pilot error waiting to happen.
posted by justsomebodythatyouusedtoknow at 10:14 AM on April 23 [11 favorites]


As for self-driving cars, the very fact that so many pilots, as noted above, are determined to attribute the Boeing 737 Max crashes to pilot error is the real reason self-driving cars will be a hard sell

I don't think so, or alternatively, the solution in the article is to give pilots more control - (ie, the systems should be able to be fully disabled so that the badasses can take over!) - but we have tons of data that the machines (especially flying machines but also cars) are operating at conditions that are really beyond full human abilities to do well.

I mean, if the answer is going to be full human control, then we need to go back to DC9s and cars that accelerate like molasses and with top speeds similar to those on bikes coasting downhill.
posted by The_Vegetables at 10:17 AM on April 23


in my limited experience, it usually starts with some guy in sales who has no idea what the software team can/can't do, so he just assures the client, no problem, then hopes he's right.

This is even more literally true than you might imagine. Years ago I did some short term consulting work at Boeing and I was shocked that the literal big cheese salesman sometimes attended critical design reviews and scowled at engineers as they debated various critical safety issues.

The salesman had already inked a multi-billion dollar deal from which he himself would earn millions in bonus. He made his opinions quite clear about any design decisions that might jeopardize promises already made to customers and his bonus.

I quickly got out of that business because I couldn't stomach it, but I am not at all surprised that sales drove the decision to patch up a design flaw with software and then cover up those changes from pilots because of a sales promise of no new pilot training.
posted by JackFlash at 10:17 AM on April 23 [27 favorites]


clawsoon: Between that Verge piece and a piece CNBC put out yesterday, I'm trying not to froth at the mouth right now about how reckless Tesla's approach to vehicular autonomy is.

To bounce off of sammyo and lefty lucky cat's comments, respectively: the promise of vehicular autonomy is that it will save lives, if and when it's better than human drivers. Right now, the systems that are on the road might be better than human drivers, under some circumstances (and that's probably a generous interpretation; they're hopefully not too distinguishable from human drivers under these circumstances, so they're probably safe enough, but that doesn't mean they're better - what it probably means is that they have different failure modes).

Do I want autonomy that's better than human drivers? Absolutely. Far too many collisions are the result of human error, and if we can take those away (or even minimize their severity), we can make the roads safer for everyone. However, we're not even close to that point yet. For example, Tesla's self-driving mode is an L2 system, which means the driver is expected to keep their eyes on the road. That doesn't mean people use it like that: they treat it like an L3, thinking they don't need to look at the road and that they can check their email / watch a movie / nap while it's driving their car. Which almost certainly makes them less safe than other drivers on the road, not more safe, because if they need to take control because the vehicle doesn't detect something or misinterprets it, they probably won't have time to look at the scene, understand it and plan a response.
posted by Making You Bored For Science at 10:18 AM on April 23 [13 favorites]


Fantastic article. Also everyone who played a part in putting the thing in the air and keeping it there should be in jail.
posted by Artw at 10:21 AM on April 23


I don't think this article mentioned it, but there are a couple of features associated with the MCAS system that were originally optional when purchasing a plane from Boeing (despite them being more or less safety features), but that started becoming available at no cost after the crashes.
posted by exogenous at 10:22 AM on April 23 [3 favorites]


It's kind of disturbing that a sizeable minority of pilots are determined to put the crashes down to pilot error even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work.

Convincing yourself it was operator error is convincing yourself it can't happen to you, because you're of course perfect and a genius and would never make that error.

"Only an idiot would need [safety feature]" or "Only an idiot would have [done that easy to do thing that resulted in a tragedy]" is macho posturing that has basically never been true a single time it has been claimed.
posted by tocts at 10:24 AM on April 23 [21 favorites]


I come away from reading the article thinking that MCAS can work, but in order for it to work, it also means significant retraining of pilots certified for the 737. And part of that retraining is that the 737 is no longer a 737, as those big engines can now act like rocket engines and produce additional lift, and the reason Boeing sold so many so quickly is that they pawned off the MAX as just a better 737, even though the addition of those engines and the software to handle the consequences of those engines means it is totally no longer a 737.

Or: money.
posted by linux at 10:49 AM on April 23 [4 favorites]


"Only an idiot would have [done that easy to do thing that resulted in a tragedy]" is macho posturing

Yeah, the first thing that popped into my mind reading night_train's comment was Chuck Yeager's autobiography in which he talks about the general macho culture among test pilots and the attitude that anyone who was killed was a "dumbass." It wouldn't surprise me a bit to learn that this attitude still prevails among pilots (though I have no way of confirming directly).
posted by nickmark at 10:52 AM on April 23 [2 favorites]


I've been following the discussion of these crashes on airliners.net. It's kind of disturbing that a sizeable minority of pilots are determined to put the crashes down to pilot error even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work.

Hell, people still routinely blame people who get cancer for getting cancer.

This is an all too common response of people when they are confronted with realizing they might be helpless in a life and death situation.

Because the alternative is living in a state of continual sheer terror.
posted by srboisvert at 10:55 AM on April 23 [15 favorites]


SDC's right now would save tens of thousands of lives and uncountable accidents but one exceptional situation will make headlines and terrify the masses.

Self driving cars do not exist. There are cars capable of autonomy in some scripted or geo-fenced scenario, but there is no such thing as a general purpose self driving car. If I killed every billionaire in the world, and took all their money, I could not buy a self driving car, because it doesn't exist as even a prototype.

The 737Max crashes were due to bad software that would trust a faulty sensor over the pilots who could look out the damned window and see that the plane was not stalling. It was essentially a Silicon pilot that decided the meat pilots could not be correct and then flew the planes into the ground to prove it.

That seems stupid.

That this misbehavior happened in software should be terrifying - imagine randomly that you get in your car, and the software update pushed out by Melon Husk Motors, Inc. just completely changed the handling of the car. We need a much stronger regulatory schema to deal with idiot software pilots and also idiot software drivers.

So many regulations are written in the blood of the dead and the tears of their survivors. Why do we keep forgetting that?
posted by Pogo_Fuzzybutt at 11:05 AM on April 23 [15 favorites]


So many regulations are written in the blood of the dead and the tears of their survivors. Why do we keep forgetting that?

There's a rural highway not far from where I live. Everybody complains about all the traffic lights, and how there seems to be another set added every few years. According to a cop I know, when that highway was built forty odd years ago, there were no traffic lights. But every now and then, there's a fatal accident, followed by an inquest, followed by a new set of traffic lights.
posted by philip-random at 11:14 AM on April 23 [3 favorites]


If an 1800 hour test pilot can pull a lever too early, so can you. They now have an interlock system in place that knows better than the pilot when that lever should be pulled.
posted by rhamphorhynchus at 11:15 AM on April 23 [2 favorites]


That this misbehavior happened in software should be terrifying - imagine randomly that you get in your car, and the software update pushed out by Melon Husk Motors, Inc. just completely changed the handling of the car.

The worse issue is that this software exists in the first place because the MAX engines make the plane aerodynamically unsound. And Boeing, instead of making the investment to redesign the plane so that it would be sound, came up with this software hack to compensate for it and then didn't tell anyone that that's what they did.
posted by Autumnheart at 11:28 AM on April 23 [10 favorites]


Good god. So many red flags, not least of which was choosing to use non-union unskilled labor.

I’m an old guy who’s been in the software industry for 35 years.

Never!
Ever!
Trust a programmer who says they can safely pull off something like this!

And you’ll always find a programmer who’ll swear up and down they can do it safely. It is not possible to overestimate the hubris of a typical programmer. It’s an industry that could seriously use a large dose of imposter syndrome. And management pressures programmers to write code for risky scenarios like this because it seems cheaper and more profitable than hardware, or, y’know, just not doing it.

I used to think that rigorous testing and quality control could fix these issues in potentially life-threatening applications.

Nope. The majority of programmers can barely write code that does ANYTHING, let alone something as risky as this, and they’ll always find a way around your tests and QA/QC.

I rarely fly any more, partly because of this.

There are good software engineering organizations. The team who wrote the onboard systems for the space shuttle were a good example.

But those organizations are an extreme rarity, and nobody in the private sector will invest that kind of effort into writing software. They’re not the ones writing software for flight systems, health care, etc.

This xkcd cartoon is NOT funny because it is so depressingly true: https://xkcd.com/2030/
posted by shorstenbach at 11:34 AM on April 23 [26 favorites]


I want to push back on this a bit. I have read pretty strong opinions about this accident from 737 pilots who blame the aircrews and airlines for the two 737 Max crashes, and while I think they are missing a great deal, I don't think they are incorrect so much as incomplete in how they view the accidents. The primary reason the pilots blame the aircrews is that the required response to an MCAS malfunction is the same as the response to runaway trim which is a critical non checklist skill that is trained for, (by non checklist I mean memorized.) I believe this was the reason Boeing could slide under the certification requirements without more simulator training or, very disturbingly, felt no need to tell pilots that MCAS existed. The Indonesian aircraft involved in the crash had the MCAS fail on the flight before the accident flight; disaster was averted because a jump seat passenger in the cockpit recognized the runaway trim condition and told the pilots to disengage the electric trim by flipping 2 switches on the center console. The broken plane was not fixed before being flown again, and no one told the accident crew about the incident or the deficiency. This should tell you something about the management of the airline and how much of a safety culture they have.

The point the 737 pilots who blame it all on pilot error miss is that WE, the flying public, don't really care to have airplanes carrying us that depend upon very well trained and experienced crews to handle inevitable problems brought about by lousy design in order to be flown safely. It seems obvious that the MCAS system, which seems like a perfectly reasonable approach to getting more life out of an airframe, was incredibly stupid in its design and implementation.

That aside I don't think the author of this article entirely knows what he is talking about with his pronouncements about aviation "canon" and the autopilot in his Cessna. To me this is a complex situation that doesn't have just one cause. I think a lot of people do not understand this and are reassured by the idea that it is the airplane manufacturer that screwed up and the pilots and airlines are some kind of innocent victims as opposed to contributors to the accident.
posted by Pembquist at 11:42 AM on April 23 [5 favorites]


I'll just leave this here: Therac-25
posted by sfred at 11:53 AM on April 23 [18 favorites]


The primary reason the pilots blame the aircrews is that the required response to an MCAS malfunction is the same as the response to runaway trim which is a critical non checklist skill that is trained for, (by non checklist I mean memorized.)

Ars Technica on that:
One of the controls—the electric stabilizer trim thumbswitch on the pilot’s control yoke—could temporarily reset MCAS’ control over stabilizers. The Lion Air pilots hit this switch more than 24 times, buying them some time—but MCAS’ stall prevention software kicked in afterwards each time because of faulty data coming from the aircraft’s primary angle of attack sensor...

The Lion Air crew would have had to accomplish this while dealing with a host of alerts, including differences in other sensor data between the pilot and co-pilot positions that made it unclear what the aircraft’s altitude was. As a result, the crew continued to fight MCAS’ attempts to push the nose down until the end.
In the past, just pulling back on the stick cancelled the nose down from MCAS. Boeing changed that behavior with a software update, and didn't tell anyone. The Lion Air pilots had 60 seconds to figure it out.

The Ethiopian Air pilots did disable the MCAS system:
The pilots of the Ethiopian Airlines flight did flip the cutout switches, and they cranked the controls to attempt to regain positive stabilizer control. But they continued to have difficulty controlling the aircraft.

It is not clear at this point whether the pilots purposely reactivated the MCAS’ stabilizer control or if the software reactivated on its own after shutdown. While a Wall Street Journal source said that it appeared the pilots turned the system back on in hopes of regaining control over the stabilizers, Reuters reports that the software may have reactivated without human intervention, and further investigations of that possibility are ongoing.

Shitty software kills.
posted by Pogo_Fuzzybutt at 11:55 AM on April 23 [34 favorites]


Early on at Cisco, marketing was running a two page ad featuring a quote from someone in the Canadian healthcare system: "For life-critical situations, we only trust Cisco."

Over in engineering we all had a good laugh and reminded ourselves never to get sick in Canada.
posted by Tell Me No Lies at 12:11 PM on April 23 [4 favorites]


The 737Max crashes were due to bad software that would trust a faulty sensor over the pilots who could look out the damned window and see that the plane was not stalling. It was essentially a Silicon pilot that decided the meat pilots could not be correct and then flew the planes into the ground to prove it.

That seems stupid.


Mmm, yeah, seems that way but it's not so simple. Most problems are human error. Wasn't one of the crashes in recent years because a pilot was instinctively pitching up away from a crash, prolonging a stall? And everyone was telling him to pitch down the whole way, but he wouldn't listen? What if the flight computer could have forced him to do the right thing against his will?

I guess it's a philosophical question of sorts. If the computer can save a lot of lives but sometimes kills a few when it's wrong, is that an overall positive? Not if you're one of the few, that's for sure.
posted by ctmf at 12:12 PM on April 23


That's the same kind of Fight Club logic where, if it costs less money to pay settlements than to do a recall, you don't do a recall. It's basically rolling the dice on whether the number of people killed will significantly devalue the company. (I'm not accusing you of suggesting that it's an appropriate way of making a decision, btw, just underlining that it's a crazy way to run a company.)
posted by Autumnheart at 12:18 PM on April 23 [2 favorites]


as those big engines can now act like rocket engines and produce additional lift,

I don’t think that’s entirely accurate. While, yes, the new LEAP engines generate about double the thrust of a 1968-spec 737-100, they only generate about 9% more thrust than the 737-800/900 engines.

The more critical issue is fan diameter. When the 737 was originally designed, high-bypass turbofans weren’t a thing. As a result, the original fan diameter was only 48”. Now, the fan diameter of the LEAP engine is 88”, which is 10” higher than the previous generation.

This is a problem because there is only so much room between the wing and the ground. This is because, again, when the 737 was originally designed, it was common to deplane using built-in stairs at the door. As a result, the 737 was designed to sit close to the runway because that stair could only be so long. Nowadays, even small airports use a jetway to keep passengers out of the elements, making plane height less of an issue.

If they could have kept the engine thrust under the wing like in previous generations, they wouldn’t need the MCAS.
posted by Big Al 8000 at 12:18 PM on April 23 [3 favorites]


the pilots and airlines are some kind of innocent victims as opposed to contributors to the accident.

The problem with this is that the pilot community aren't engineers. Their view is biased. From a scientist or engineer's view, the fact that two accidents happened is really enough to show this wasn't mere pilot error. But pilots cannot see this because their role biases them.
posted by polymodus at 12:20 PM on April 23 [1 favorite]


If the computer can save a lot of lives but sometimes kills a few when it's wrong, is that an overall positive?

Obviously it's a false choice anyway when you haven't done the homework to make the chance of computer error as small as possible, which seems to be the case with Boeing. Cross-checking and validating inputs and giving the pilot a disable switch for when they KNOW they are right seem like no-brainer bare minimums.
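
A minimal sketch of what that bare minimum might look like, in Python. Everything in it is hypothetical and for illustration only: the function name, the 5-degree disagreement threshold, the plausible range and the cutout flag are assumptions, not Boeing's actual MCAS logic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: maximum tolerated disagreement between the two
# angle-of-attack (AoA) sensors, in degrees. Not a real certification value.
MAX_AOA_DISAGREEMENT_DEG = 5.0
# Hypothetical plausible physical range for an AoA reading, in degrees.
AOA_VALID_RANGE_DEG = (-20.0, 40.0)


@dataclass
class TrimDecision:
    allow_auto_trim: bool
    reason: str
    aoa_deg: Optional[float] = None


def decide_auto_trim(left_aoa_deg: float, right_aoa_deg: float,
                     pilot_cutout: bool) -> TrimDecision:
    """Decide whether automation may act on the AoA data at all."""
    # 1. The pilot's disable switch always wins.
    if pilot_cutout:
        return TrimDecision(False, "pilot cutout engaged")

    # 2. Validate each input against a plausible range (this also rejects NaN).
    low, high = AOA_VALID_RANGE_DEG
    for name, value in (("left", left_aoa_deg), ("right", right_aoa_deg)):
        if not (low <= value <= high):
            return TrimDecision(False, f"{name} AoA reading out of range")

    # 3. Cross-check the two sensors against each other.
    if abs(left_aoa_deg - right_aoa_deg) > MAX_AOA_DISAGREEMENT_DEG:
        return TrimDecision(False, "AoA sensors disagree")

    # Inputs look consistent: pass the averaged value on to the control law.
    return TrimDecision(True, "inputs consistent",
                        (left_aoa_deg + right_aoa_deg) / 2.0)
```

The point of the ordering is that automation has to earn the right to act: the cutout is checked first, and any doubt about the inputs means the software does nothing rather than something dramatic.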
posted by ctmf at 12:23 PM on April 23 [1 favorite]


The author's multiple decades of experience as both a software developer and a pilot lend valuable perspective. Notably, though, he doesn't seem to have ever worked in aviation software. I wonder how many of the decisionmakers in software development at Boeing (et al.) have a similar combined SWE/aviation background? In particular, I'm curious if any Mefites have insight on [whether/how/how much] "problem domain" knowledge is deliberately incorporated into granular software development decisionmaking in these kinds of life-and-death fields.

From looking at ads for Boeing and avionics-related software engineering gigs (e.g.), it seems like they generally follow the familiar format of "must have [n years of] experience using languages X, following development processes Y, and developing applications of type Z." So a developer for Boeing or a subcontractor may be expected e.g. to know their way around a particular kind of avionic databus, but there's no expectation that they've ever flown a plane. That's pretty much of a piece with general software practice these days, AFAICT -- but when software crosses the line from just happening to be implemented on a plane to actually flying the plane, it seems like hiring a software team none of whom know how to fly a plane would be a lot like hiring a pilot who doesn't know how to fly a plane.

If there are other ways that pilots' voices are effectively incorporated in the dev process, the devs' personal experience might not matter -- but to be anything more than window dressing, anyone in a "user advocate" or "subject matter expert" role needs to be on the same level of technical knowledge as the developers in order to advocate effectively for or against a particular tradeoff or test-case implementation. Do companies like Boeing actually invest in hiring/training people with the necessary combined experience in aviation and SWE?
posted by shenderson at 12:27 PM on April 23 [6 favorites]


It's kind of disturbing that a sizeable minority of pilots are determined to put the crashes down to pilot error even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work.

i think it's both wanting to believe they wouldn't fall victim to it, and also the fact that both airlines had pilots that had a bit more melanin/are from developing countries.

as far as the recklessness of trying to go with trying to automate everything to cut out humans, well, STET (previously) speaks to it a little, but even then it's not talking about slipshod thinking like, "ship it and we'll fix it on production/in post".
posted by anem0ne at 12:28 PM on April 23 [3 favorites]


Now, the fan diameter of the LEAP engine is 88”, which is 10” higher than the previous generation.

The way I read it, not only is the thrust axis higher now relative to the center of gravity, but there's an aerodynamic lift effect. As the wing pitches up, air is approaching from lower than directly in front. Now that the engines are sticking out so far in front of the wing, the air catching on the engine housings can generate a lift force forward of the normal center of lift, causing a pitching moment on the wing, increasing the pitch-up force.

I'm not an aerodynamic engineer though. Am I reading it wrong?
posted by ctmf at 12:30 PM on April 23 [4 favorites]


even though the evidence seems to suggest that the pilots followed the correct protocols and only abandoned them when they didn't work.

the pilot I mentioned above felt the problem was beyond following correct protocols -- it was that pilots are currently getting certified who have a basic lack of auto-pilot free flying hours. Such that many of them are not up to "correct protocols". When something bad does happen, they lack confidence in their "free flying" skills and it becomes way too easy to panic.

and also the fact that both airlines had pilots that had a bit more melanin/are from developing countries.

he also added that the crashes probably wouldn't have happened in North America, because here you simply don't get to fly big jets unless you have documented "free flying" skills (ie: you've logged a certain number of instrument free hours).
posted by philip-random at 12:49 PM on April 23 [1 favorite]


Wasn't one of the crashes in recent years because a pilot was instinctively pitching up away from a crash, prolonging a stall? And everyone was telling him to pitch down the whole way, but he wouldn't listen? What if the flight computer could have forced him to do the right thing against his will?

You're thinking of Air France 447. But software was a contributing factor to that crash as well, and I think it's worth pondering both the similarities and differences. In AF447, software that helps control the plane was presented with faulty sensor input. Apparently unlike the MCAS, though, it noticed the discrepancy, but then it still did something relatively dramatic: it disengaged itself and put the plane in Alternate Law, giving the pilots more direct control over all the control surfaces. I think it's inconclusive whether the pilots noticed and understood this change. Other software interface issues confounded everything, like the handling of multiple inputs from the pilot and copilot, and alarm prioritization.

The software can't work if it doesn't have reliable input data. In the case of MCAS, we see what happens if it is insufficiently cautious and uses bad data. In AF447, we see the risks of software detecting bad data and suddenly giving up completely, especially when the pilots are worried about other aspects of the flight like weather.

I think people remember the pilot pulling up on the stick of AF447 and causing the stall because it's the most dramatic and "basic" error that happened. But I think it would be a tragedy to not consider all the factors, including software design considerations, that led him to think, on some level, that pulling up was a safe and appropriate course of action.

(I'm a software developer with no aviation experience whatsoever. I defer to others with more experience if I've made any mistakes in this post.)
posted by brett at 12:52 PM on April 23 [8 favorites]


a sizeable minority of pilots are determined to put the crashes down to pilot error

I certainly don't think pilots have a higher supply of wishful thinking than the rest of us, but perhaps the particular spectacle of danger creates more demand for wishful thinking?

"When I was collecting the data in the spring of 1944, the chance of a [bomber] crew reaching the end of a thirty-operation tour was about 25 percent. The illusion that experience would help them to survive was essential to their morale. After all, they could see in every squadron a few revered and experienced old-timer crews who had completed one tour and had volunteered to return for a second tour. It was obvious to everyone that the old-timers survived because they were more skillful. Nobody wanted to believe that the old-timers survived only because they were lucky." - Freeman Dyson
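
The tour figure in that quote can be turned into a per-operation number with a little arithmetic. A rough sketch, assuming (a simplification that is not in the quote) that every operation carried the same, independent risk:

```python
# Assumption for illustration: each operation has the same, independent
# survival probability p, so surviving a whole tour has probability p ** n.
tour_survival = 0.25   # "about 25 percent", from the quote
operations = 30        # "a thirty-operation tour", from the quote

# Solve p ** operations == tour_survival for p.
per_op_survival = tour_survival ** (1 / operations)
per_op_loss = 1 - per_op_survival

print(f"per-operation survival: {per_op_survival:.3f}")
print(f"per-operation loss rate: {per_op_loss:.1%}")
```

Under that assumption the loss rate works out to roughly 4.5 percent per operation, the same for a crew on its first trip as on its thirtieth, which is the uncomfortable part.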
posted by roystgnr at 1:05 PM on April 23 [15 favorites]


The pilots of the Ethiopian Airlines flight did flip the cutout switches, and they cranked the controls to attempt to regain positive stabilizer control. But they continued to have difficulty controlling the aircraft.

Everyone should really read the article BungaDunga linked.

It seems that the Ethiopian pilots followed the Boeing procedure correctly by deactivating the electric trim. But there is a second rare condition in which that alone would not save them.

You can have a condition of runaway trim that can be very difficult or impossible to recover from. What happens is that the stabilizer is trimmed for full nose down and to counteract that the pilot uses the elevator controls for full nose up. It turns out that the aerodynamic forces of these two actions are additive. Both cause enormous force on the stabilizer jackscrew. This means that even after you disconnect the electric trim, you cannot physically move the manual trim wheels.

This seems to be what happened to the Ethiopian pilots. They disconnected the electric trim as in the recommended Boeing procedure and then tried to use manual trim which was locked up. The Boeing recovery procedure failed. So, thinking the manual trim was broken, they re-engaged the electric trim as a last ditch effort and the aircraft crashed.

It turns out that this stabilizer lockup was a known possibility in older versions of the 737 and pilots were trained in the procedure to recover from it, but that procedure was no longer taught as they thought that it could no longer occur. But with the 737MAX and MCAS, that possible condition reappeared, but pilots were not trained for it.

The proper recovery from the stabilizer lockup condition is to do the exact opposite of every pilot instinct. Even though you are plunging in a steep dive you have to release your pressure on the elevator controls trying to keep the nose up. This reduces the pressure on the stabilizer jackscrew just enough that you can turn the manual trim wheel a bit. Then you pull back again to level the airplane briefly and repeat the procedure until you get the trim under control. This is called the "rollercoaster" technique but it only works if you have enough altitude to work with because each rollercoaster dip takes you closer to the ground.

You can think of this as the way you reel in a big fish on your rod and reel. You can't just crank in the fish because there is too much pressure on the reel crank. So you briefly dip the tip of the rod and quickly reel in the slack line under reduced pressure. You repeat this dip and crank procedure until you bring the fish in.

So it seems the Ethiopian pilots did exactly as trained by Boeing, deactivating the electric trim, but that was not enough. They should have been further trained on the secondary rollercoaster procedure if the first did not work.

To see this in action in a flight simulator, watch this video, particularly starting at about 10:00 minutes in. What they have created is a situation in which the faulty MCAS has created a full nose down trim. Notice the pilot on the left has wrapped both arms around the control column and is pulling back with all his strength to counteract the nose down trim, trying to keep the airplane level. When he does this, it puts so much stress on the stabilizer jackscrew that the pilot on the right cannot crank the trim wheel. The only way out of this is for the pilot on the left to deliberately let go of the controls and allow the plane to go into a dive to relieve the pressure on the stabilizer and allow the trim wheel to be turned. Without any training in this procedure, it goes against every instinct to let go of the controls. And it only works if you have enough altitude to play with.

Here you can see a closeup of the 737 stabilizer jackscrew in action. The scale here is impressive. That jackscrew is about 2 1/2 inches in diameter. The stabilizer the jackscrew is connected to is an airfoil that is about 48 feet wide by 12 deep. Think of the force generated by that wing at 600 mph like your hand hanging out the window of a car.

By the way, I'm amazed at the fidelity of the flight simulator. It faithfully simulated the extreme aerodynamic feedback on the control column and on the trim wheels. Maybe the MCAS software engineers should have spent a little time in the flight simulator. Oops, that's $1200 per hour, not going to happen.
posted by JackFlash at 1:24 PM on April 23 [36 favorites]


pilots are currently getting certified who have a basic lack of auto-pilot free flying hours.

I feel like that's kind of simplistic, too. A traditional airplane is inherently stable such that if you just take your hands off, it tries to recover itself. It's like trying to keep a marble in the bottom of a bowl. It stays there, until the pilot intentionally pushes it to somewhere else to achieve a desired effect, then it wants to go back.

More and more, high-performance aircraft are getting so complex that it's become keeping a marble on top of an inverted bowl. The pilot would spend all their time keeping the marble where it is. So you put in a computer that resists and counteracts all forces, you just tell the computer where to put the marble. Then the computer fails.
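
The marble picture can be played with in a few lines of Python. This is a toy one-dimensional model and nothing like real flight dynamics; the stiffness values, the controller gains and the failure time are all made-up numbers chosen only to show the shape of the problem.

```python
def simulate(stiffness, controller=None, x0=0.01, dt=0.01, steps=1000):
    """Integrate x'' = stiffness * x + u with semi-implicit Euler.

    stiffness < 0: marble in a bowl (the force pulls it back to centre).
    stiffness > 0: marble on an inverted bowl (the force pushes it away).
    Returns the marble's position after steps * dt seconds.
    """
    x, v = x0, 0.0
    for step in range(steps):
        u = controller(step, x, v) if controller else 0.0
        a = stiffness * x + u
        v += a * dt
        x += v * dt
    return x


def pd_controller(step, x, v):
    # Hypothetical gains: push back against both position and velocity.
    return -4.0 * x - 2.0 * v


def failing_controller(step, x, v):
    # Same computer, but it stops helping after 2 simulated seconds.
    return pd_controller(step, x, v) if step < 200 else 0.0


bowl = simulate(stiffness=-1.0)
inverted = simulate(stiffness=1.0)
inverted_with_computer = simulate(1.0, pd_controller)
inverted_computer_fails = simulate(1.0, failing_controller)

print("bowl, hands off:           ", bowl)
print("inverted bowl, hands off:  ", inverted)
print("inverted bowl + computer:  ", inverted_with_computer)
print("inverted bowl, computer fails:", inverted_computer_fails)
```

In this toy, the marble in the bowl just wobbles near where it started, the one on the inverted bowl runs away unless the computer is working, and once the computer quits the runaway starts again from wherever it left things.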

At some point "free-flying skills" are not going to work. I don't think you can free-fly a fighter jet anymore, and a dynamically unstable airliner is getting into that territory.
posted by ctmf at 1:32 PM on April 23 [2 favorites]


I'll just leave this here: Therac-25

I used to work for AECL, so seeing that example in my 4th year ethics class was sobering. I wonder if the 737-max will be the new example of negligent code management causing death in next year's textbooks? Has any kind of inquiry been launched that could give us the full story to learn from?
posted by Popular Ethics at 1:33 PM on April 23


It's all human error. People trying to say the software was working correctly and it was the pilots' fault are just trying to direct attention away from the programmers' human error.
posted by Automocar at 1:46 PM on April 23 [4 favorites]


The notion that developing world pilot training is a technology assimilation problem is not good optics; it's reductive. Blaming pilot error is a way to deny this global structure: it says the first world doesn't have to bear responsibility for the techno imperialism it inflicts and exploits on others. It's like WEIRD psychology--the assumptions holding for one group but not for another yet powers that be turning a blind eye to difference, not having to confront this power dynamic.
posted by polymodus at 1:52 PM on April 23


Every damn problem eventually comes down to a software problem. That's where everyone tries to fix all their damn problems.
posted by rhamphorhynchus at 2:08 PM on April 23 [1 favorite]


I'd really like to thank everyone in this thread. Your cogent comments and clarity have given me a much fuller picture of this than I ever would have gotten just by reading news articles.

Not knowing much other than what I read in this thread, I think the problem was the engine redesign which was not well thought through to begin with and everything they did since that point was simply trying to patch up a bad first decision.
posted by hippybear at 2:16 PM on April 23 [5 favorites]


Early on at Cisco, marketing was running a two-page ad featuring a quote from someone in the Canadian healthcare system: "For life-critical situations, we only trust Cisco."

Over in engineering we all had a good laugh and reminded ourselves never to get sick in Canada.


You were all already critically ill with engineer's disease.
posted by srboisvert at 2:28 PM on April 23 [5 favorites]


The author's multiple decades of experience as both a software developer and a pilot lend valuable perspective. Notably, though, he doesn't seem to have ever worked in aviation software.

Is (civil) aviation software more like the US military code that's all written in Ada to specifications that fill a wall of ring binders, or more like the Toyota engine controller with 30,000 global variables?
posted by acb at 2:38 PM on April 23 [2 favorites]


I don't particularly want a programmer to go to jail for this. I want them to be charged for the crime it is, though. And probably acquitted because they were simply performing the duties they were assigned, according to industry best practices.

And I want that "best practices" bullshit to become a huge fucking deal around the world. I want programmers, and other engineers, to stop treating their work like some kind of "omg you pay me to have fun" game. I want people to take their shit seriously when lives will be on the line. I want flight software to be gone over with a damn microscope, like the hardware is. But this has to start somewhere.

What I really want is for programmers who work in life-critical industries to have to be certified for the practice, and insured, and to have to constantly prove themselves to their peers that they're not fucking up.

Babe, I've been a programmer since 1977 and I take all of my coding deadly serious. Nothing gets me more professionally outraged than engineers shirking the depths of their duties. Joe Code doesn't need to burn in hell, the whole system does. But I think the best way to start software reform is for Joe Code to get on the stand and have his hot-shit lawyer say "this is just the way programming works" and put a dozen cross-disciplinary experts on the stand saying "yep" and for there to be the world's biggest collective gasp.

And some changes to be made.
posted by seanmpuckett at 2:43 PM on April 23 [10 favorites]


I think this particular paragraph bears more attention:

In my Cessna, humans still win a battle of the wills every time. That used to be a design philosophy of every Boeing aircraft, as well, and one they used against their archrival Airbus, which had a different philosophy. But it seems that with the 737 Max, Boeing has changed philosophies about human/machine interaction as quietly as they’ve changed their aircraft operating manuals.

If we take the Toyota philosophy of "ask why 5 times", I think it would not take too many whys to point to the homogenization of corporate America, administered by the homogenized class of MBAs that runs the show. It's not just us worker bees who are on notice that a job is just a gig. The managerial class also thinks that way, and they think they too are just as interchangeable. So in corporate America, and also in Academia, we have these Tom and Daisy Buchanans who come in for a 3 year gig, accomplish a bullet point for their resume, and leave behind all sorts of messes for others to clean up.

When the final report comes in, we'll find out that the MBA who decided to solve the problem in software was new to the industry, and had brought in software engineers to solve something entirely different in a prior gig, something business critical but not life critical.

For the same reason, we'll find another resume-polishing MBA to be the reason John Deere decided to use software to lock farmers out of the combine harvesters they buy, so that farmers now download firmware hacks from Ukrainian colleagues.

We'll also find that this homogenization is a lot more national in character than what came before. John Deere is a generic American corporation, with all the shit that entails. And that is why Mahindra is making inroads in American farm country. Mahindra is staffed with IIT graduates, not MBAs, and the Mahindra/Tata/Corporate-India managerial class just hasn't internalized the same notions, so they're not going to produce non-field-serviceable diesel engines any time soon.

Circling back to aviation, we have Airbus to compare against. Airbus went into software early, had some harrowing close calls, and they got bailed out by the French government going big into software quality assurance through formal verification. Corporate Europe has its own homogenous consensus on how to do things, which for this niche is a much better fit.

So TLDR: nuke Harvard Business School.
posted by ocschwar at 2:55 PM on April 23 [34 favorites]


What I really want is for programmers who work in life-critical industries to have to be certified for the practice, and insured, and to have to constantly prove themselves to their peers that they're not fucking up.

Not sure how this could work.

Programmers often use code previously written by other programmers, for applications that were not considered life-critical. Programmers may never be aware where their code eventually is used.

The same goes for physical widgets designed for multiple industrial uses. The designers and engineers may never know where these ultimately get used.

This is where regulators and inspectors come in, to review a final product. During the design process, why didn't the DER think that having an oversize engine that produces its own lift was a bad idea? The three strikes noted in the article identify 3 critical decision points where the decision was a bad one.
posted by linux at 2:55 PM on April 23 [5 favorites]


It would work the same way it works in other branches of engineering.

If a particular deliverable affects the safety of human beings, then it can only be completed under the supervision of a Professional Engineer, who signs on the plans knowing that if he signs when he shouldn't, he can go to jail, and if anything goes wrong he can lose his license.

Aviation MeFites can tell whether the Boeing liaison (DER they called it) is criminally liable a priori the way PEs are. If not, then that needs fixing forthwith.

There already are software PEs. It's not as prevalent as in civil engineering, for obvious reasons.
posted by ocschwar at 3:01 PM on April 23 [5 favorites]


Not knowing much other than what I read in this thread, I think the problem was the engine redesign which was not well thought through to begin with and everything they did since that point was simply trying to patch up a bad first decision.

The decision goes even farther back. The first 737 flew more than half a century ago, long before most of the people currently working on it were even born. The engineers have been pushing for a new design from scratch for a couple of decades, but the bean counters said that shareholders would rebel at the engineering cost and told them to keep milking the ancient design by bolting on ever more exotic modifications. The current 737s have different wings, different fuselage, different engines, different landing gear and different flight controls but the FAA allows them to call it the same airplane within certain parameters.

They even transitioned to a glass cockpit in which they dutifully draw analog gauges on the LCD screens mimicking the ones from 50 years ago so they can have pilots interchangeable without additional training.
posted by JackFlash at 3:05 PM on April 23 [8 favorites]


Programmers often use code previously written by other programmers,

Then they shouldn't do that unless they can sign their name and guarantee with their professional reputation and license that it's safe. It's more expensive, but it's not impossible.

On preview, what ocschwar says.
posted by ctmf at 3:06 PM on April 23 [4 favorites]


Then they shouldn't do that unless they can sign their name and guarantee with their professional reputation and license.

What is this license you speak of?
posted by JackFlash at 3:08 PM on April 23 [1 favorite]


Exactly.
posted by ctmf at 3:09 PM on April 23 [5 favorites]


The license is what enables an engineer to tell his manager "I am more afraid of losing my license and/or my freedom than I am of losing my job. And if you fire me, my replacement will be legally obliged to seek me out and ask about any safety issues, and I will be legally obliged to be candid."
posted by ocschwar at 3:13 PM on April 23 [9 favorites]


If a particular deliverable affects the safety of human beings, then it can only be completed under the supervision of a Professional Engineer, who signs on the plans knowing that if he signs when he shouldn't, he can go to jail, and if anything goes wrong he can lose his license.

Okay, so ultimately there is someone, who supervises and therefore is licensed, that signs off. We're not talking about every single programmer, then. That sounds feasible.
posted by linux at 3:13 PM on April 23


>>
>> Over in engineering we all had a good laugh and
>> reminded ourselves never to get sick in Canada.
>
> You were all already critically ill with engineer's disease.
>

On the contrary, we knew our limitations and planned accordingly. At that point in Cisco's history pumping out new features was prioritized over the massive rewrite it would take to make the system stable. We had no illusions.
posted by Tell Me No Lies at 3:14 PM on April 23 [3 favorites]


There already are software PEs. It's not as prevalent as in civil engineering, for obvious reasons.

So not prevalent that I have never met one. Perhaps they exist, but it would be hard to tell. I have never heard of a job or project where one would be required, so I don't know why anyone would bother with that certification unless they were trying to leverage it into a consulting career as an expert witness in litigation. Even then, I don't think it would carry as much weight as an advanced academic degree.
posted by JackFlash at 3:25 PM on April 23 [1 favorite]


I'll just leave this here: Therac-25

Yeah that's another thing I would tell my engineer friend about and he simply would not believe me since I guess I wasn't an engineer.
posted by Pembquist at 4:20 PM on April 23


And I want that "best practices" bullshit to become a huge fucking deal around the world. I want programmers, and other engineers, to stop treating their work like some kind of "omg you pay me to have fun" game.

Who exactly are you talking about here?

The vast majority of software being written today won’t kill anyone if it fails. Applying the same rules to the code that runs, say, Netflix as you would to the code running an airliner isn’t just pointless, it’s counterproductive. (And yes, I know about Netflix and chaos engineering.)
posted by asterix at 4:32 PM on April 23 [4 favorites]


The vast majority of software being written today won’t kill anyone if it fails

Well, yes, but careless code can expose users to any number of vulnerabilities, some of which might make them wish they were dead.

Not everything is as critical as an aviation, nuclear, or medical control system, and I don't seriously mean to equate life-critical software with other applications... but there is a dire need for more discipline in software engineering.

(My opening bid is that anyone starting a new project should seriously consider a safety-conscious language such as Rust or Go. Or even Java [JVM-family languages] or JavaScript depending on context. Choosing C or C++ by default should be, and soon will be, considered malpractice. None of this is responsive to the issue in this thread, however. You could still write a poorly-designed avionics system in Rust. The broader issue is that we need to take software more seriously than we do.)
posted by sjswitzer at 5:14 PM on April 23 [1 favorite]


According to my father, who was the bombardier in a Liberator bomber during WWII, at that time the pilots attributed all lost planes that were not a direct hit on the cockpit to pilot error. Every now and again some miracle would occur and a plane would manage to land minus half a wing and half an undercarriage or some such, which caused the pilots to believe that any plane that still had half a wing and half an undercarriage was airworthy and if it wasn't landed successfully it was entirely the result of pilot error. This was a tenet of faith with them.

These pilots went on to become the civilian pilots in the airlines doing commercial aviation after the war. They brought their belief that all crashes are the result of pilot error with them. I am told that this belief was taught to later generations of pilots, and while the engineers argued against it, the airlines encouraged it. That way when a plane had a fault and landed hard, the pilots were willing to accept that the hard landing was their fault and that they could have figured out a way to land it that would not have been a hard landing. This made them better employees. It was apparently accepted by pilots in the eighties that anything was survivable so long as the pilot didn't panic and do something stupid. Whether this culture and its beliefs lasted beyond the eighties I don't know.
posted by Jane the Brown at 5:35 PM on April 23 [14 favorites]


Thanks for JackFlash's manual trim comment. I had read the Ethiopian 302 report when it came out, and I didn't understand why they had turned the electric trim back on instead of trimming manually.

One detail of the Alaska Flight 261 crash (where the stabilizer trim jack screw failed due to lack of maintenance) was that after the jack screw had completely disconnected, they tried inverting the jet, and even reported initial success in getting out of the dive. It didn't save them because the stabilizer wasn't jammed nose-down, it was loose.
posted by netowl at 5:40 PM on April 23


I want to respond to these two points, but first understand I am not arguing "stupid pilots"

One of the controls—the electric stabilizer trim thumbswitch on the pilot’s control yoke—could temporarily reset MCAS’ control over stabilizers. The Lion Air pilots hit this switch more than 24 times, buying them some time—but MCAS’ stall prevention software kicked in afterwards each time because of faulty data coming from the aircraft’s primary angle of attack sensor...

The pilots of the Ethiopian Airlines flight did flip the cutout switches, and they cranked the controls to attempt to regain positive stabilizer control. But they continued to have difficulty controlling the aircraft.


In the first case, attempting to use the thumb switch when the electric trim runs away is not the correct procedure. The correct procedure is to turn off the two cutout switches and leave them off for the rest of the flight. It feels a little stupid to discuss this, as probably most people don't really know what is meant by "trim", but on the 737 the trim motors drive two large handwheels that are located between the pilots. The wheels are painted alternately white and black so that when they are whirring around it is easy to see. You can trim the airplane by turning these wheels by hand; this is what you do when you have turned off the electric trim. Many pilots find the actions of the Indonesian pilots a little inexplicable: if the pilot recognizes that the electric trim is behaving in an uncommanded fashion, the appropriate response is to turn it off, not fool around with it.

With regard to the Ethiopian Airlines crash, the power setting was too high, probably because of the distraction caused by the MCAS failure, and it seems that at that speed the pilots were unable to trim manually without the use of an unusual technique (relax the yoke, trim, pull back on the yoke, repeat). So they may have turned the trim motors back on to try to use them, which caused MCAS to feed in even more nose-down trim.

My point is not that the pilots were incompetent but that the way they responded to the crisis was. Whether that is due to a deficiency in training and an overreliance on automation or, on the contrary, because the MCAS failure is too overwhelming, I have no real idea. What these accidents do show is that the aircraft as designed depended on a high level of skill and experience to be safe, obviously too high a level. This does not make it the aircrew's fault, but it does make them a link in the accident chain, and the deficiency they brought to the table is important, as they are in fact the real users of the equipment, not some hypothetical super pilot.

It is easy to dismiss the opinions of 737 pilots as self-serving denial, but doing so seems like a kind of denial in itself: there are in fact large differences in the quality of airlines around the world. My understanding is that Lion Air has a pretty lousy history while Ethiopian Airlines is the best carrier on the continent.
posted by Pembquist at 5:54 PM on April 23 [6 favorites]


One last thing and then I will shut up. To me these accidents reek of the usual normalization of deviance, whereby a safety margin is built into all aspects of a design, so inevitably people start cutting into that margin heedlessly because they figure there is plenty of cushion, until suddenly there isn't. If one of the safety margins is based on having high-time, recurrently trained, competent pilots, you will lose that safety margin if you start depending on those pilots' expertise except as a last resort.
posted by Pembquist at 6:04 PM on April 23 [5 favorites]


This does not make it the aircrew's fault but it does make them a link in the accident chain

I think everyone agrees on that. The disagreements are whether you say that we can prevent this ever happening again by deciding that all pilots must in future be better than these two were, or whether you say that if two pilots did it, we should assume that any pilot might do it.
posted by the agents of KAOS at 6:12 PM on April 23 [3 favorites]


At least in the Ethiopian situation, the pilots did follow the Boeing recommended procedure and deactivate the electric trim. But because of stabilizer runaway, they couldn't turn the manual trim wheels. Boeing had removed the "rollercoaster maneuver" from their training manual because they no longer thought that was necessary. However, the MCAS reintroduced a fault condition that they did not train pilots for. That was not pilot error. That was Boeing's error, again, driven by their sales promise that no new training was required for the 737MAX.
posted by JackFlash at 6:32 PM on April 23 [8 favorites]



There already are software PEs. It's not as prevalent as in civil engineering, for obvious reasons.

So not prevalent that I have never met one. Perhaps they exist, but it would be hard to tell.


Oh, I know. I'm an EE who wound up programming for a living for his whole career. 10 years ago, seeing where the electric power sector was going with the smart grid, I went and took the Fundamentals of Engineering exam. Passed it, so in MA I'm an Engineer in Training. But to get the PE, I need to work for a PE and that duck has not cared to line itself in a row, especially in software.

It takes software PEs to make software PEs, so there aren't many. And there is also not much demand for them. Your average DonMartinSoundEffect.com doesn't need PEs to oversee code writing. A lot of projects that probably should get a PE sign-off don't. And for the ones that do, well, civil engineering PEs command a far smaller salary. You hire one, and they design a system to completely bypass whatever software your project or product involves, so that there is a manual way to bring a system to a safe failed mode. For most cases, it is the only solution, or at least the best one.

Aviation, ironically, produces the most possible contexts where safety needs to be done in software. (Case in point: A flying aircraft might lose cabin air quality or buffet the pilots to the point they are incapacitated. At which point a fully autonomous emergency landing is needed. This has already happened with fighter jets, and in at least one case the plane did land autonomously. On a carrier. With a tail hook and arresting cable.) And even the fully manual bypass we're talking about here isn't exactly manual. A 737's flight surfaces exert forces way beyond the muscle power of a pilot mechanically linked to a control. So you need power-assisted hydraulics, designed to give the pilot tactile feedback. If that tactile feedback is miscalibrated, you still lose control of the aircraft. So you're back to having to not design a plane's aerodynamics to be so unstable in the first place.


I'll just leave this here: Therac-25

Yeah that's another thing I would tell my engineer friend about and he simply would not believe me since I guess I wasn't an engineer.


The difference between a programmer and an engineer is that an engineer would know about the Therac 25 incident, quite frankly.

This is why I'm irked with the comments above on how engineer's disease is a contributing cause here. The pathologically narrow focus that defines engineer's disease is not the problem. It's the solution. Hire people who focus like that on safety, and you get safety. (Possibly at the cost of your company going under, but well, it's only money.) No, the problem will more likely be the kinds of people we have managing companies nowadays. There was someone who thought he could manage the 737 MAX MCAS project because he did 22 sprints as scrum master at DonMartinSoundEffect.com and none of the brogrammers quit because he brought beer to code freeze and got the office new beanbag chairs. And because he has an MBA and not a PE license, he will in no way be held liable, and will have a nice cushy midlevel job elsewhere as soon as all is said and done.
posted by ocschwar at 6:57 PM on April 23 [20 favorites]


The pathologically narrow focus that defines engineer's disease is not the problem. It's the solution.

"Engineer's disease", I thought, was thinking that, since you are really competent in one subfield of technical endeavor, everything else in the world is just like doing that. Thinking that, since you are experienced at writing, say, embedded systems for media players, avionics can be easily mastered with just common sense.
posted by thelonius at 7:07 PM on April 23 [6 favorites]


Ah. Ok. That's hardly limited to engineers, though.
posted by ocschwar at 7:09 PM on April 23 [2 favorites]


That's hardly limited to engineers, though.

That may be so, but that's what they call it. It's perhaps more apt to think of it in the context of believing that social or political or economic problems can be simply resolved with the methodologies and thinking habits of the sufferer's technical specialty.
posted by thelonius at 7:25 PM on April 23 [2 favorites]


For all y'all espousing that software needs PEs -- is there an appropriate curriculum somewhere that would actually be the first step in this training? It seems like the other engineering fields have a pretty solid handle on what works, what doesn't, and how to build stuff without it breaking/killing people. Software ... doesn't seem that mature.

(My own coursework is a decade out of date, but I'm familiar with the curriculum of two leading CS departments and they were notably weak on anything *practical*. If you want theory, they were fantastic, but if you want to become a better engineer, go apprentice yourself at an internship or three.)
posted by Metasyntactic at 7:37 PM on April 23 [1 favorite]



For all y'all espousing that software needs PEs -- is there an appropriate curriculum somewhere that would actually be the first step in this training?


First there's the test preparation for the FE exam, which is in large part an intro to civil engineering for people who never took any Civ E class. Any deep dive into the safety issues in any branch of engineering is good. And for that matter, any class on FMEA in any other branch: chemical engineering, electrical engineering, et cetera. Civil engineers have to learn a lot of things to do their jobs. Many know C and Matlab, including all about the bugs in the products they use, because if your bridge goes down, you don't get to blame it on a bug in your finite element modeling package. You should have known about the bug when you used the product to model the stresses on your trusses. Software engineers have an obligation to look beyond their branch and be humbled too.

Then one should study FMEA in a software engineering context. Nancy Leveson's book Safeware is the literal textbook on it.

THEN there's also the computer science curriculum on formal verification and modeling. It's abstruse and theoretical, but it's also the means by which software engineers avoid being the subject of Professor Leveson's next published paper. Formal methods are at this point mature. France pulled Airbus's chestnuts out of the fire by funding work into Coq and related projects at INRIA, which is why you might hear a lot of French spoken in programming language conferences nowadays.

Put all that together and you can apply it day to day: 1. listen to the people who can tell you what the system is meant to do in meatspace. 2. model what you're about to design with TLA+/Alloy/something similar. 3. use Rust so that the compiler rules out whole classes of memory management errors and data races before the code ever runs, and that frees your time to look at how your code interacts with the real world.
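To give a flavor of step 2 without TLA+ or Alloy, here is a toy sketch in Python that exhaustively explores every reachable state of a made-up trim controller and checks one safety property. It is not TLA+, and it is not a model of the real MCAS: the state, the two rules, the trim limit, and the single-fault assumption are all hypothetical and chosen only to show what "model it first" buys you.

```python
from collections import deque

MAX_TRIM = 4  # hypothetical: steps of nose-down trim until the stop

# Possible sensor readings (left_high, right_high) while the TRUE angle of
# attack is normal. Assumption: at most one sensor falsely reads "high".
FAULT_SCENARIOS = [(False, False), (True, False), (False, True)]


def single_sensor(trim, left_high, right_high):
    """Hypothetical rule: trust the left sensor alone."""
    return min(trim + 1, MAX_TRIM) if left_high else trim


def cross_checked(trim, left_high, right_high):
    """Hypothetical rule: act only when both sensors agree."""
    return min(trim + 1, MAX_TRIM) if (left_high and right_high) else trim


def find_violation(rule):
    """Breadth-first search over all reachable trim states.

    Returns the shortest sequence of sensor readings that drives trim to
    MAX_TRIM although the true angle of attack is normal, or None if no
    such sequence exists under the assumptions above.
    """
    parent = {0: None}          # state -> (previous state, readings)
    queue = deque([0])
    while queue:
        trim = queue.popleft()
        if trim == MAX_TRIM:    # safety property violated: rebuild the trace
            trace = []
            while parent[trim] is not None:
                trim, readings = parent[trim]
                trace.append(readings)
            return trace[::-1]
        for readings in FAULT_SCENARIOS:
            nxt = rule(trim, *readings)
            if nxt not in parent:
                parent[nxt] = (trim, readings)
                queue.append(nxt)
    return None


print("single sensor:", find_violation(single_sensor))
print("cross-checked:", find_violation(cross_checked))
```

The single-sensor rule yields a counterexample trace (one stuck sensor, repeated), while the cross-checked rule yields none under the single-fault assumption. Real tools do the same kind of search over far richer models, and the answer is only as good as the assumptions written into the model.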
posted by ocschwar at 8:08 PM on April 23 [7 favorites]


we need to go back to cars that accelerate like molasses and with top speeds similar to those on bikes coasting downhill.

You say this as if it would be a bad thing.
posted by carter at 8:14 PM on April 23


This is the very interesting article about the Air France crash.
posted by jeather at 8:19 PM on April 23


is there an appropriate curriculum somewhere that would actually be the first step in this training?

Well, first of all you have to pass the Fundamentals of Engineering exam to become an Engineer in Training. This is a general science knowledge exam that you typically take your senior year in college while the topics are fresh in your mind. Topics include math, probability and statistics, chemistry, physics, electricity, thermodynamics, etc.

Then you need 8 years of experience working under the supervision of a software PE. Typically graduation from an ABET certified engineering school can substitute for 4 of those 8 years. Good luck with the other 4 years. Do you know any jobs where there is a software PE on staff? It takes a PE to make a PE. Kind of like vampires, you wonder how they ever got the first one.

Then you take the 8 hour Principles and Practice of Engineering Exam in your specific field to become a certified PE. Note that most states will not allow you to take this exam until you provide written documentation and signatures of your 4 to 8 years of experience under a software PE, so this is probably the biggest hurdle.
posted by JackFlash at 8:23 PM on April 23 [5 favorites]


A big fat metaphor is the reason why this news-story has such legs.
posted by Fupped Duck at 8:57 PM on April 23 [1 favorite]


This thread seems to have a lot of spleen-venting when we don't really have any information about the process inside Boeing that produced MCAS or the 737MAX in general. We know the outcome, but the process that produced it is—so far—pretty opaque. I think it will probably require some sort of Rogers Commission Report to really figure out what happened. Pity Feynman isn't around anymore; I'm not sure who I'd want on a modern version of the same.

And yes we all know that everyone hates project managers and they're just the worst yadda yadda. Everyone hates project managers when something gets fucked up, but project management as a discipline is how you get anything more complex than the B-24—because what we now call 'project management' (as a set of practices and techniques) was developed for the purpose of producing complex systems like those in the B-29, and formalized in the postwar era (much of modern project management comes from the early ICBM programs and then Mercury/Gemini/Apollo). If we could have simply gone on managing projects like everyone did in the 1930s, we would have. Put differently: if project management didn't exist as a discipline, the first time someone wanted to do something complex and large-scale, they'd end up reinventing it. (I have a lot of Strong Opinions about the appropriateness of applying Apollo Program level project management processes to every goddamn rinky-dink corporate project, but that's for another day.)

I'm not much of a betting man, but if I was going to bet, I'd say that the solution likely to emerge is to deactivate MCAS (or make it easy to entirely deactivate) and then require pilots to undergo simulator training. This is exactly what the airlines basically insisted Boeing not do, but now that the airlines are potentially out many millions of dollars in otherwise-unflyable aircraft, they're in less of a position to argue. Change the conditions, and suddenly previously unfeasible solutions become possible.

But the idea that you're ever going to be able to attribute a large-scale failure to a single person is silly; if you see that happening, dollars to donuts somebody is getting scapegoated, and I'd hesitate to believe it and look harder. There were almost certainly multiple groups of people working at length to talk themselves into what in retrospect was a bad idea, but—and this is where we have to try pretty hard not to oversimplify things in retrospect—bad ideas are always obvious after they've failed. They are not obviously bad ideas before that, or else they wouldn't get implemented. (The really dangerous ideas lurk in the no-mans-land between "brilliant" and "stupid".) The net of this is that stuff like formal verification, while certainly not a bad thing, won't catch everything—and it won't surprise me if it turns out it wouldn't have caught the 737MAX issue.

So many regulations are written in the blood of the dead and the tears of their survivors. Why do we keep forgetting that?

There is a saying that "NATOPS is written in blood"—it's not like the processes for landing a fighter jet on a moving ship were just dreamed up a priori by some eggheads in a conference room. A bunch of people ended up dead figuring it out the hard way. But if you look hard enough at basically any discipline, it's almost always this way. We build bridges a certain way, because at one point people built them in other ways and they fell down. Early cathedral construction was a... similarly messy business. Early steam engines and boilers were ridiculously dangerous; they only got better when insurance companies stepped in because they'd had enough of paying for the damage. I find it disheartening that so many disciplines sanitize their own history by leaving out, intentionally or through ignorance, the reasons why things are done a certain way.

As for why "software engineering" is the way it is... IMO it's because demand outstripped supply. As JackFlash notes, it takes a PE to make a PE. It's a long process, and there's not a ton of motivation for software engineers to go through it—seeking out the rare opportunity where it might even be possible, for starters, and then staying there long enough to see it through—when there are plenty of places they can work that don't care one way or the other, and are happy enough to hire a CS grad just as readily as a CompEng (hell, a lot of recruiters don't know the difference anyway—and some CompEng programs are so milquetoast on the actual 'engineering' that I'm not sure they functionally are different much of the time). If we want more software to be built or supervised by PEs, which is a noble goal, we need to figure out the pipeline problem and fix the supply side, not just manufacture a bunch of demand through regulation. The latter route is just going to lead to one PE trying to 'supervise' a hundred developers and you'll have a burnout rate on par with physicians, another field where we have a similar problem.
posted by Kadin2048 at 9:11 PM on April 23 [7 favorites]


Ah, what a nice plane, the fit and finish, the AC is so quiet and smooth even at the gate. Ergonomic seats; with a little room to spare; even in coach. Takeoff, flight, and landing are as if the engines are electric wonders that defy physics.

Oh. I'm on an Airbus. :/

Can't forget all the drunks that would come to work; fifteen minutes into a shift; pull a five gallon bucket out from under their bench, pop the lid off; and then proceed to vomit. And then go back to making airplanes/jets.
Guys that would tumble into work so tanked they could barely stand; clock in, stumble back out to their cars, go home. And then come back and clock out. Working the line in America.

American aviation is where our cars were in the late 70's and 80's - a great deal of clunk and junk still. Typical Boeing product still reminds me more of a Briggs lawnmower than anything that takes flight, I'm surprised this hasn't happened more often.
posted by Afghan Stan at 9:46 PM on April 23 [1 favorite]


The latter route is just going to lead to one PE trying to 'supervise' a hundred developers and you'll have a burnout rate on par with physicians

Actually there is no burnout involved. A PE doesn't actually have to "supervise" the trainees. You just need to have someone with a PE on staff who will sign your paperwork saying you worked there for at least four years in your chosen discipline. He doesn't even have to know your name. Turns out it is just a paperwork formality.

For example, there are over 20,000 engineers at Boeing. Anyone who wants to go through the exam can just find a PE somewhere in the company to sign them off. Pretty easy for an electrical or mechanical engineer. Not so easy to find a software PE. But very few engineers bother with it because none of the documents or drawings need to be signed by a PE. Not required for aviation certification.
posted by JackFlash at 9:52 PM on April 23


That's hardly limited to engineers, though.

That may be so, but that's what they call it. It's perhaps more apt to think of it in the context of believing that social or political or economic problems can be simply resolved with the methodologies and thinking habits of the sufferer's technical specialty.


The root of this is what Wigner called the "Unreasonable Effectiveness of Mathematics in the Natural Sciences". In Physics and certain engineering disciplines, the application of mathematical tools is so effective that it creates a dopamine-rush fuelled feeling of extreme competence. I think programming computers gives an even stronger feeling of omnipotence; it is very easy to find yourself feeling like Boris Grishenko, yelling about your invincibility right before you get frozen by liquid nitrogen which wasn't part of the problem space you had considered.

That's also the reason that most biologists and field scientists aren't affected: nothing about the experience of being a biochemist or a geologist will make you feel that you have god-like powers over the external world.
posted by atrazine at 3:28 AM on April 24 [23 favorites]


That the pilots made the wrong call is obvious and irrelevant. On another day they might have made the right call. Most pilots on most days would probably make the right call. The problem is that MCAS had a single point of failure. When that failure occurred the pilots were required to make a split second decision on which their lives and their passengers' lives depended. I don't care how good you are, at that point you're depending to some extent on luck to make the right call.
posted by night_train at 3:58 AM on April 24 [4 favorites]


Any pilot's decision in-the-moment is preceded by decades of inter-linked decisions that led to the current system that the pilot is interacting with at that point. In the case of the 737 MAX these decisions date back to the 1960s. These prior system design decisions could have been good decisions or bad decisions, and driven by multiple factors (safety, profit, etc.). So part of the task is trying to figure out which of these decisions led to these tragic crashes, and also if it is possible to address/reverse these. The MAXs are currently on the deck until mid-August so we will see.
posted by carter at 4:19 AM on April 24


Then you take the 8 hour Principles and Practice of Engineering Exam in your specific field to become a certified PE. Note that most states will not allow you to take this exam until you provide written documentation and signatures of your 4 to 8 years of experience under a software PE, so this is probably the biggest hurdle.

NCEES discontinuing PE Software Engineering exam
posted by octothorpe at 4:29 AM on April 24 [2 favorites]




NCEES discontinuing PE Software Engineering exam

Only 81 people have ever taken the PE software exam in its short history and they don't say how many passed. No wonder I've never run into one. It's a completely useless certification since it is not a requirement for any job or project.
posted by JackFlash at 7:25 AM on April 24 [1 favorite]


That's hardly limited to engineers, though.

very true. One of the most obvious walking-talking examples of Engineers Disease that I'm aware of is Jordan Peterson. His academic expertise is in clinical psychology but that hasn't stopped him from weighing in expert-like on pretty much every damned thing.
posted by philip-random at 7:52 AM on April 24 [5 favorites]


One of the most obvious walking-talking examples of Engineers Disease that I'm aware of is Jordan Peterson.

Someone can surely do a Jordan Peterson parody in which the 737 Max is on a hero's journey.
posted by clawsoon at 8:02 AM on April 24 [4 favorites]


Only 81 people have ever taken the PE software exam in its short history and they don't say how many passed. No wonder I've never run into one. It's a completely useless certification since it is not a requirement for any job or project.

There are lots of EE PEs coding for a living. I have and had no intention of taking the software PE, and I don't think a software PE is necessary to assure the safety of a software engineering design. To make software safe, you secure the interface between your product and human flesh, in hardware, by assuming that the software will be shit and working from that as an axiom.

That way you save lives.

Then, and only then, you start working on the software itself and its quality criteria. And that way you save money and keep your company profitable.
posted by ocschwar at 8:08 AM on April 24 [2 favorites]


Lordy, we’re blaming engineers? MANAGEMENT.
posted by Artw at 8:36 AM on April 24 [8 favorites]


Oh. I'm on an Airbus. :/

Whew!
posted by Tell Me No Lies at 8:47 AM on April 24 [1 favorite]


When Airbus had the first software incident, lots of engineers joked "if it ain't Boeing, I ain't going."
Boeing's cautious approach to software implementation earned it a lot of trust. All of it gone now.
posted by ocschwar at 9:09 AM on April 24 [1 favorite]


Lordy, we’re blaming engineers? MANAGEMENT.


Where I work, we have what we call an 'engineering-led organization'. I used to think that means that of the 'big 3', Engineering manager, production manager, and safety manager, engineering gets a bigger vote than everyone else.

That's true, but it also means that every senior leadership position is filled by someone with an engineering background. Not a MBA, not a lawyer or politician, not a sales guy, etc. This sucks for me because it means I (not an engineer) can only rise so far, but it's also nice for me. I don't spend a lot of time arguing about things that are technically stupid, I can just elevate it to senior management and they all say "yeah, that's stupid."

What I argue about is the tendency of engineers to remove safety margin from everything because they're so confident in their calculators they forget about Murphy's Law, but that conflict is built in and we're good at dealing with it. We have our problems, but haven't had a major reactor accident in over 60 years, so.

I'm not sure what my point is, other than I can't imagine this organization making that MCAS system without someone pointing out that it probably should cross-check other detectors, and winning that argument. Or that re-training the pilots is the right thing to do, so we're just going to have to tell the customer that.
posted by ctmf at 9:11 AM on April 24 [3 favorites]
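
[Editorial sketch] The cross-check described in the comment above can be illustrated in a few lines. This is only a minimal sketch under stated assumptions: the function names, the two-sensor layout, and the 5-degree disagreement threshold are all hypothetical and are not taken from Boeing's design or from the article.

```python
from typing import Optional

# Hypothetical threshold (degrees): how far two sensors may disagree
# before neither reading is trusted. Chosen for illustration only.
MAX_DISAGREEMENT_DEG = 5.0


def cross_checked_reading(left_deg: float, right_deg: float,
                          max_disagreement_deg: float = MAX_DISAGREEMENT_DEG
                          ) -> Optional[float]:
    """Return the average of two readings if they agree, else None."""
    if abs(left_deg - right_deg) > max_disagreement_deg:
        # The sensors disagree: report "no trustworthy value".
        return None
    return (left_deg + right_deg) / 2.0


def automation_may_act(left_deg: float, right_deg: float) -> bool:
    """Allow automatic intervention only when both sensors agree."""
    return cross_checked_reading(left_deg, right_deg) is not None
```

With two sensors the check can only detect a disagreement; it cannot tell which sensor is wrong, so the conservative response in this sketch is to withhold automatic action and leave the decision to the crew.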


Or that re-training the pilots is the right thing to do, so we're just going to have to tell the customer that.

Couple of problems with that. First, telling customers that re-training is necessary implies admitting that there is a potential safety issue. Salesmen are reluctant to do that. As soon as Airbus got wind of Boeing's potential safety issue they would crank up their own sales machine pointing this out to customers.

Second, entire multi-billion dollar sales to many airlines were based on a promise of no new pilot certifications being required to fly the new planes. That is the way you lock your customers in to prevent them from switching to Airbus. Violating that promise might mean renegotiating billion dollar contracts.

Re-training pilots is a very big deal for airlines. Simulator time goes for about $1200 an hour. You have to pull your pilots out of their normal schedule to fit in new training hours. And requiring new certifications to fly the new planes adds an enormous new dimension to the complex matrix of scheduling flights and crews. You have to make sure that certain pilots are in certain locations to fly certain airplanes, at least until you get every pilot re-trained. Southwest Airlines alone has over 9,000 pilots. It might take them years to get all their pilots re-certified.

So the sales folks were putting enormous pressure on engineers to just patch this problem up with software and then keep it to themselves.

This is a problem across the software industry. Sales people making promises to customers, money changing hands, before finding out from engineers whether their promises are even feasible.
posted by JackFlash at 10:38 AM on April 24 [2 favorites]


Lordy, we’re blaming engineers? MANAGEMENT.

This is so sweet it's almost touching but I see a lot of it in this thread. The "engineers" didn't do it, some nebulous force called "management" made them do it. Those managers probably even have... ugh... MBAs. Well there we go, we've found the culprit. Harvard Business School downed those planes.

The majority of Boeing's management, all the way from the supervisory level guys with a team of a few people up to the top are people with engineering degrees who may have spent decades working as engineers before reaching management positions. What this is, just like the space shuttle disasters, is poor engineering decision making. Some of the people making those decisions were engineers, some were management, many were both (it's not a dichotomy). Deciding that "management" did it is a meaningless tautology because anyone who can make the final decision on something like that is a "manager" by definition so it gets you no closer to an answer.
posted by atrazine at 10:44 AM on April 24 [3 favorites]




This is so sweet it's almost touching but I see a lot of it in this thread. The "engineers" didn't do it, some nebulous force called "management" made them do it.


There were no engineers involved. Only programmers. Programmers are interchangeable. If they don't do as instructed, you can replace them.

Engineers, however, have to perform due diligence on safety. That means speaking to predecessors. So firing one for balking about safety is a tad counterproductive.

And the reason this shitshow was done by programmers rather than engineers: managers.
posted by ocschwar at 11:07 AM on April 24 [5 favorites]


This is the real difference between a programmer and an engineer. An engineer can stand her ground. Credentials and licensure can make the distinction clear, or they can obscure the distinction, but they do not define it.
posted by ocschwar at 11:09 AM on April 24 [4 favorites]


This is the real difference between a programmer and an engineer. An engineer can stand her ground. Credentials and licensure can make the distinction clear, or they can obscure the distinction, but they do not define it.

There is no licensing or credentials. At Boeing or any place else, there is no distinction between "engineers" and "programmers." They are all called engineers. Their official job title is "Engineer." Their pay scales are all officially "Engineer."
posted by JackFlash at 11:18 AM on April 24 [1 favorite]


Salesmen are reluctant to do that.

Naturally. What I'm saying is that here, I think engineering would win that battle. Also that that battle would never happen because the head of Sales would be an engineer, and not allow his people to promise something Engineering didn't agree with.
posted by ctmf at 11:23 AM on April 24 [1 favorite]


because what we now call 'project management' (as a set of practices and techniques) was developed for the purpose of producing complex systems like those in the B-29, and formalized in the postwar era (much of modern project management comes from the early ICBM programs and then Mercury/Gemini/Apollo)

And that would be great, if organizations continued to build on known/tested project management practices, rather than rely on "woo of the week" cult fads that may have no fit or purpose being applied within their industry sector. (Admittedly, I don't know what Boeing uses internally, but...)
posted by jkaczor at 12:50 PM on April 24


There is no licensing or credentials. At Boeing or any place else, there is no distinction between "engineers" and "programmers." They are all called engineers. Their official job title is "Engineer." Their pay scales are all officially "Engineer."

Well - a job role title is not the same as a professional accreditation.

Myself - I was a "Premier Field Engineer" for several years - it was fun when we were dispatched to jurisdictions where government actually recognized the term "engineer" as accredited and licensed (i.e. Quebec and Oregon (although things may have changed recently)) - we were not allowed to use our normal job title.
posted by jkaczor at 1:04 PM on April 24 [1 favorite]


Second, entire multi-billion dollar sales to many airlines were based on a promise of no new pilot certifications being required to fly the new planes. That is the way you lock your customers in to prevent them from switching to Airbus. Violating that promise might mean renegotiating billion dollar contracts.

Good lord this fact pisses me off. Leave it to capitalism to end up with the perverse requirement that you essentially cannot ever (significantly) change airplane design from a decades-old origin point. I guess we're just stuck with an increasingly complicated pile of bandaids and chewing gum crammed into a 737-shaped carcass, forever. That seems like a perfectly good idea.
posted by axiom at 1:13 PM on April 24 [4 favorites]


> an increasingly complicated pile of bandaids and chewing gum crammed into a 737-shaped carcass

It's not even shaped like a 737 any more! Those engines change the shape and center of lift enough that they came up with the MCAS software bandaid to cover it up. So maybe just "approximately 737-shaped".
posted by RedOrGreen at 1:56 PM on April 24


There were no engineers involved. Only programmers. Programmers are interchangeable. If they don't do as instructed, you can replace them.

The engineers who placed much larger engines on the 737 body, creating the issue in the first place, what's their role in this?
posted by romanb at 4:38 PM on April 24 [7 favorites]


"Programmers are interchangeable."

Programmers may not have the same requirements as engineers, but they are so far from fungible.
posted by flaterik at 5:24 PM on April 24 [1 favorite]


Putting this out there: the people who made the money from the decisions to go ahead with the hacky solutions should do the crime.
posted by Artw at 6:32 AM on April 25 [2 favorites]


There were no engineers involved. Only programmers. Programmers are interchangeable. If they don't do as instructed, you can replace them.

The CEO of Boeing is an aerospace engineer. The CEO of Boeing Commercial Airplanes is a materials engineer. The CTO is an electrical engineer with a PhD in systems science, formerly an adjunct professor of same. The VP of "Digital Transformation" within Commercial Airplanes, who was presumably ultimately responsible for MCAS, is an aerospace engineer. VP of Commercial Airplanes Total Quality is a chemical engineer, formerly SVP for Manufacturing Operations for Toyota. VP of the "New Mid-Market Airplane" program is dual-degreed with masters in both aeronautical engineering (RPI) and materials engineering (MIT). VP and GM of the 737 program—MechE, worked his way up from a "liaison engineer" position on the factory floor in 1985. VP/GM Fabrication—MechE who started her career in R&D. VP of Manufacturing, Safety and Quality for commercial aircraft—Industrial Eng. (and also from Toyota). VP of Supply Chain—MechE bachelors, Mechanical and Aerospace Eng. masters. Hell, even the VP of Sales & Marketing has a masters in Aerospace Eng and started as an aerodynamics engineer.

Whatever their problem is, it's not a lack of engineers in management roles.

Though the BCA division's General Counsel was a bio major, and the VP Communications has a degree in journalism.
posted by Kadin2048 at 7:55 AM on April 25 [6 favorites]


And yet we’re still talking C level types making management decisions. Main thing an engineering background can do for them is give them engineers disease as they decide that making a frankenbird that barely flies isn’t a bad idea.
posted by Artw at 8:25 AM on April 25


Ah, so if they don't have engineers in the C-suite, then it's a case of the pencilnecks and MBAs running roughshod over the technical people. But if they do have engineers in senior positions, then they have engineers disease.

Got it.
posted by Kadin2048 at 8:55 AM on April 25 [5 favorites]


Someone can surely do a Jordan Peterson parody in which the 737 Max is on a hero's journey.

"You may say, 'Well, Therac-25 don't exist.' It's, like, yes they do - the category 'Therac-25' and the category '737 Max' are the same category. It absolutely exists. It's a superordinate category. It exists absolutely more than anything else. In fact, it really exists."
posted by aspersioncast at 1:07 PM on April 25 [4 favorites]


Ah, so if they don't have engineers in the C-suite, then it's a case of the pencilnecks and MBAs running roughshod over the technical people. But if they do have engineers in senior positions, then they have engineers disease.

No, I have to concede I may be wrong here, especially seeing that Delaney, whose bio still includes the 737 MAX program, has a degree specifically in aerospace engineering.
posted by ocschwar at 3:30 PM on April 25 [1 favorite]


This article is engaging and very well written, so well written in fact that it makes me - a total idiot who knows nothing about aviation engineering - feel like I have a grasp of what went wrong with the complex systems of the 737 Max and the multi-disciplinary development of them, after just a 10 minute read. So naturally I'm hesitant to believe a single word of it.

In particular I can't believe that the developers of MCAS made it rely on a single sensor because it simply never occurred to them to use multiple inputs. Just speculating, but it seems more likely that someone (most likely not a software engineer but a system design engineer or someone like that) made the deliberate decision that the risks associated with the extra complexity of integrating multiple inputs outweighed the risk of relying on a single sensor. You know, the KISS principle that the author refers to as "an aviation canon from the day the Wright brothers first flew at Kitty Hawk" even though the first line of the linked article says it originates from the US Navy in the 1960s. Obviously, someone got it wrong and hopefully in the fullness of time some Feynman of the day will tell us exactly how. In the meantime, as tempting as it is for the layperson to convince themselves that they are smarter than a techbro Boeing engineer because we know about the Apollo computer voting system and stuff, I'll reserve judgment.
posted by L.P. Hatecraft at 2:11 AM on April 26 [3 favorites]
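
[Editorial sketch] For readers unfamiliar with the "voting" idea mentioned in the comment above, here is a minimal sketch of median voting across three redundant inputs. It assumes three independent sensors, which is a hypothetical layout used only for illustration; the function name is invented, and this is not a description of any Apollo or Boeing implementation.

```python
from statistics import median


def voted_reading(a: float, b: float, c: float) -> float:
    """Return the median of three redundant readings.

    A single wildly wrong sensor cannot drag the result outside the
    range spanned by the two sensors that still agree.
    """
    return median([a, b, c])


# One failed sensor reporting an implausible value is simply outvoted.
healthy_plus_one_fault = voted_reading(4.8, 5.1, 74.5)
```

The voting step itself is tiny, which is consistent with the commenter's point: the hard part is less the arithmetic than the wiring, failure analysis, and certification cost of feeding several inputs into one system.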


KISS as official policy dates to the navy in the 1960s.

KISS as a canon dates to the very first flight. Wilbur stayed airborne for 10 seconds even though he had lift to stay far longer, because the steering controls they tried out were difficult and unstable. (Control flaps in front, something the Wrights immediately abandoned from their designs.)
posted by ocschwar at 11:11 AM on April 26






Source: Boeing whistleblowers report 737 Max problems to FAA
The FAA tells CNN it received the four hotline submissions on April 5, and it may be opening up an entirely new investigative angle into what went wrong in the crashes of two Boeing 737 Max commercial airliners -- Lion Air flight 610 in October and Ethiopian Air flight 302 in March.

Among the complaints is a previously unreported issue involving damage to the wiring of the angle of attack sensor by a foreign object, according to the source.

Boeing has reportedly had previous issues with foreign object debris in its manufacturing process; The New York Times reported metal shavings were found near wiring of Boeing 787 Dreamliner planes, and the Air Force stopped deliveries of the Boeing KC-46 tanker after foreign object debris was found in some of the planes coming off the production line.

Other reports by the whistleblowers involve concerns about the MCAS control cut-out switches, which disengage the MCAS software, according to the source.
posted by clawsoon at 8:54 AM on April 28




Whatever their problem is, it's not a lack of engineers in management roles.
And they should all lose their licenses >:(
posted by Popular Ethics at 2:01 PM on April 30


Boeing relied on single sensor for 737 Max that had been flagged 216 times to FAA (CNN, autoplay video):
The device linked to the Boeing 737 Max software that has been scrutinized after two deadly crashes was previously flagged in more than 200 incident reports submitted to the Federal Aviation Administration, but Boeing did not flight test a scenario in which it malfunctioned, CNN has learned.
posted by clawsoon at 5:22 AM on May 1 [1 favorite]


btw - just checking the tags here - it would be useful to add "normalaccident" and "perrow" to this post ...
posted by carter at 11:49 AM on May 9

