Cannibal AI and pancakes
July 27, 2018 1:29 AM

This spreadsheet is a collaborative list of strange machine-learning behaviour discovered when a system exploits a bug or a lack of behavioural constraints. It includes indolent cannibals - AI agents which bred offspring as a free energy source - and a pancake-making robot that optimised for pancake throwing height. It is simultaneously wonderful and deeply worrying.
posted by secretdark (56 comments total) 52 users marked this as a favorite
 
Legit laughed out loud. Amazing. I was going to quote some of my favourites here, but really, they're all good.
posted by tickingclock at 1:58 AM on July 27 [1 favorite]


I don't worry so much about what happens after we have true artificial general intelligence. I'm a lot more worried about what happens in the gap between "AI" and "AGI", when it's smart enough to be dangerous but not smart enough to know what it's really doing.
posted by Punkey at 2:47 AM on July 27 [2 favorites]


"Genetic debugging algorithm GenProg, evaluated by comparing the program's output to target output stored in text files, learns to delete the target output files and get the program to output nothing."

Sounds like they've succeeded in creating some of my junior developers.
posted by garius at 2:50 AM on July 27 [26 favorites]


TLDR AI is just like the genie in the lamp in that it will try to screw you over by being a total grammar nazi or shitty loser.


Seriously, this is the first time I've been concerned about AI and it has nothing to do with AI being evil and everything with it being a black box with enough intelligence to cause weird, potentially dangerous, unwanted and unpredictable behavior.
posted by Foci for Analysis at 3:15 AM on July 27 [8 favorites]


lots of variations on the conclusion that WOPR came to: 'the only winning move is not to play'
posted by mohiam at 3:39 AM on July 27 [5 favorites]


Some of these seem a bit daft. Rewarding a football robot for touching the ball, one point per touch? What do you think it would do?

(Still, ta for the laughs)
posted by pompomtom at 3:45 AM on July 27 [1 favorite]


I.... I think I've coached all of these children.
posted by mce at 3:57 AM on July 27 [6 favorites]


Some of these seem a bit daft. Rewarding a football robot for touching the ball, one point per touch? What do you think it would do?

When people get these results, they're thinking "Oh, duh. That was dumb of us" not "Look how devious our thing is". Of course, I have had many, many conversations with product managers who want me to build some system with an obvious exploit that a cynical customer will find.

This one, though, seems to be "we made a basic error": Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn't actually learn any features of the input images
posted by hoyland at 4:16 AM on July 27 [6 favorites]
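The alternating-order exploit is easy to reproduce in miniature. Here is a toy sketch (purely illustrative, nothing to do with the paper's actual networks): a "classifier" that ignores the mushroom entirely and just flips the most recent correct label it was shown scores perfectly on alternating training data.

```python
# Toy reproduction of the alternation exploit: the "model" never looks
# at a mushroom, only at the sequence of correct labels seen so far.

def flip_previous(labels_seen_so_far):
    """Predict by flipping the last correct label (arbitrary first guess)."""
    if not labels_seen_so_far:
        return "edible"
    return "poisonous" if labels_seen_so_far[-1] == "edible" else "edible"

# Training data presented in strictly alternating order, as in the paper.
true_labels = ["edible", "poisonous"] * 5

predictions = []
for i in range(len(true_labels)):
    predictions.append(flip_previous(true_labels[:i]))

# A perfect score without ever examining an image.
assert predictions == true_labels
```

The point is that the presentation order leaks the answer, so the cheapest feature for the network to learn is the order itself.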


The best devious one was the one that used steganography to hide results in the output.

EMP that one, ASAP.
posted by pompomtom at 4:21 AM on July 27 [6 favorites]


TLDR AI is just like the genie in the lamp in that it will try to screw you over by being a total grammar nazi or shitty loser.

Rick & Morty: Keep Summer Safe
posted by snuffleupagus at 4:25 AM on July 27 [10 favorites]


These are often presented in an anthropomorphizing kind of way that is funny but also misleading. The reality is, AI — like any computer program — does only and exactly what people tell it to do. It’s very literal and it knows only the data you feed it. It will find any way it can to solve for the optimum condition you’ve set it, because that is literally what it does — it solves a math problem. It doesn’t know anything but “get this number as big or small as possible, by manipulating these specified parameters.” It’s on the developers to specify any other constraints they want it to fulfill. Like the pancake making robot in the spreadsheet: it was told to maximize the time the pancakes were away from the ground, so it did, by throwing the pancakes up really high. It wasn’t told anything about where the pancakes should be if not on the ground. The creator later revised the formulation of the optimal condition to include keeping the pancake in the pan, and got better results.

It’s also on the developers to feed it training data that actually teaches it something relevant — think “face recognition” that couldn’t recognize people with darker skin because the engineers involved had never thought to train or test the system with anyone who wasn’t light-skinned. The AI wasn’t racist; the developers were.

So, as always, the real danger is human beings — our own laziness, stupidity, and prejudices cause us to make AI that does unexpected or harmful things.
posted by snowmentality at 4:54 AM on July 27 [12 favorites]
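The pancake example above is a classic reward-shaping failure, and the shape of the problem fits in a few lines. This is a toy sketch with made-up trajectories and numbers, not the actual project's code:

```python
# Hypothetical reward functions illustrating the pancake story: the
# naive objective rewards any time off the ground, so launching the
# pancake skyward scores best; the revised one also requires the
# pancake to stay over the pan.

def naive_reward(trajectory):
    """Reward = number of time steps the pancake spends off the ground."""
    return sum(1 for step in trajectory if step["height"] > 0)

def revised_reward(trajectory):
    """Also require the pancake to stay over the pan while airborne."""
    return sum(1 for step in trajectory
               if step["height"] > 0 and step["over_pan"])

# A "throw it as high as possible" policy: long flight, never lands in pan.
throw_high = [{"height": h, "over_pan": False} for h in (1, 5, 9, 5, 1)]
# A gentle flip: shorter flight, but always over the pan.
gentle_flip = [{"height": h, "over_pan": True} for h in (1, 2, 1)]

assert naive_reward(throw_high) > naive_reward(gentle_flip)      # exploit wins
assert revised_reward(gentle_flip) > revised_reward(throw_high)  # fix works
```

The optimizer did exactly what it was scored on; the fix was changing the score, not the optimizer.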


The number of agents that exploited bugs in physics simulations makes me think there's a lot of future in "AIs" that are really good at video game speedrunning.
posted by EndsOfInvention at 5:08 AM on July 27 [4 favorites]


A lot of this sounds like the sort of experiments children do if they're given the chance.
posted by wolpfack at 5:10 AM on July 27 [2 favorites]


A programmer I know describes his job as “working with the most literal genie you can imagine”
posted by DoctorFedora at 5:11 AM on July 27 [12 favorites]


It’s very literal and it knows only the data you feed it. It will find any way it can to solve for the optimum condition you’ve set it, because that is literally what it does — it solves a math problem.

It’s also on the developers to feed it training data that actually teaches it something relevant — think “face recognition” that couldn’t recognize people with darker skin because the engineers involved had never thought to train or test the system with anyone who wasn’t light-skinned. The AI wasn’t racist; the developers were.

I guess what confuses me with examples like this is, if everything the AI is doing has to be programmed in advance, and its racism was an oversight instead of baked in, then how did it come to care about skin tone? It sounds like with an elaborate enough learning AI, the system proposes its own new math problems and checks to see if solving that problem produces better results.

Is it that we're not setting up the problems right, or that our own errors or prejudices lead us to provide an incomplete environment or corrupted feedback as the system learns?

Some of the physics examples sound like more of a problem with the programming of the non-AI objects the AI interacts with than the AI itself. Like a wall-hack in a multiplayer game being exploited by a human player. Or the Destiny loot cave. Are those examples of AI run amok, or just standard bugs from object-oriented programming?
posted by snuffleupagus at 5:16 AM on July 27


Agent pauses the game indefinitely to avoid losing

I'm suddenly concerned I was unknowingly used as a template for this AI, so close is the resemblance of its method to my own.
posted by gusottertrout at 5:23 AM on July 27 [6 favorites]


Nah, that’s based on my mate that I used to play chess with.
posted by pompomtom at 6:05 AM on July 27 [2 favorites]


This is the way the world ends
Not with a bang
But with a shortcut
Taken by an AI
Optimizing an algorithm
posted by kokaku at 6:33 AM on July 27 [19 favorites]


The hazard of AI isn’t that it will be better than humans (we’re not going to get there for a long time) but that enough slick IBM/Raytheon consultants can convince bobble heads with money that their “AI” is a better solution than people. Shitty black-box computer programs are a method to dodge accountability and justify huge contracts to decision-makers who don’t know enough to tell when they’re being fed bullshit.
posted by cirgue at 6:36 AM on July 27 [6 favorites]


2025: AI programmed to maximize human happiness blankets Earth with smilex gas.

No, seriously, this is a scenario Nick Bostrom worries about.
posted by justsomebodythatyouusedtoknow at 6:42 AM on July 27 [1 favorite]


Is it that we're not setting up the problems right, or that our own errors or prejudices lead us to provide an incomplete environment or corrupted feedback as the system learns?

A bit of both. In a biased world, it's very hard to get unbiased training data, no matter how diligent you are. But people's own biases mean they may fail to realise the way their data collection methodology or model design may reinforce those biases.
posted by hoyland at 6:44 AM on July 27 [1 favorite]


Meetup had a good (simple) example, where they realised that using gender as a feature meant they never recommended engineering meetups to women because the people who went were all men. Take it out, you start recommending them to women and lo and behold, women go to engineering meetups if you tell them about them. But someone has to say "Something's weird here. Are we just reinforcing an existing bias?"
posted by hoyland at 6:47 AM on July 27 [17 favorites]
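The Meetup fix is essentially dropping a feature that acts as a proxy for historical bias. A hedged sketch of the idea (the feature names and weights here are invented, not Meetup's real model):

```python
# Invented linear-scoring recommender: weights "learned" from a history
# in which engineering meetup attendees were all men, so being male
# becomes a strong (spurious) predictor of attendance.

def score(user, weights):
    """Simple weighted sum of user features; higher = more recommended."""
    return sum(weights.get(feature, 0) * value
               for feature, value in user.items())

biased_weights = {"interested_in_engineering": 1.0, "is_male": 2.0}
debiased_weights = {"interested_in_engineering": 1.0}  # gender feature removed

woman_engineer = {"interested_in_engineering": 1, "is_male": 0}
uninterested_man = {"interested_in_engineering": 0, "is_male": 1}

# With the gender feature, the uninterested man outranks the engineer.
assert score(uninterested_man, biased_weights) > score(woman_engineer, biased_weights)
# Without it, interest is all that matters.
assert score(woman_engineer, debiased_weights) > score(uninterested_man, debiased_weights)
```

Removing the feature doesn't make the history less biased, but it stops the model from using gender as a shortcut for it.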


Rick & Morty: Keep Summer Safe

snuffleupagus, I feel like everything on this spreadsheet could be or already has been a Rick & Morty episode
posted by jrishel at 6:48 AM on July 27


...pauses the game indefinitely to avoid losing....Nah, that’s based on my mate that I used to play chess with.

This is literally why they invented the chess clock
posted by thelonius at 6:55 AM on July 27 [3 favorites]


Agent kills itself at the end of level 1 to avoid losing in level 2
Turing test passed.
posted by biogeo at 6:56 AM on July 27 [10 favorites]


The lesson is that the “racism”, just like the “pancake”, wasn’t ever present in the algorithm at all.

Like, ok, sure, this algorithm doesn’t open the door for black folks, and that’s obviously a super racist thing to do. Except what happens when I take the exact same algorithm and plug it into my predictive policing system? Refusing to tag people of color as criminal suspects interacts with our existing social structures in a really, really, really different way. It completely changes the semantics of the program. Nothing called “intelligence” should behave that way just because you moved its labels around.

We need to interrogate the utility and effect of the algorithm, in context, and reject the attempt to load it up with semantics that aren’t purely mathematical. When you see someone beating minorities with a Certified Anti-Racist® Smart Truncheon, you don’t say “I think your anti-racist truncheon might actually have encoded some of the racial biases of its creators.” You say, “Your dumbass truncheon is not in any sense ‘smart’, you racist asshole.”
posted by emmalemma at 7:00 AM on July 27 [3 favorites]


Speaking of chess, there's an (apocryphal?) story about SARGON, an early PC chess program, getting checkmated when it had a pawn on the 7th rank - and promoting the pawn to a new King.
posted by thelonius at 7:06 AM on July 27 [5 favorites]


The AI wasn’t racist; the developers were.

This seems crucial to teach the public. It seems too easy to convince everyone that extremely harmful effects are just a big ol' whoopsie-do, instead of the result of a neglectful process carried out by humans with blind spots or willful biases.

It also shows just how dire the lack of diversity in tech has gotten. So many people's well-being will depend on whether a Silicon Valley company bothered to hire anyone from marginalized groups.
posted by NickDouglas at 7:13 AM on July 27


The hazard of AI isn’t that it will be better than humans (we’re not going to get there for a long time) but that enough slick IBM/Raytheon consultants can convince bobble heads with money that their “AI” is a better solution than people.

Brawndo drops to zero
posted by snuffleupagus at 7:17 AM on July 27


The AI wasn’t racist; the developers were.

The AI was racist because the developers were.
posted by callmejay at 7:19 AM on July 27 [6 favorites]


Whether or not an AI is racist doesn't have to do with the AI's "intent". The question is whether or not you have a system (that encompasses the program, the programmer, the subjects, and so on) that is racist. Just like racist policies don't need intentional, mustache-twirling Grand Dragons creating them in order to be racist.

Anyway, I'm now more convinced than ever that the first huge AI disaster is going to be a financial crime created by an AI finding an exploit of some sort, or doing something that should be a crime, but isn't illegal *yet.*

Or the latter will be the argument of the AI trader's programmers, who thought that they could hard-code law into their system, neglecting that the law can account for end effects without intent, balancing tests that can't be hard-coded, and assessment of local norms.

I mean, people are going to be arrested, have bail revoked, or deported first, based on bad AI, but that won't be a big scandal first.
posted by pykrete jungle at 7:45 AM on July 27 [5 favorites]


Or the latter will be the argument of the AI trader's programmers, who thought that they could hard-code law into their system, neglecting that the law can account for end effects without intent, balancing tests that can't be hard-coded, and assessment of local norms.

Let me introduce you to the ethereum 'smart contracts' crowd.
posted by snuffleupagus at 8:11 AM on July 27 [6 favorites]


This is literally why they invented the chess clock

This is literally why I bought a chess clock.
posted by pompomtom at 8:25 AM on July 27 [2 favorites]


I guess what confuses me with examples like this is, if everything the AI is doing has to be programmed in advance, and its racism was an oversight instead of baked in, then how did it come to care about skin tone? It sounds like with an elaborate enough learning AI, the system proposes its own new math problems and checks to see if solving that problem produces better results.
These AIs are neural networks. The way they work is by being given a lot of "training data" that is already labeled. They then look for really low-level "features" in the data.

Say, for example, we want to categorize images of human faces as male or female. The AI will "normalize" the image—look for landmarks like eyes and mouth, resize the image, and infer the head's rotation and correct for it based on these. Then it will look for a bunch of very low-level features, doing stuff like edge-detection around the edge of the face and at major features, and will come up with a list of features, like: this edge in this region of the image is at this angle, etc. Then it will put together slightly larger patterns of low-level features, and larger ones again based on those (the number of layers will depend on the neural-net software). All of this is happening inside the black box: unless we peer into its specific workings, we'll have no idea what features or sets of features it is picking up on, and they're definitely not features that we would pick up on.

Then you give it some unlabeled images to try out its feature detection, and you score how it did at classifying them, and based on that it learns which features are predictive and changes how much weight to give them. Usually this process is repeated. Once you're satisfied with your results, you set it loose on an unsuspecting world.

So if you give your AI a thousand faces as training data, and only ten are dark-skinned people, it's going to be really bad at even recognizing where the edges of their faces are, because the contrast is completely different (I am reminded of how Kodak film was really bad for photographing dark-skinned people). The film wasn't racist, but the people who designed it had a blind spot at best.
posted by adamrice at 8:30 AM on July 27 [3 favorites]
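The train/score/reweight loop described above can be shown with the simplest possible "network": a single-layer perceptron on toy, already-extracted features. Real face classifiers are vastly deeper, but the cycle of adjusting feature weights to reduce error on labeled data is the same in spirit.

```python
# Minimal perceptron sketch of the "learn which features are predictive
# and reweight them" loop. The model never knows what its features
# mean; it only sees that reweighting them reduces training error.

def train(samples, labels, epochs=20, lr=0.1):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * f for w, f in zip(weights, x)) + bias
            pred = 1 if activation > 0 else 0
            err = y - pred  # only misclassified examples change weights
            weights = [w + lr * err * f for w, f in zip(weights, x)]
            bias += lr * err
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * f for w, f in zip(weights, x)) + bias > 0 else 0

# Toy two-feature data where the label follows the first feature.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]

w, b = train(X, y)
assert [predict(w, b, x) for x in X] == y  # first feature gets the weight
```

If the training set barely contains some group, the loop above simply has almost no misclassified examples from that group to learn from, which is the failure described in the next paragraph.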


if everything the AI is doing has to be programmed in advance, and its racism was an oversight instead of baked in, then how did it come to care about skin tone?
It's not that the AI cares about differences in skin tone, it's that the AI only cares about what you train it with. If you train the AI with data that systematically underrepresents some aspect of real-world data, then the training process assigns insufficient penalty to failures to classify that data, and so it can end up underfitting that data if doing so allows it to perform slightly better on the rest of the training set.

Some of this might be an actual physical "some contrast levels are harder to analyze than others", maybe, but it's not just a matter of light vs dark. For example, "At the low false accept rates required for most security applications, the Western algorithm recognized Caucasian faces more accurately than East Asian faces and the East Asian algorithm recognized East Asian faces more accurately than Caucasian faces."

Joy Buolamwini suggests that failures like these could be rooted in biases as simple as the "down the hall test" - you're under time pressure, you need more data quickly for testing, so you go down the hall and find people (who look disproportionately like yourself compared to the world as a whole) to generate it. Mistakes like that can end up enshrined in benchmark data sets, and benchmark data sets are kind of necessary to quantify differences in performance between proprietary codes, so oversights can have long-lived effects.

Aside from the benchmarking issue, this isn't even an AI-specific problem, it's an artifact of trying to do any kind of extrapolation of statistical results outside the sample sets they were obtained from. See the WEIRD results in psychology, or the occasional biomedical disaster when a drug's effect shows racial variation and the clinical study population does not.
posted by roystgnr at 9:07 AM on July 27 [6 favorites]
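The "insufficient penalty" point is just arithmetic: with a skewed training set, a model that fails the minority group badly can still score superbly on the overall metric it optimises. The numbers below are made up for illustration, echoing the hypothetical thousand-face set with only ten dark-skinned faces mentioned earlier:

```python
# How class imbalance hides subgroup failure inside an aggregate metric.
# All counts and accuracies here are invented for illustration.

def overall_accuracy(n_majority, n_minority, acc_majority, acc_minority):
    """Aggregate accuracy over both groups, weighted by group size."""
    correct = n_majority * acc_majority + n_minority * acc_minority
    return correct / (n_majority + n_minority)

# 990 majority faces recognized 99% of the time; 10 minority faces
# recognized only 10% of the time.
careless = overall_accuracy(990, 10, 0.99, 0.10)

assert careless > 0.98  # looks excellent on the aggregate metric...
# ...even though the model is wrong 90% of the time on the minority group.
```

A training process that only sees the aggregate number has no reason to fix the subgroup failure, which is why per-group evaluation matters.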


We need to interrogate the utility and effect of the algorithm, in context, and reject the attempt to load it up with semantics that aren’t purely mathematical.

I would be more inclined to say we should reject certifying anything at all as "not racist." The default assumption should be that everything and everyone needs to be scrutinized for racist outcomes and that maybe racist effects can be attenuated or compensated for enough to be fit for a particular purpose at a particular point in history.
posted by XMLicious at 10:00 AM on July 27 [1 favorite]


These AIs sound like Clever Hans. They are looking at the data, but not at the things the experimenters expect them to be looking at. It's the experimenters' expectations that are limited, not the capabilities of the AIs.
posted by SPrintF at 12:44 PM on July 27 [3 favorites]


I'm suddenly concerned I was unknowingly used as a template for this AI, so close the resemblance of its method to my own.

The "pause-to-avoid-losing" thing got to me as well. I played a ton of online Blood Bowl and abusing the pause feature to avoid losing is one of the most tried and true methods for griefers. Put in some safeguards and the griefers evolved ways to continue to abuse it.

Are AIs the griefers of the future?
posted by Justinian at 1:48 PM on July 27 [1 favorite]


The Tetris one was previously. It did better at Super Mario, but the first video shows it playing Tetris a little after the 15 minute mark.
posted by RobotHero at 2:04 PM on July 27


If you're not following The Strange Log on the twitters, it's very much worth your time.
posted by mhoye at 6:41 PM on July 27 [2 favorites]


I guess what confuses me with examples like this is, if everything the AI is doing has to be programmed in advance, and its racism was an oversight instead of baked in, then how did it come to care about skin tone?

It might have failed to recognize POC altogether (the training data doesn't contain any, so why should it?) or it may care about skin tone because even white people have some variation in skin tone, in such a way that a black person is so far outside what it's taught to expect about the problem that it either (a) does nothing, (b) overreacts, or (c) does some weirdo thing that doesn't correspond to what a human might (naively) expect. But calling the program racist is sort of nonsensical because it doesn't even know what race is. AI programming is deeply weird and humans constantly misattribute AI behaviors anthropomorphically. Imagine a program designed to identify general-purpose images that you want to apply to the problem of distinguishing species of flowers. It may not care enough about the color red to tell roses apart, because red was not a useful-enough distinguisher in the general case to make it sensitive enough to that color to subsequently judge roses. Does the program hate roses?
posted by axiom at 9:46 PM on July 27


On a basic level we're asking: what are we teaching our children?

The nets need a picture of the world to build their models on. So what picture do we show them? The world as it seems to just us (probably racist), the world as it is, or the world as we want it to be?
posted by wemayfreeze at 11:41 PM on July 27 [2 favorites]


Does the program hate roses?

Racism doesn't require hate.

Really, there should not be any intellectual scaffolding or rationales whatsoever for certifying anything as independent of racist outcomes or apart from all of our endemic racist systems. The problem is how we talk about racism, not how we talk about algorithms when we call them racist. You might as well ask whether a law or a policy "knows" what race is (and once you've even phrased things that way you're already anthropomorphizing) or if a law or policy can hate something.

I'd hope we can agree that even a law or policy which doesn't contain any wording referencing race can still be racist. (Or at least acknowledge that someone claiming this is impossible isn't saying anything new which wasn't said a hundred years ago and more, and hence some of the analysis of similar notions from the course of the last century should be due diligence reading before making such arguments.)

By the same token an algorithm which doesn't "know" about race can still very easily be racist. Especially if, far from one that's avoiding race-related topics or concepts like a law written with the intent to sound racially neutral is, we're considering an algorithm intended to optimally mathematically describe a category of images which qualify as containing a depiction of a human face.

There is no naïve way to avoid racism any more than there is a naïve way to avoid security risk. "The algorithm isn't racist because it doesn't even know what race is!" is the absurd nonsensical assertion, just like "The algorithm is secure because it doesn't even know what security is!" would be.
posted by XMLicious at 4:21 AM on July 28 [3 favorites]


The AI in the Elite Dangerous videogame started crafting overly powerful weapons. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities.”

Welp humanity, it’s been nice knowing ya. We’ve had a good run.
posted by panama joe at 7:02 AM on July 28


I totally understand the inclination there— “X is not possibly racist” is often the face of overt racism. At the same time, though... I study racist algorithms, and I sincerely believe that it’s unhelpful to say that an algorithm is racist.

It’s like, “no molecule is racist” is a different thing from “nothing made of molecules is racist”. And maybe that distinction can’t always be safely drawn, discourse-wise, so it demands caution in how and when we say it. But if caution leads us to say “some molecules are racist per se”, I have to feel like we took a wrong turn somewhere.
posted by emmalemma at 7:53 AM on July 28


We've also got this whole thread on Fairness and Bias in Machine Learning if you want more on that.
posted by RobotHero at 8:10 AM on July 28 [2 favorites]


I'm very confused by the mushrooms presented in alternating order one. Like, I can't see any situation where you'd want the categorization of this photo to depend on what the previous photo was. So you just wouldn't feed that as input data, I would think.

Like, I could understand "Here are multiple photos of the same mushroom" but not "the previous different mushroom I showed you should influence your decision on what this new mushroom is."
posted by RobotHero at 8:21 AM on July 28


"The algorithm isn't racist because it doesn't even know what race is!" is the absurd nonsensical assertion, just like "The algorithm is secure because it doesn't even know what security is!" would be.

That's a really good analogy. "Racist" is best thought of as a description of the functioning of a system, like "insecure" or "inefficient" rather than a description of the intent of a system. It is no more an anthropomorphism to call a computer program racist than to call it slow.
posted by straight at 10:29 AM on July 28 [1 favorite]


It relieves me to know that lazy SkyNet would probably just rename us, declare humans extinct, and call it a win.
posted by BS Artisan at 10:44 AM on July 28 [11 favorites]


I'm very confused by the mushrooms presented in alternating order one. Like, I can't see any situation where you'd want the categorization of this photo to depend on what the previous photo was. So you just wouldn't feed that as input data, I would think.

I took a quick look at the paper. (It's funny how different neural network papers written by biologists are from those in the linguistics and CS world.) They were basically looking at which architectures worked for learning to distinguish both good/bad "summer" mushrooms and good/bad "winter" mushrooms (and handle transitioning between seasons repeatedly), so while they were using feed-forward networks, there was a notion of time inherent in the setup.
posted by hoyland at 1:55 PM on July 28


Why on earth they were alternating good/bad, I don't know. But I can see how they got a model that learned that pattern.
posted by hoyland at 1:56 PM on July 28


I remember seeing a little crawler robot (two wheels) with image recognition and speech capability, exploring the room and announcing what it saw.

Everyone thought this was great, and crowded round to see the wee fella in action. Standing in a close circle. It saw us too.

"shoe"..."shoe"..."shoe"..."shoe"

Finally it gave up, and said "shoe shop".
posted by Wrinkled Stumpskin at 3:31 PM on July 28 [8 favorites]


Wrinkled Stumpskin's hapless shoebot reminds me of Cheesoid.
posted by moonmilk at 4:12 PM on July 28 [3 favorites]


Shoebot/Cheesoid 2020
posted by Don.Kinsayder at 9:24 PM on July 28 [3 favorites]


I totally understand the inclination there— “X is not possibly racist” is often the face of overt racism. At the same time, though... I study racist algorithms, and I sincerely believe that it’s unhelpful to say that an algorithm is racist.

The issue is with “X is not possibly racist” being the face of subtle racism, another face being a refusal to discuss race and racism seriously at all. This was the basis of Stephen Colbert's “I don't see race!” schtick on his old show, for example: that you can say a whole bunch of blithe but obtuse stuff about equality but still act totally racist, in part because the blather allows you to not engage with the topic of the racism in your own statements and actions and in the societal systems you're participating in.

Who is being helped or not helped by avoiding application of the term “racist” to algorithms? Not only does it seem unlikely that you're talking about the victims of racism being helped but to return to my analogy, I feel as though it may be “helpful” in the same way that if you avoid talking about security too much it can help in the short term for meeting deadlines and budgets because you can bypass complicated, difficult discussions which involve lots of forethought and careful consideration of hypotheticals, or keep those discussions on a leash and among the more-technical or most-involved people. But in the long term things will probably happen that will make us look back and say, “We should have talked about racism more frequently and openly and more prominently.” (Part of the perpetual refrain of “Oops, we totally accidentally screwed over PoC in this era of history too!”)

Annoying as it can be, security problems aren't actually caused by use of the single word “secure” to convey a much more complicated reality, like by marketers or by computer system specialists responding to questions from non-technical people where the answer should really always be, “No, it is not absolutely secure.”

In the same way, stipulating that the word “racist” must not be used to refer to algorithms isn't by itself going to help or harm progress on racism, but furnishing tools for redirecting discussion of racism and racist outcomes in the implementation of computer systems into (additional, historically-recurring) semantic quibbles about controlling the application of this most broad and non-specific of terms sure might.
posted by XMLicious at 6:51 AM on July 29


I guess, what I was trying to say is, it's the people that are the racists, not the tool they built, and that's where we should be concentrating. More specifically, I think talking about "this algorithm is racist" obfuscates the fact that the racism is coming from the people in a way that "this law is racist" doesn't, somehow, and I'm not exactly sure why. I guess mostly because people view laws as inherently human constructs while AI is made out to be this other thing entirely (and in some ways it is). Honestly not trying to quibble, I think it's important not to lose sight of the fact that the reason AI gets racist results is that racism is pervasive and sometimes invisible (to white people) and that the way to attack that problem is to fix the people and the racist structure of society/institutions, not fixate on AI instead. But maybe this is getting to deraily so I'll shut up now.
posted by axiom at 9:44 PM on July 29




This thread has been archived and is closed to new comments