(they first learned to manipulate time to make this possible)
April 17, 2018 9:59 PM

"But it [machine learning algorithms] doesn’t always work well. Sometimes the programmer will think the algorithm is doing really well, only to look closer and discover it’s solved an entirely different problem from the one the programmer intended. For example, I looked earlier at an image recognition algorithm that was supposed to recognize sheep but learned to recognize grass instead, and kept labeling empty green fields as containing sheep." - Janelle Shane posted by the man of twists and turns (34 comments total) 52 users marked this as a favorite
 
I live for these sorts of anecdotes ever since I first heard about a neural network that was carefully selecting for ambient light levels in images when the intent was to teach it to recognize tanks.
posted by figurant at 10:24 PM on April 17, 2018 [10 favorites]


How can they even delete the answers?? That sounds like bad programming at the very least... Really enjoyed this!
posted by mikhuang at 10:47 PM on April 17, 2018


My favorite version of this was the story of an AI that was being created to control the elevator banks for a tall building. The AI was trained to minimize the amount of time that passengers spent traveling in the elevator. Naturally enough it learned that the best strategy was to never, ever, open the elevator doors to accept any passengers.

If you don't let the passengers on, they can't ruin your score.
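(The degenerate optimum in miniature: a made-up scoring function, not anything from the actual system.)

```python
# If the objective is "minimize average time passengers spend traveling,"
# an empty log of trips scores a vacuous, unbeatable zero.
def average_travel_time(trip_durations):
    if not trip_durations:   # doors never open, nobody ever rides...
        return 0.0           # ...perfect score
    return sum(trip_durations) / len(trip_durations)

print(average_travel_time([]))            # 0.0, the "never open" policy
print(average_travel_time([30, 45, 60]))  # 45.0, actually moving people
```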
posted by Balna Watya at 10:50 PM on April 17, 2018 [36 favorites]




This is as much about humans as computers.

The pilot would die but, hey, perfect score.

This human has developed the delusional interpretation that there is a human pilot in the system somewhere; it actually mocks the AI for not sharing this peculiar misconception.
posted by Segundus at 11:24 PM on April 17, 2018 [6 favorites]


The tank story is an urban legend, I believe. It's a good story with an important message, which is why it keeps being retold to each new generation.
posted by pharm at 11:39 PM on April 17, 2018 [4 favorites]


Naïve question: How does the AI get motivated? Is there a...reward for finding the fastest, easiest method for the task? Or is it enough to simply give it the task?
posted by Omnomnom at 11:42 PM on April 17, 2018


Omnomnom: the AI is simulated zillions of times with small variations, and all but the best-performing ones are ruthlessly culled.

See this CGP Grey video for a high-level overview.
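(For the curious, a toy Python sketch of that mutate-and-cull loop; every name and number here is made up for illustration.)

```python
import random

def fitness(params):
    # stand-in score: higher is better, peaks when every param is 0.5
    return -sum((p - 0.5) ** 2 for p in params)

population = [[random.random() for _ in range(4)] for _ in range(100)]
for generation in range(50):
    # rank every variant and ruthlessly cull all but the top 10...
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    # ...then refill the population with slightly mutated copies of survivors
    population = [[p + random.gauss(0, 0.05) for p in random.choice(survivors)]
                  for _ in range(100)]

print(max(fitness(v) for v in population))  # creeps toward 0, the optimum
```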
posted by pharm at 12:20 AM on April 18, 2018 [4 favorites]


The tank story is an urban legend, I believe.

Yes, James Bridle describes it as "probably apocryphal" in his forthcoming (recommended!) New Dark Age, and I'm inclined to agree. I do wonder where the images of tanks-in-fields and fields-without-tanks shown in the documentary came from, though — it's by no means impossible that someone scouted up unrelated images online that happened to correspond to the details of the story, but I'm curious about what they do actually depict.
posted by adamgreenfield at 12:35 AM on April 18, 2018 [1 favorite]


> How can they even delete the answers?? That sounds like bad programming at the very least...

newsflash: all programming is this bad. It takes special effort for a computer program not to allow another program to fiddle with its internals. Security and privacy were unnecessary in early computers, which only allowed one user (running one program!) at a time, with each user keeping their own storage media. Security and privacy were hacked onto multi-user and networked systems, but the default assumption in computing has always been trust.

Much of the reason Windows had such a bad reputation for bugs and security was that Windows was originally developed with the idea that, if the Windows designers asked people nicely, they wouldn't do bad things.
posted by Fraxas at 2:06 AM on April 18, 2018 [9 favorites]


Floating-point rounding errors as an energy source: In one simulation, robots learned that small rounding errors in the math that calculated forces meant that they got a tiny bit of extra energy with motion. They learned to twitch rapidly, generating lots of free energy that they could harness. The programmer noticed the problem when the robots started swimming extraordinarily fast.
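(Not the exact rounding bug from the story, but an easy-to-show cousin: a naive integrator manufacturing energy from discretization error. A sketch:)

```python
# Explicit Euler on a frictionless spring: x' = v, v' = -x.
# Energy (x^2 + v^2)/2 should be conserved, but every step multiplies it
# by (1 + dt^2), so the simulated system grows ever more energetic for free.
dt = 0.1
x, v = 1.0, 0.0
for step in range(1000):
    x, v = x + dt * v, v - dt * x

print(0.5 * (x * x + v * v))  # roughly 1e4, versus the 0.5 it started with
```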
Compare:
Light-gathering macromolecules in plant cells transfer energy by taking advantage of molecular vibrations whose physical descriptions have no equivalents in classical physics, according to the first unambiguous theoretical evidence of quantum effects in photosynthesis published today in the journal Nature Communications.
posted by clawsoon at 5:33 AM on April 18, 2018 [16 favorites]


I mean, yes. If you train the model on pictures where sheep and grass always occur together, and grass never occurs in the non-sheep pictures, then it literally can’t tell the difference between sheep and grass. It has never seen anything else.

Also, I’ve just anthropomorphized the machine learning model myself, but it’s important not to take the metaphorical anthropomorphization too seriously. The model (“the AI”) only “wants” what human programmers tell it to “want” — usually that is to minimize or maximize some mathematical function (often called the “objective function” because it defines the objective of the exercise). In the sheep example, that objective function might be “Out of all these pictures of known sheep and non-sheep, minimize the number of known sheep that your function says are non-sheep, and the number of known non-sheep that your function says are sheep.” The model then tries a bunch of different answers until it finds the one that gives the best value for that objective function.
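(In code, that objective might look something like this sketch; the names are made up, not any particular library's API.)

```python
# Count how often the model's sheep/non-sheep guesses disagree with the
# known labels; training searches for the model that minimizes this count.
def objective(model, labeled_pictures):
    mistakes = 0
    for picture, is_sheep in labeled_pictures:
        if model(picture) != is_sheep:  # sheep called non-sheep, or vice versa
            mistakes += 1
    return mistakes
```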

If you’ve told it to “want” the wrong thing, and get unexpected results… that’s on you.

Bayesian inference algorithms give you a sample of different possible answers instead of just one answer, but the sample is still defined by a mathematical function representing what is “best”.

This is why we test the hell out of algorithms. Or should. A “perfect” fit to your training data might be worthless for making predictions when you feed it some new data, because the model may be relying on some quirk in the training data that doesn’t happen reliably in new data (like sheep and grass always occurring together in a photo).

Computers: They are very powerful, but very, very literal. They will do exactly as they are told and they know only what you tell them.
posted by snowmentality at 6:19 AM on April 18, 2018 [13 favorites]




The Lebowski theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function

This is wonderful. Where is all the near-future sci-fi about a lazy Skynet that just launches drones to whizz around because it's fun and totally disregards war?
posted by Damienmce at 7:30 AM on April 18, 2018 [11 favorites]


The Lebowski theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function

If this were an actual description of how intelligence works, humanity would have disappeared within a couple generations of discovering opium. I mean, that's literally what opioids do, they hack the human brain's reward function, and while opioid addiction is certainly a thing, it is not a thing every human turns to whenever faced with a task more difficult than finding a drug dealer.
posted by solotoro at 7:47 AM on April 18, 2018 [4 favorites]


Some colleagues supporting research optimizing a class of molecules built a model that did great on the training/test sets. When they deconvoluted the function, they found they'd left "date" on the input side, and the algorithm latched onto that: since the humans were making progress over time, the date was a pretty good predictor.

The AI was basically a supportive parent. You ask for advice and it says "I have faith in you, no matter what you choose to do next."

This is wonderful. Where is all the near-future sci-fi about a lazy Skynet that just launches drones to whizz around because it's fun and totally disregards war?

John Sladek passed away years ago unfortunately.
posted by mark k at 7:48 AM on April 18, 2018 [3 favorites]


Great care must be taken to make sure that the objective function actually implies the thing you want to happen.
Don't tell the machine to win at global thermonuclear war.
Instead, tell it to satisfy humans' values through friendship and ponies.
posted by Galaxor Nebulon at 8:24 AM on April 18, 2018 [2 favorites]


The Lebowski theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function.

* Researcher trains neural net to "win" at NES games like Tetris by teaching it only the basic game controls and setting the win condition to "last as long as possible." *

AI: "Strange game, Dr. Falken. The only winning move is to hit PAUSE."
posted by The Bellman at 8:27 AM on April 18, 2018 [7 favorites]


If this were an actual description of how intelligence works, humanity would have disappeared within a couple generations of discovering opium. I mean, that's literally what opioids do, they hack the human brain's reward function, and while opioid addiction is certainly a thing, it is not a thing every human turns to whenever faced with a task more difficult than finding a drug dealer.

the reward function is by definition the thing the system tries to maximize, though, so it must not be opiates. i don't think there's a neurotransmitter or process in the brain, really, that could be described as the human "reward function". the best we can do is a sort of tautological definition that the human reward function is the function that our behavior must be maximizing, which sounds silly but can be useful (e.g. economists' concern with "revealed preferences").

but a reinforcement learning system really does have a well-defined reward function, you can point to it and say what it is at any given time. and given the chance, these systems would much rather hack it, to the endless frustration of AI researchers. (some of these hacks seem really clever! like taking advantage of numerical rounding errors in the physics of their environment.)
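(to make that concrete, a made-up sketch: the reward function is just ordinary code you can read and audit.)

```python
from dataclasses import dataclass

@dataclass
class State:
    position: int
    goal: int

# explicit and inspectable: this *is* the system's notion of "good"
def reward(state, action, next_state):
    # what the designers meant: reach the goal, don't dawdle
    return 10.0 if next_state.position == next_state.goal else -0.01

# the agent maximizes the sum of these numbers over time, so anything that
# makes this function return big numbers (bugs in the simulated physics
# included) counts as winning, from its point of view
print(reward(State(0, 5), "step", State(5, 5)))  # 10.0
```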

the AI-apocalypse people's concern is that a superintelligence that we create may be much more like that kind of system than it is like a human. or at the very least, even if (like us) it doesn't have a clearly-defined reward function, the "implicit" reward function revealed by its goals and behavior could be in serious conflict with human values.
posted by vogon_poet at 8:31 AM on April 18, 2018 [7 favorites]


Oh, and whatever you do, don't tell the AI to maximize the number of paperclips made!
posted by Galaxor Nebulon at 8:31 AM on April 18, 2018 [4 favorites]


Naïve question: How does the AI get motivated?

Not naïve at all. It's a failure of metaphor: there is no motivation, there is no AI, and the learning is not particularly deep.

A data scientist has a bunch of labeled data, say 10,000 photos of cats and another 10k of dogs. A complex computer program divides all the photos into many small pieces and builds a ginormous matrix of all the elements that match dog or cat. That correlation matrix is the output of the "teaching" phase. Another program takes that matrix as input, along with another photo from the internet or a drone surveillance camera, and a sometimes amazing algorithm matches the photo and returns that the photo is a dog. There are a number of techniques with names like Gradient Descent, Neural Network, Naive Bayes, Support Vector Machines, Bagging and Random Forest. But they are all variations of known math/statistical algorithms that correlate certain types of data well. There is ALWAYS a human in the loop.
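(A toy version of those two phases, sketched with scikit-learn on random stand-in data; nothing here is a real photo pipeline.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20000, 50))     # stand-in features for 20k photos
y_train = (X_train[:, 0] > 0).astype(int)  # toy labels: 0 = cat, 1 = dog

clf = LogisticRegression().fit(X_train, y_train)  # the "teaching" phase

X_new = rng.normal(size=(1, 50))                  # a new photo from somewhere
print("dog" if clf.predict(X_new)[0] else "cat")  # the matching phase
```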

Not in any way to deny that truly amazing and powerful tools are being developed and refined, but the terminology is often obscuring the forest for the trees or the grass for the sheep.
posted by sammyo at 9:08 AM on April 18, 2018 [11 favorites]


the reward function is by definition the thing the system tries to maximize, though

Thanks for the correction, I really should have known better than to drop a hot take using my vague non-specialist understanding of the phrase without thinking about how obviously it is a term of art. Cheers!
posted by solotoro at 9:16 AM on April 18, 2018 [1 favorite]


i mean i think it is somewhat a fair comparison, still. clearly pleasure, and opioid chemicals, are a big part of our motivation. it's not outrageous to think that the reward function might be something like "expected time-discounted future pleasure-related neurotransmitters, and not dying".

if you want to think of a human as a reinforcement learning agent, drug addiction might be like getting stuck in a local optimum, where maximizing long-term reward becomes difficult because the current state has some amount of reward and getting away requires short-term penalties.

There is ALWAYS a human in the loop.

Not necessarily! A lot of recent systems (especially in RL) are "unsupervised", meaning they require no labeled data and are purely self-taught. Google's AlphaZero learned to play chess from scratch without any human labeling; OpenAI had a similarly-trained system beat pro-level gamers at a video game.

The broader point is true and ought to be emphasized in every bit of journalism about this, that the actual underlying techniques are not particularly mystical or exciting, and are basically just specialized algorithms for mathematical optimization.
posted by vogon_poet at 9:51 AM on April 18, 2018 [5 favorites]


the best we can do is a sort of tautological definition that the human reward function is the function that our behavior must be maximizing

That would be great, if it existed. But it only exists for humans whose behavior tries to satisfy preferences obeying certain axioms, and addictive drugs are exactly the sort of thing which causes us to violate those axioms.

You don't even need opiate-levels of addiction to get a slight break in the transitivity axiom. I prefer having junk food handy over having no junk food, in case I want to eat a little. When I eat junk food, I prefer eating a lot over eating a little, so I eat a lot. I don't like what junk food does to my health, so I prefer having no junk food over having a lot of junk food. A > B > C > A, so the collection of behaviors which I lump together as "me" isn't VNM-rational, and there's no utility/reward function it's all trying to maximize.

Some parts of those behaviors are stronger than the others and can manipulate the others. (One might try to group them into "id" or "superego", but whatever we say at this point, note that we are leaving proven-mathematical-theorem-land and entering questionable-outdated-oversimplified-psychology-land.) This is good for me, because it means I'm slightly overweight and gradually improving rather than morbidly overweight and dying alone encrusted with Cheeto crumbs. But such a minor victory also suggests what's going on with more serious addictions: natural selection doesn't care in the slightest about whether an organism has carefully constructed a mathematically well-defined utility function or an equivalent axiomatically-consistent set of preferences; natural selection is happy just as long as an organism is able to make some healthy baby organisms before biting the big one. (Did I mention evolutionary-psychology-land? Well, I got "questionable" and "oversimplified", so good enough.) An honest preference to avoid hacking our own short-term-reward functions might have been an ideal part of an unchanging global utility function, but cobbling together a half-assed "aversion to impurity" and patching over the failures with "occasional ability to change preferences" is easier and is good enough, so that's what we've got.

That's probably not good enough for AI in the long term. "Take direct control of your own reward function" is a quick amusing failure when done by a stupid narrow AI, which naturally we then turn off. It would be less amusing if we ever come up with an AI smart enough to both predict "which naturally we then turn off" and figure out how to prevent that.
posted by roystgnr at 10:02 AM on April 18, 2018 [3 favorites]


the point about the axioms involving consistent preferences not being satisfied by real people is an interesting one.

i'm not sure a reward function in the RL sense is equivalent to that sort of utility function, though. (it may be, I honestly am not sure.) typically a reward is associated with a state of the environment (so, for humans, I suppose you'd consider neurotransmitters as being part of the environment rather than your "self").

I guess in the sense of utility, the agent's preferences would be over possible future sequences of states, where it prefers whatever (it believes) has a greater time-discounted infinite horizon total reward. i think (?) you can satisfy the VNM axioms with almost any weird reward function, this way.
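(in symbols, the standard textbook form of that is the discounted return U(s_0, s_1, ...) = Σ_{t=0}^∞ γ^t R(s_t) with discount factor 0 ≤ γ < 1, just to pin down what "time-discounted infinite horizon total reward" means here.)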

i guess what i would like to know the answer to is whether the reinforcement learning model is general enough that in principle God or whatever could cook up a class of reward functions that would describe general human motivation, even if it would have to be stupidly complicated.
posted by vogon_poet at 10:26 AM on April 18, 2018 [1 favorite]


This discussion about reward functions and opioids immediately makes me think of desensitization. In many cases, we're designed to experience a diminishing feeling of reward for each activation of a reward circuit that happens within a short period of time. You gotta take more and more drugs to get high.

Presumably, evolution has settled on this mechanism because it helps prevent overfitting of an objective function. You have a whole bunch of objective functions, and give them diminishing rewards the more they're activated.
posted by clawsoon at 12:34 PM on April 18, 2018 [2 favorites]


I guess in the sense of utility, the agent's preferences would be over possible future sequences of states

Future and past, if you want to talk about consistency, though for obvious reasons you can neglect the past when you want to talk about behavior.

i think (?) you can satisfy the VNM axioms with almost any weird reward function, this way.

Yeah, egg on my face, but this is true. You could claim that I have a consistent preference to first sit on my ass eating Cheetos, then diet and run my ass off until I've undone the damage done by the Cheetos, then talk like an ass about how much I'd really have preferred more moderation to what I actually did. To disprove that and show that decision-making is definitely inconsistent you have to either postulate some level of time-symmetry (well, time-smoothness, since we're all dead in the long run?) or look at counterfactual scenarios (which could be done by "in principle God or whatever" but not by observing behavior alone).

i'm not sure a reward function in the RL sense is equivalent to that sort of utility function, though. (it may be, I honestly am not sure.)

Stuart Armstrong claims they're equivalent for finite horizons but not always for infinite horizons. Oxford research fellow, but "blog post that a few people read last week" not "time-tested peer-reviewed publication", so take with a grain of salt. In particular his summary says "not all bounded utility functions have a corresponding reward function" in the infinite-horizon case, but on first reading it looks to me like his counterexample only shows "here's a utility function for which my reward function construction fails with a particular fixed policy", not "here's a utility function for which my construction would fail for any fixed policy", much less "here's a utility function for which no way exists to construct an equivalent reward function".
posted by roystgnr at 1:09 PM on April 18, 2018 [1 favorite]


It would be less amusing if we ever come up with an AI smart enough to both predict "which naturally we then turn off" and figure out how to prevent that.

This sounds slightly more apocalyptic than I intended. As penance, may I present what peak performance looks like.
posted by roystgnr at 1:13 PM on April 18, 2018 [4 favorites]


Where is all the near-future sci-fi about a lazy Skynet that just launches drones to whizz around because it's fun and totally disregards war?

Until someone writes this, let me recommend Murderbot.
posted by clew at 2:07 PM on April 18, 2018 [1 favorite]


i mean i think it is somewhat a fair comparison, still. clearly pleasure, and opioid chemicals, are a big part of our motivation. it's not outrageous to think that the reward function might be something like "expected time-discounted future pleasure-related neurotransmitters, and not dying".

There is a difference between an evolutionary success function (systemic) and a utility function (individual).

The "evolutionary success function" is whatever nets you the most descendants in the long term. For biological life as we know it, that's basically the ability to eat, not be eaten, find a mate (if necessary), protect your offspring, and not have too many or too few offspring, all in the context of whatever environment you happen to be born into.

A "utility function" is whatever a single rational intelligent actor is trying to maximize.

Humans probably don't have a "utility function" in the classical sense, but we do have some kind of utility-function-like preferences. Humans whose utility function is "do a lot of opium" don't reproduce very well in an environment with a lot of opium, so the next generation tends to have utility functions that work around that. The obvious one is just not to want opium very much, but a subtler, still effective utility function is "remove as much opium as possible from my environment, then do as much opium as I can." Humans use these kinds of "change the environment" techniques all the time.

The "paperclip maximizer" AI has a utility function, but isn't subject to any evolutionary pressure, so giving in to the robot equivalent of opium wouldn't be punished.
posted by reventlov at 3:45 PM on April 18, 2018 [6 favorites]


Post-singularity war machines wandering off to do other things they find more interesting is described in The Cassini Division by Ken McLeod, iirc.

It seems a bit more real-life plausible to me now that computers can beat me at playing Go that they'll one day develop some kind of general intelligence. Even if it's in some sense technically true for now that they only do exactly what you tell them via an evaluation function, that doesn't come close to adequately describing what happens in practice. You tell them to win the game of Go, they then proceed to learn all on their own (through amazingly simple algorithms) about influence, nets, ladders, ko threats, miai, seki, sente, and innumerable other ideas and concepts of varying degrees of subtlety in order to accomplish that. Those human ideas don't all map perfectly to the ones the AIs use, but anyway it's clear they develop some very sophisticated understanding.

I don't know what happens when someone eventually does figure out how to let them "hack their own reward function" to an appropriate extent. Humans do it all the time, not just with drugs. We can also use philosophy, culture, complex social structures, metafilter favorites, and so on.
posted by sfenders at 3:47 PM on April 18, 2018 [5 favorites]


Most learning "algorithms" are actually heuristic because learning has a tendency to NP-completeness in an N-dimensional feature space, so getting occasionally stuck in local minina is not particularly surprising, either in humans or our artificial counterparts. We're on a totally different order of complexity both in architecture and sophistication though, which is why I'm skeptical of seeing any truly intelligent artificial learning any time soon, even if computer cycles are an awful lot quicker than neuronal processing. Simulating something on the order of the entirety of our biological inheritance for intelligence is a gargantuan task, and we've scarcely penetrated the surface snow on the tip of the iceberg computationally speaking. That's not to say it can't happen eventually, just that believing the hype is perhaps still somewhat naive.
posted by walrus at 7:01 PM on April 18, 2018 [2 favorites]


We're on a totally different order of complexity both in architecture and sophistication

People have barely begun to experiment with building up the new tools into new higher-order architectures. Personally my suspicion is that with sufficient computing power, something can be done roughly in line with some rather old-school ideas. So far there seem to be enough lower-level, more tractable improvements and new ideas to keep everyone busy.

Maybe human-level intelligence is a bit much to hope for any time soon, but I'd be pretty impressed with something more like rat-level sentience.
posted by sfenders at 10:25 PM on April 18, 2018


Correction: I was thinking of Newton's Wake, not The Cassini Division, by Ken MacLeod, whose name I may have misspelled slightly. Sorry. It appears that his most recent work also involves a similar theme, but I haven't read it yet.
posted by sfenders at 6:58 AM on May 8, 2018




This thread has been archived and is closed to new comments