Maybe Don't Force Your Rapidly Developing AI to Play QWOP?
July 13, 2017 6:07 PM

Google's DeepMind AI Teaches Itself to Walk

Wired overview on DeepMind and their place in the AI/Machine Learning fields.
posted by Navelgazer (33 comments total) 17 users marked this as a favorite
 
Apparently the humanoid figure was "incentivized" by being shown a bus just about to pull away from the corner.

I wonder why it "learned" to wave its arms when human runners keep them in as tight as possible?
posted by Countess Elena at 6:12 PM on July 13, 2017 [8 favorites]


Flashback to 1997 and Genetic Programming Evolution of Controllers for 3-D Character Animation by Larry Gritz and James Hahn. It's interesting to see how far we have and haven't come since L*xo Learns To Limbo.
posted by straw at 6:13 PM on July 13, 2017 [1 favorite]


The flapping arms make it look so gleeful!
posted by poe at 6:15 PM on July 13, 2017 [4 favorites]


I wonder why it "learned" to wave its arms when human runners keep them in as tight as possible?

Because they didn't include "waving your arms like a silly person tires you out pretty darn quick" in the simulation.
posted by straight at 6:20 PM on July 13, 2017 [14 favorites]


I wonder why it "learned" to wave its arms when human runners keep them in as tight as possible?

My guess (and I know next to nothing about machine learning - I just loved this video) is that human runners are trying to run as fast as possible, while this is trying to get from A to B, and that the tricks of human runners come from training while the AI flappy arms are some long-since vestigial strategy that hasn't been trained out yet (and it seems to use the arms for balance a lot, almost like a kid running around in airplane mode.)
posted by Navelgazer at 6:21 PM on July 13, 2017


Needs a basket of flower petals to toss about as it flaps through the terrain...
posted by jim in austin at 6:22 PM on July 13, 2017 [5 favorites]


...or a mouth so it can be all WOOP WOOP WOOP as it runs and flails.
posted by nebulawindphone at 6:24 PM on July 13, 2017 [15 favorites]


I think it is just named Phoebe.

Or maybe they didn't model air resistance or muscle fatigue very well.
posted by nat at 6:57 PM on July 13, 2017 [1 favorite]


It plays a pretty fucking good bassoon though.
posted by Wolfdog at 6:58 PM on July 13, 2017 [7 favorites]


In case you've never seen it, Karl Sims's evolved creatures work from 1994 is one of the first learning systems for locomotion. The creatures swim in a fluid, though, which is simpler in some ways. And here's some relatively recent work on learning walking by another researcher. I think this is genetic algorithms, not neural networks.

One thing about adaptive systems: the training algorithms are very good at learning to exploit any bugs you might have in the simulation physics. You sort of see this at the end of the Google video with the unrealistic walking patterns. It looks like they didn't code in knee pain, for instance, so the knees work in a bizarro way.
posted by Nelson at 7:05 PM on July 13, 2017 [5 favorites]


Oooh! Now teach it to eat babies! :D
posted by sexyrobot at 7:21 PM on July 13, 2017 [3 favorites]


srsly, tho...I'm walking like that everywhere I go from now on!
posted by sexyrobot at 7:22 PM on July 13, 2017 [2 favorites]


A friend of mine recently joined the DeepMind team. I'm going to have to ask him about the flapping arms. And to stop training Skynet...
posted by asnider at 7:44 PM on July 13, 2017 [4 favorites]


Neat!
posted by samthemander at 7:54 PM on July 13, 2017


It mentioned sensors that seemed to imply touch. But clearly there were also visual cues, as in how far to jump, but they don't mention any visual sensors. (Yeah I know it's just a digital 3D space but...) It goes on to say that the figure also has some sort of description of objects in the space. How much detail is given beyond what could be seen? I think maybe the figure is getting more info than a toddler gets when they are learning to walk.

There's something in the way.

versus

There's a one meter high wall coming in 2.3 meters. It has a 20 cm ledge at the top. Behind it is another wall in only one meter. Etc.

Maybe the flapping arms are just translations of the body movement into just limp relaxed jointed limbs. The learning code probably centered on legs and feet as one of the models was bipedal with no real upper body except a stick and no arms.
posted by njohnson23 at 8:08 PM on July 13, 2017


This makes the Sorcerer's Apprentice the perfect metaphor for our future with AI.
posted by vorpal bunny at 8:23 PM on July 13, 2017 [6 favorites]


This is frankly terrifying.
posted by mollymillions at 8:42 PM on July 13, 2017 [2 favorites]


I think the flappy arms act like whiskers, or bumpers, to help it deal with walls and other tall obstacles. Runners keep their arms close because of air resistance, and I bet there's no air resistance in the simulation.
posted by I-Write-Essays at 8:55 PM on July 13, 2017


Who could have known that 1930s Disney cartoons were the true model for walking?

On the other hand, the 4-legged version reminds me of a Tachikoma. Which is the highest praise I can think of.
posted by happyroach at 9:05 PM on July 13, 2017 [1 favorite]


srsly, tho...I'm walking like that everywhere I go from now on!
posted by sexyrobot

They're learning from each other now. This is how the AI apocalypse starts.
posted by a car full of lions at 10:08 PM on July 13, 2017 [15 favorites]


These flailing arms remind me of a type of "stability well" or localized energy minimum I used to see in molecular modeling simulations. When I was running simulated heme configurations on what was, in the late '80s, a supercomputer, I recall one of the challenges was that if a structure lucked into one slightly more stable conformation, it would chase down the best way to optimize that conformation, even if the conformation at the end of that optimization pathway wasn't actually the most energetically preferred overall.

I could see this AI program stumbling (hah!) on a body dynamic (e.g. flailing arms in such-and-such a manner) that provides a small benefit compared to other small adjustments it might make at a particular decision point, and then working to optimize it. In doing so, it travels further down that optimization pathway to become the best flailing-arm runner possible. Even though in doing so, it may have missed the dynamic pathway that may have begun with a less favorable dynamic at the decision point (e.g., hold arms inward), but that could ultimately be optimized to provide a better, more favorable end-optimization, overall.

To overcome this, you'd probably have to program the AI with a new rule that says something like, okay, when you think you have optimized this dynamic, go back to some earlier decision point and intentionally take a less favored option, and see how well you can optimize that method. It's counterintuitive for both robot and human to do something different, that initially gives poorer results, but that eventually might generate better results. So you have to program it to happen intentionally.

Still, it's fascinating to see. And yes, a little alarming!
posted by darkstar at 11:26 PM on July 13, 2017 [4 favorites]
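
[Editor's sketch] The "stability well" described above can be illustrated with a greedy search on a toy landscape. This is only a minimal sketch: the landscape values and the names score and hill_climb are made up for illustration and have nothing to do with DeepMind's actual training setup.

```python
# Hypothetical toy fitness landscape: a small peak at x=2, a taller one at x=8.
LANDSCAPE = [0, 2, 3, 2, 1, 0, 3, 6, 9, 6, 3]


def score(x: int) -> int:
    """Fitness of position x on the toy landscape."""
    return LANDSCAPE[x]


def hill_climb(start: int) -> int:
    """Greedy search: step to the better neighbour until no neighbour improves."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]
        best = max(neighbours, key=score)
        if score(best) <= score(x):
            return x  # no neighbour is strictly better: a local optimum
        x = best


stuck = hill_climb(start=0)  # climbs the nearby small peak and stops there
lucky = hill_climb(start=5)  # happens to start in the basin of the tall peak
print(stuck, score(stuck), lucky, score(lucky))
```

Starting at 0, the search only ever takes improving steps, so it settles on the small peak and never crosses the valley to the taller one.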


There's a part where it is running and rhythmically pumping one arm in the air where I finally cracked up
posted by Ray Walston, Luck Dragon at 2:14 AM on July 14, 2017 [2 favorites]


This is how Arnold should have moved in Terminator.
posted by warriorqueen at 4:04 AM on July 14, 2017 [5 favorites]


Rapidly developing AI my arse.
posted by GallonOfAlan at 5:41 AM on July 14, 2017


They're learning from each other now. This is how the AI apocalypse starts.

The sexy walking apocalypse. Huh. I'm actually kind of okay with that.
posted by Mr. Bad Example at 6:17 AM on July 14, 2017


Soon enough this AI will be going to the store.
posted by overeducated_alligator at 6:32 AM on July 14, 2017 [2 favorites]


There's no fuel expended or exhaustion. If wasting energy and getting tired isn't a factor we can have some really silly walks!
posted by cmfletcher at 7:39 AM on July 14, 2017


To overcome this, you'd probably have to program the AI with a new rule that says something like, okay, when you think you have optimized this dynamic, go back to some earlier decision point and intentionally take a less favored option, and see how well you can optimize that method.

What you're describing is called simulated annealing, and there are related methods as well, particularly stochastic gradient descent.
posted by jedicus at 7:51 AM on July 14, 2017 [6 favorites]
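
[Editor's sketch] For readers who haven't met simulated annealing, here is a minimal sketch on a made-up one-dimensional landscape. The landscape, the linear cooling schedule, and the parameter names (steps, t0, seed) are assumptions chosen for illustration, not anything from the DeepMind work. The key idea is that a worse move is sometimes accepted, with a probability that shrinks as the "temperature" falls.

```python
import math
import random

# Hypothetical toy landscape: small peak at x=2, taller peak at x=8.
LANDSCAPE = [0, 2, 3, 2, 1, 0, 3, 6, 9, 6, 3]


def score(x: int) -> int:
    return LANDSCAPE[x]


def anneal(start: int, steps: int = 2000, t0: float = 5.0, seed: int = 0) -> int:
    """Return the best position seen during one annealing run."""
    rng = random.Random(seed)
    x = best = start
    for step in range(steps):
        # Linear cooling (an assumed schedule); the epsilon avoids dividing by zero.
        temperature = t0 * (1 - step / steps) + 1e-9
        # Propose a random neighbour, clamped to the ends of the landscape.
        candidate = min(max(x + rng.choice((-1, 1)), 0), len(LANDSCAPE) - 1)
        delta = score(candidate) - score(x)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            x = candidate
        if score(x) > score(best):
            best = x
    return best


found = anneal(start=0)
print(found, score(found))
```

Early on, when the temperature is high, the search wanders downhill fairly freely, which is what lets it leave a small peak. As the temperature drops it behaves more and more like a plain greedy climb.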


Because they didn't include "waving your arms like a silly person tires you out pretty darn quick" in the simulation.

Although given that not even human sprinters wave their arms that way, fatigue is probably not the only reason this is suboptimal.

Are they even optimizing for speed? Is it possible they are somehow leveraging YouTube to optimize for comedy?
posted by straight at 10:03 AM on July 14, 2017 [2 favorites]


Obviously Google is developing this AI under a highly classified research and engineering contract from the Ministry of Silly Walks.
posted by AndrewInDC at 11:43 AM on July 14, 2017 [4 favorites]


To overcome this, you'd probably have to program the AI with a new rule that says something like, okay, when you think you have optimized this dynamic, go back to some earlier decision point and intentionally take a less favored option, and see how well you can optimize that method. It's counterintuitive for both robot and human to do something different, that initially gives poorer results, but that eventually might generate better results. So you have to program it to happen intentionally.

RL people frame this as a balance between "exploration" and "exploitation".

The training process they used here involved some amount of simulated lookahead before making a decision, and each change to the policy was made to maximize the expected total future reward, so it's capable of escaping some obvious "traps" in the objective function.

Also, they trained many models in parallel with different random seeds, which both helps deal with local maxima and allows them to make use of their enormous server farms without doing anything involving complicated distributed algorithms.
posted by vogon_poet at 12:10 PM on July 14, 2017 [2 favorites]
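
[Editor's sketch] The "many runs with different random seeds" idea can be sketched as independent greedy searches from random starting points, keeping whichever run ends up best. Everything here (the landscape, the train function, the choice of eight seeds) is a hypothetical stand-in. The real system trains neural network policies, not a walk over a list.

```python
import random

# Hypothetical toy landscape: small peak at x=2, taller peak at x=8.
LANDSCAPE = [0, 2, 3, 2, 1, 0, 3, 6, 9, 6, 3]


def train(seed: int) -> int:
    """One independent run: seeded random start, then greedy hill climbing."""
    rng = random.Random(seed)
    x = rng.randrange(len(LANDSCAPE))
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]
        best = max(neighbours, key=LANDSCAPE.__getitem__)
        if LANDSCAPE[best] <= LANDSCAPE[x]:
            return x  # local optimum for this run
        x = best


# The runs share no state, so they could be farmed out to separate machines.
results = [train(seed) for seed in range(8)]
best_run = max(results, key=LANDSCAPE.__getitem__)
print(results, best_run)
```

Each run can still get stuck on the small peak, but the runs don't all start in the same place, so picking the best of several raises the odds that at least one lands on the taller peak.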


I suspect that in actual meat bodies, there is a lot of internal muscle work that takes the place of the balance adjustments being modeled by shifting your arms around.
posted by tavella at 12:24 PM on July 14, 2017 [2 favorites]


Jedicus, that is exactly it!
posted by darkstar at 7:39 AM on July 15, 2017

