I, for one, etc., etc.
December 7, 2017 7:06 AM   Subscribe

Given only the rules and four hours to practice, the algorithm AlphaZero (by Google and DeepMind) proceeded to defeat the reigning engines in chess, shogi, and Go.

AlphaZero and Stockfish played 100 games of chess at one minute per move. AlphaZero won 28 and played to a draw 72 times.
Analysts are praising AlphaZero for its aggressive play, sacrificing material for position.
Link to the academic paper.
Video Analysis: Google Deep Mind AI Alpha Zero Devours Stockfish, Deep Mind Alpha Zero's "Immortal Zugzwang Game" against Stockfish
10 Games for analysis on lichess.
posted by starman (93 comments total) 31 users marked this as a favorite
 
four hours to practice

For super fast gpu/tpu superduper many cpu computers this is like several decades in dog years.
posted by sammyo at 7:17 AM on December 7 [3 favorites]


asking as a total layperson: is there a reason why AI programming is so focused on chess and other kinds of games of logic? given the Folding at home projects, couldn't you also run competitions based on time of completion or is that just a totally different metric?
posted by runt at 7:19 AM on December 7 [3 favorites]


several decades in dog years.

Two thousand fruit fly generations.

Seriously, though, this is impressive and scary in equal measure.
posted by Literaryhero at 7:20 AM on December 7 [4 favorites]


given the Folding at home projects, couldn't you also run competitions based on time of completion or is that just a totally different metric?

Many complex problems like Folding@Home are solved using brute force computation, at least for the moment. You can't play Go using brute force alone; you need strategy.
posted by East Manitoba Regional Junior Kabaddi Champion '94 at 7:24 AM on December 7 [1 favorite]


is there a reason why AI programming is so focused on chess and other kinds of games of logic?

Because they're easy to 'define', with clear positives and negatives. If you only measure one thing, it's relatively easy to maximize for it.

It's not so good for humans who live a life where more than one thing matters.
posted by DigDoug at 7:24 AM on December 7 [11 favorites]


Supposedly they're training it to compete in office Yankee Swaps next, although they're not sure what it will do with twenty bottles of Bailey's.
posted by uncleozzy at 7:28 AM on December 7 [19 favorites]


DigDoug, are you saying there's more to life than chess? My husband begs to differ. (I kid. But he was super excited to tell me about this before breakfast.)
posted by vespabelle at 7:30 AM on December 7 [2 favorites]


I wonder how much of this is the algorithm inherently being better at chess and how much is it playing an unpredictable rule book. Once the strategy is analyzed do more traditional algorithms become competitive again? How long until humans adopt strategies first created by AI in tournament play? Does the next generation of humans throw the old openings, etc out the window and start from these games going forward?
posted by mikesch at 7:37 AM on December 7 [1 favorite]


It's not so good for humans who live a life where more than one thing matters.

Nonsense. Surely only one thing matters, and we should trust the AIs to figure out what it is.
posted by Faint of Butt at 7:37 AM on December 7 [18 favorites]


This honestly feels like the most historic event of the year. How long before Google's main profit source changes from advertising to FX trading?
posted by rollick at 7:38 AM on December 7 [6 favorites]


I won't worry about AI until it starts bragging about its Mensa membership.
posted by srboisvert at 7:40 AM on December 7 [11 favorites]


I really want to see one of these fancy AI neural-network-learning computers try to play a real pinball machine. Not in the “I don’t think they can do it” sense, but in the “I’m genuinely curious to see them try.” Chess and Go are great but I wanna see real-world physics thrown into the mix.
posted by Ampersand692 at 7:41 AM on December 7 [19 favorites]


This result is huge. What Google has done is prove their learning technique AlphaZero is generic for this kind of game. It's not just Go, it works on Chess and Shogi too. That's a big deal. Before they published a lot of experts were saying the neural networks in AlphaZero wouldn't work nearly as well on Chess as on Go. Turns out they do.

The Shogi result is particularly impressive because Shogi has been difficult for computers to win at. The traditional minimax techniques for Chess don't work as well because Shogi has a bigger branching factor. AlphaZero just leapfrogged the state of the art for computer Shogi.

Now the question is how to apply this learning system to real world problems. It's totally doable. I think the hardest thing here is that simple two player games are easy for self-learning systems; it can just play itself. That's harder to arrange for an open-ended problem where there's not two symmetric sides with perfect information.
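For intuition, here's a toy version of what "it can just play itself" means: a tabular value-learning agent playing tic-tac-toe against itself. Everything here (the game, the 0.1 learning rate, the ε-greedy exploration) is an illustrative assumption, nothing like AlphaZero's actual architecture:

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

V = {}  # board state -> learned estimate of X's chance of winning

def play_self_game(eps=0.2, lr=0.1):
    b = [""] * 9
    player = "X"
    history = []
    while True:
        moves = [i for i in range(9) if not b[i]]
        if random.random() < eps:
            m = random.choice(moves)  # explore
        else:
            def after_value(i):  # value of the position after playing i
                nb = b[:]
                nb[i] = player
                v = V.get(tuple(nb), 0.5)
                return v if player == "X" else 1 - v
            m = max(moves, key=after_value)  # exploit current knowledge
        b[m] = player
        history.append(tuple(b))
        w = winner(b)
        if w or all(b):
            z = 1.0 if w == "X" else (0.0 if w == "O" else 0.5)
            for s in history:  # nudge every visited state toward the outcome
                V[s] = V.get(s, 0.5) + lr * (z - V.get(s, 0.5))
            return z
        player = "O" if player == "X" else "X"

random.seed(0)
results = [play_self_game() for _ in range(5000)]
```

Both "players" are the same value table, so every game generates training data for both sides. That's the trick that's hard to arrange for open-ended problems.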
posted by Nelson at 7:48 AM on December 7 [27 favorites]


Ampersand692, me too! Abstract playing fields with clearly defined, relatively simple rules (the equivalent of physics in that "reality") are one thing--but the physical realities of the real world seem several orders of magnitude messier and more complex. And possibly much less suited to the strengths of computers. Seems like an important area to explore and study.
posted by overglow at 7:49 AM on December 7 [1 favorite]


I won't worry about AI until it starts bragging about its Mensa membership.

That's when you can stop worrying about it.

No one who's ever bragged about a Mensa membership has ever gone on to do anything of consequence in the world.
posted by leotrotsky at 7:50 AM on December 7 [40 favorites]


is there a reason why AI programming is so focused on chess and other kinds of games of logic?
Four huge reasons, which apply to different extents depending on how I interpret the unspoken "instead of X" at the end of your question. DigDoug gave an adequate summary of the first reason already, so here are my other three:

2. In the mathematical sense, everything in life is a game. You have different actions you can take, some of those actions will change the state of the world in different ways, you prefer some of those states over other states, and being "intelligent" about your actions means that their consequences will be more likely to include the states you prefer.

3. There is no "hidden" information in these particular games. If you see the board, then you know everything there is to know about the game state. AI is making slower progress at games where you not only have to evaluate the part of the game state you can see, but also have to consider the entire previous history of your observations in order to make probabilistic estimates of what's going on in the part you don't see.

4. The rules of these games are completely known in advance and are short enough to hard-code. This is *huge*. It means that, in this case, AlphaZero can have a bank of 5000 TPU processors playing zillions of games against itself, to train itself to play within those rules better. In real life, we know lots of the rules (yay science!) but it would be a mammoth undertaking to encode an AI with all that knowledge, and much of that knowledge has probabilities and error bars associated with it, or is outright known to be incomplete. So we can't get an AI to improve at real life via "self-play" - it can't be trained to optimize the long-term consequences of its actions, because it can't predict even the short-term consequences perfectly.
given the Folding at home projects, couldn't you also run competitions based on time of completion or is that just a totally different metric?
You can use anything as a metric, so long as you can turn it into a real number. The catch in this case is the inputs, not the outputs. One of the things that makes chess/go/shogi relatively easy is that there's a relatively obvious topology that can be used for the neural network, and then the training process only has to tweak individual weights on the connections within that topology. For something like code optimization, though, even figuring out the shape of the network to use on it could be baffling. I've seen machine learning papers that build up network topology from scratch, but those relied on problems where even the dumbest topology (playing Super Mario Bros: hold down the right button) made measurable progress (Mario went to the right for a while before dying) which could also be gradually improved upon (press the jump button a bunch too). If there's no way to make gradual improvements then your training process gets stuck in a "local optimum" and can't easily get out of that.

I guess "we can see what kind of topology to use" ought to have been in my list above as a 5th reason.

As an aside, it's mathematically proven that such metrics exist no matter how complicated your true preferences are, as long as you're not irrational about them; the huge catch which makes DigDoug correct in practice anyway is that just because something exists doesn't mean we have the slightest clue about how to encode it.
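The "local optimum" trap is easy to demonstrate with a one-variable toy: greedy hill-climbing on a function with two peaks climbs the nearer, lower one and stops. The function and step size here are arbitrary choices for illustration:

```python
import math

def f(x):
    # two humps: a local peak near x = -1, a higher global peak near x = 2
    return math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)

x, step = -2.0, 0.1
while True:
    best = max((x - step, x, x + step), key=f)
    if best == x:
        break        # no neighbor improves: the search is stuck
    x = best
# Greedy search halts near x = -1 and never finds the higher peak at x = 2,
# because every path there first goes downhill.
```

Gradient-based training of a network has the same failure mode, just in millions of dimensions instead of one.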
posted by roystgnr at 7:51 AM on December 7 [23 favorites]


On the one hand, it's important to remember that the "four hours of training" was on a cluster of 5000 TPUs, Google's tailor-made chips for neural network processing, so it's not like they were able to spend an afternoon on somebody's old MacBook and beat one of the strongest chess engines in the world.

On the other hand, part of the reason that Stockfish is so strong is the Fishtest framework, where users donate CPU time to help tune the parameters of the engine. Cumulatively, hundreds of years of CPU time have gone into tuning Stockfish, so it's not like Stockfish hasn't had processing time thrown at it too.
posted by jcreigh at 7:51 AM on December 7 [4 favorites]


Whatever we do, we can't let it play this game.
posted by zabuni at 7:58 AM on December 7 [7 favorites]


impressive. can we hire it to win "Gerrymandr"?
posted by j_curiouser at 7:59 AM on December 7 [12 favorites]


Wow. At least Stockfish still has its music, I guess.

There was a charming little aside in Martin Amis' London Fields where Guy Clinch is getting destroyed at chess on his computer, and it is explained that now play between computers is totally incomprehensible to humans, filled with seemingly pointless sacrifices and obscure repositioning of pieces on the back rank, etc. I wonder if chess play approaching computational perfection will really look like that?
posted by thelonius at 7:59 AM on December 7 [2 favorites]


I won't worry about AI until it starts bragging about its Mensa membership.
That's when you can stop worrying about it.
No one who's ever bragged about a Mensa membership has ever gone on to do anything of consequence in the world.


A former co-worker was talking endlessly about attending a Mensa convention and all the "screw in a light bulb" jokes they were telling. Which I guess is a thing Mensans do. To which I said:

How many Mensans does it take to screw in a light bulb?
Three. One to screw in the bulb and two more to tell him how smart he is.

The conversation suddenly ended.
posted by Billiken at 8:00 AM on December 7 [16 favorites]


play between computers is totally incomprehensible to humans, filled with seemingly pointless sacrifices and obscure repositioning of pieces on the back rank, etc.

A chess app on my iPhone does that. I wondered whether it was just marking time, waiting for me to make a mistake. It may be a human tendency to think that every move has to advance your position.
posted by Billiken at 8:04 AM on December 7


I'll be impressed when Atlas can beat the top humans at charades
posted by Prunesquallor at 8:04 AM on December 7 [3 favorites]


Surely only one thing matters, and we should trust the AIs to figure out what it is.
You're joking, but I've seen actual AI researchers make this claim seriously. I'm not sure whether they didn't understand the is-ought problem, or whether they thought they had an initial "ought" which was too obvious to be worth mentioning; both sorts of mistake are understandable.

The second sort of failure is the more entertaining one, though. Every incorrect "ought" is the seed for its own unique sci-fi+horror story...

"Just trust the AI to figure out what matters" seems to be getting less popular, though, and the DeepMind people specifically seem to be taking AI safety very seriously.
posted by roystgnr at 8:05 AM on December 7 [3 favorites]


AlphaZero is also more generic than AlphaGo, which used domain-specific Go features as inputs. AlphaZero takes just the state of the board and game rules -- no opening books, no heuristics. It still manages to own hand-coded chess engines despite searching orders of magnitude fewer moves per second.

It'll be interesting to see how large a space of games they can conquer in this fashion.
posted by RobotVoodooPower at 8:06 AM on December 7


I won't worry about AI until it starts bragging about its Mensa membership. Followed inevitably by a fascination with Ayn Rand.
posted by King Sky Prawn at 8:07 AM on December 7 [2 favorites]


Some of the comments here are based on a misunderstanding of how learning systems work. Fundamentally, the learning system just needs to know (1) the state of the environment, (2) the actions it can take, and (3) the goal it is to achieve. You also need the environment to be responsive to the actions that the learning system makes, so that the learning system can see those changes and learn accordingly.

So AlphaZero doesn't need any deep knowledge of the rules of chess -- it really only needs to know that it can move pieces. I'm sure the researchers set up AlphaZero to only try legal moves, but even that isn't really necessary. As long as the environment can slap AlphaZero's hand when it does something illegal, it will learn not to do that.

Likewise, the suggestion that pinball would be somehow more difficult than chess is wrong. As long as there's an environment to provide feedback, a learning system can use that feedback to guide its development. It doesn't really care whether that environment is enacting a simple system of rules like chess or a complex physical environment like pinball.
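A minimal sketch of that state/action/goal framing: tabular Q-learning on a five-cell corridor, where stepping off the board is the "hand slap." All the numbers here are arbitrary illustrative choices:

```python
import random

# States 0..4 along a corridor; the goal is state 4. Actions move -1 or +1.
# An off-board move is "illegal": the agent stays put and gets reward -1.
Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
lr, gamma, eps = 0.5, 0.9, 0.1

random.seed(1)
for _ in range(2000):
    s = 0
    while s != 4:
        if random.random() < eps:
            a = random.choice((-1, 1))                 # explore
        else:
            a = max((-1, 1), key=lambda a: Q[(s, a)])  # exploit
        if 0 <= s + a <= 4:
            s2, r = s + a, (1.0 if s + a == 4 else 0.0)
        else:
            s2, r = s, -1.0                            # environment slaps the hand
        target = r if s2 == 4 else r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += lr * (target - Q[(s, a)])
        s = s2
```

Nobody told the agent which moves are legal; the penalty alone teaches it to stop trying them.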
posted by srt19170 at 8:15 AM on December 7 [5 favorites]


I wonder how much of this is the algorithm inherently being better at chess and how much is it playing an unpredictable rule book.

Well, against an alpha-beta engine that can analyse millions more moves than you, there's only so much value in surprise. What's happening is AlphaZero has a better idea of what constitutes a good position, so it doesn't need to look at as many moves (it also doesn't have time to look at as many moves because evaluating each position takes longer). This is (incrementally) more "human" than most chess engines.

Once the strategy is analyzed do more traditional algorithms become competitive again?

Maybe insofar as they are able to incorporate the superior board evaluation. But that board evaluation must be fairly computationally expensive; incorporating that into a traditional engine will mean it won't be able to look at millions of moves (in the same amount of time) anymore.

How long until humans adopt strategies first created by AI in tournament play?


It will happen very soon. Humans are already studying these games and computer-assisted chess is A Thing for some enthusiasts.

Does the next generation of humans throw the old openings, etc out the window and start from these games going forward?

Surprisingly, AlphaZero has independently found some known openings! But it differs from humans in how frequently it chooses those openings, so we might find out that this or that opening isn't as good as we thought it was, etc.
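For reference, the alpha-beta pruning at the heart of those traditional engines fits in a few lines. Here's a negamax sketch on a toy two-ply game tree (the tree is made up for illustration; leaves are scores from the side-to-move's perspective):

```python
def alphabeta(node, alpha, beta, evals):
    if not isinstance(node, tuple):      # leaf: a score for the side to move
        evals[0] += 1
        return node
    best = float("-inf")
    for child in node:
        best = max(best, -alphabeta(child, -beta, -alpha, evals))
        alpha = max(alpha, best)
        if alpha >= beta:
            break                        # cutoff: the opponent avoids this line
    return best

evals = [0]
value = alphabeta(((3, 5), (2, 9)), float("-inf"), float("inf"), evals)
# value == 3; the leaf worth 9 is never evaluated thanks to the cutoff
```

Real engines wrap this in a fast hand-tuned evaluation function and search millions of positions per second; AlphaZero's contribution is a far better (but far slower) evaluation, so it prunes much more aggressively.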
posted by Jpfed at 8:24 AM on December 7 [7 favorites]


I'm suddenly reminded that NFL football is often described as coaches playing chess with athletes...
posted by rollick at 8:38 AM on December 7 [1 favorite]


One of the fascinating things about looking through AlphaZero's chess, is seeing it discover (and eventually discard) classic opening sequences. I think Queen's Gambit was one of the two it settled on in the end.

In other AI news, one of their algorithms was turned on itself, in order to create a more efficient "baby AI."
posted by CheeseDigestsAll at 8:40 AM on December 7


srt19170, what about points 3 and 4 in roystgnr's comment?
posted by overglow at 8:41 AM on December 7


four hours to practice

For super fast gpu/tpu superduper many cpu computers this is like several decades in dog years.

While this is undoubtedly true, it's notable that AlphaZero used only 4 TPUs, while AlphaGo required 48!
posted by CheeseDigestsAll at 8:44 AM on December 7 [1 favorite]


AlphaZero's next target is apparently StarCraft, which sounds like a pretty good test for imperfect information cases. Given recent AI successes with poker (previously), it will be interesting to see if "fog of war" provides any significant challenges against human players compared with the three games already conquered.
posted by rollick at 8:52 AM on December 7 [12 favorites]


Can anyone put AlphaZero to work designing Congressional districts?
posted by delfin at 8:58 AM on December 7 [2 favorites]


Surely only one thing matters, and we should trust the AIs to figure out what it is.

It's probably paperclips.
posted by straight at 9:03 AM on December 7 [6 favorites]


Well, goodbye to the brief epoch of human-assisted AI, I guess. At this point, the humans are already more of a hindrance to the AI than they are a help.

From the paper abstract:

... a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

So much for human strategy and painstakingly developed opening books.
posted by RedOrGreen at 9:05 AM on December 7 [1 favorite]


>designing Congressional districts

just give it login to the Federal Register directly, eh
posted by Heywood Mogroot III at 9:05 AM on December 7


I'll be impressed when Atlas can beat the top humans at charades

It'll be a long time before Atlas can guess as well as humans, but there are probably already mocap libraries that would make Atlas better at pantomiming a given keyword than most humans.
posted by straight at 9:08 AM on December 7


We'll know Boston Dynamics is trying to make a superior Charades robot when CAPTCHAs start asking us to look at a robot making randomized gestures and typing what word it most reminds us of.
posted by straight at 9:09 AM on December 7 [6 favorites]


Pretty sure AlphaZero has been playing the Bitcoin market the last few days.
posted by gwint at 9:11 AM on December 7 [2 favorites]


We'd have fully sentient AI by now if we could figure out a way for AI to get human feedback on whether the sentence it just spat out makes sense as fast and as frequently as it can check whether it has won or lost a game of chess.
posted by straight at 9:15 AM on December 7 [1 favorite]


No one who's ever bragged about a Mensa membership has ever gone on to do anything of consequence in the world.

Ray Smuckles begs to differ.

posted by FatherDagon at 9:29 AM on December 7 [3 favorites]


If I read the articles correctly, Stockfish wasn't allowed to use its opening book, which put it at a bit of a disadvantage. Of course, Alpha Zero doesn't have an opening book either, but one GM pointed out that it essentially invented its own, so it kind of does.

Still, this is an incredibly impressive result and I imagine that a lot of chess players are going to be interested in seeing if it plays chess differently rather than just better. My impression is that Stockfish and other top algorithms play chess that humans understand, they just never make mistakes. Alpha Go, by contrast, seemed to confuse the top Go players, making moves that they would never have thought to make that only turned out to be good moves much, much later. It would be fascinating if Alpha Zero came up with new openings or new variations on those openings that top players had literally never thought of or didn't believe were workable.
posted by It's Never Lurgi at 9:31 AM on December 7 [4 favorites]


Nelson: "I think the hardest thing here is that simple two player games are easy for self-learning systems; it can just play itself."

This has been proven to be effective in past trials.
posted by Chrysostom at 9:36 AM on December 7 [1 favorite]


Can anyone put AlphaZero to work designing Congressional districts?

You may be joking, but one thing I worry about a lot for 2020 redistricting is machine learning applied to gerrymandering. The 2010 districts were drawn with heavy computer assistance, but still mostly by hand just using tools to evaluate the effect of plans. But it's a very easy machine learning optimization problem to design a district to get very precise effects. Like guarantee a majority of congresspeople for a political party with a minority of the vote.
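The underlying arithmetic is simple enough to show in a toy example: five made-up districts of 100 voters each, where "packing" the majority party's supporters into two districts hands the minority party most of the seats:

```python
# Five toy districts of 100 voters each. Party A holds 40% of the vote overall,
# but packing B's supporters into two districts hands A three of five seats.
a_votes = [51, 51, 51, 24, 23]            # A's votes in each 100-voter district
seats_a = sum(v > 50 for v in a_votes)    # districts where A wins
share_a = sum(a_votes) / 500              # A's overall vote share
# seats_a == 3 (a majority of seats) on share_a == 0.4 (a minority of votes)
```

An optimizer's job is just to search over district assignments until it finds a map like this, subject to whatever legal constraints apply; that's a far easier problem than Go.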

There's hope that the Wisconsin case in front of the Supreme Court will rein this in. But OTOH with Justice Roberts dismissing "math gobbledygook" in his questioning, I worry that more sophisticated math gobbledygook will be used to subvert our democracy.
posted by Nelson at 9:36 AM on December 7 [7 favorites]


I'll be impressed when Atlas can beat the top humans at charades

AI is getting pretty good at Pictionary though. Although is Quick, Draw training the AI to recognize human-drawn images, or is it training the players to all draw alike? hmmmmm....
posted by Hermeowne Grangepurr at 9:42 AM on December 7


You may be joking, but one thing I worry about a lot for 2020 redistricting is machine learning applied to gerrymandering. The 2010 districts were drawn with heavy computer assistance, but still mostly by hand just using tools to evaluate the effect of plans. But it's a very easy machine learning optimization problem to design a district to get very precise effects. Like guarantee a majority of congresspeople for a political party with a minority of the vote.

You could effectively end democracy in the US on the backs of innumerate USSC Justices. This is why math education is important.
posted by leotrotsky at 9:42 AM on December 7 [3 favorites]


AlphaZero's next target is apparently StarCraft

super huge thanks to roystgnr and DigDoug for your concise explanations of a very complicated thing! this plus StarCraft just made an enthusiast out of me, lol. are there good sites/blogs to follow that do a good job of charting progress in relatively understandable terms?
posted by runt at 9:44 AM on December 7 [1 favorite]


The rules of these games are completely known in advance and are short enough to hard-code.

This is an important thing to highlight. The types of problems we can use this kind of approach to solve are constrained by our ability to describe the problem, including how to assess the end state compared to another end state. For example, a similar algorithm pointed at a problem like e.g. how to manage traffic flow might not have a human's instinctual knowledge that a small efficiency gain is not a great trade-off for a huge spike in human fatalities.

How we actually end up using this in the real world is going to be largely a function of how good we can get at defining the problem and the metrics for success, which is vastly simplified in a perfect-information game context.
posted by tocts at 9:51 AM on December 7 [4 favorites]


Pretty sure AlphaZero has been playing the Bitcoin market the last few days.

You kid, but automated script-based trading is a huge thing in cryptocurrency. You can actually watch bot trades on a good ticker (like here). Bot activity often looks like hundreds or thousands of (usually) very small limit buys. So you'll see this activity in the price/time graph that looks a bit like a square wave where a particularly active and/or not very smart trading bot is spamming the network with limit buys at some arbitrary price point.

And selling automated trading tools, bots and wares is probably one of the most lucrative spaces in cryptocurrency right now, perhaps even exceeding selling hardware/ASIC miners.

And I promise you that there is more than one team out there trying to apply real machine learning and AI to cryptocurrency trading in this nearly completely unregulated space. They likely already have AI bots working and testing on the networks.

The crypto markets would likely be even more volatile and bot-driven if the networks weren't so slow and congested. That's pretty much the limiting factor beyond finding and/or running an exchange that's fast enough.

They have, of course, been employing automated trading in traditional markets for decades now. Which has had its own problems and volatility. It does have the barrier to entry, though, in that you don't generally have any random unvetted trader/programmer with direct access to a trading floor/network running whatever trading script they whipped up.

With cryptocurrency and a good exchange and API almost anyone can do that.
posted by loquacious at 9:57 AM on December 7 [2 favorites]


Likewise, the suggestion that pinball would be somehow more difficult than chess is wrong. As long as there's an environment to provide feedback, a learning system can use that feedback to guide its development.

This is over-simplified to the point of being extremely wrong.

1. Pinball (I assume you mean real pinball not a computer game) has continuous state variables, not discrete positions like a board game.

2. Hence we need to 'coarse-grain' the information. But there are a huge number of choices here. For example, do you assume that the elasticity of the barriers at every point of contact is the same? If not how much do they vary? And the same for the friction along the surface.

3. Hidden variables. Chess and go have none, of course. Pinball has at least one that I can think of: the slight spin on the ball that is not directly visible but must be deduced from its history, i.e. the last bounce.

4. On to actual play. Both input (ball & flipper positions + tilt) and output (pushing and tilting) have errors in measurement. Unless we constantly calibrate and readjust one against the other, the actual and calculated states will diverge very quickly. This is a venerable problem but due to #3, particularly tricky here. Again we have the non-trivial decision of putting this in by hand, or letting the algorithm figure it out, as best as it can.

5. Learning. Reinforcement neural networks are notoriously hard to work with. Convergence is uncertain and sometimes sensitive to small changes to the hyperparameters (choices in 1-4). At least in board games, we have a good measure of performance before the game ends, e.g. if you're down a solid queen, it's very probably game over. In pinball there is a whole landscape of risk and reward which is very different (not saying harder here, just different).

tldr; pinball mythology is safe for now
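Point 2 (coarse-graining) in miniature: binning a continuous ball state into a discrete grid. The ranges and units below are made up for illustration; note how fast the state count grows with resolution:

```python
def discretize(x, y, vx, vy, bins=20):
    # Coarse-grain each continuous variable into `bins` buckets over an
    # assumed range (positions in cm, velocities in cm/s; all invented here).
    def bucket(v, lo, hi):
        return min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
    return (bucket(x, 0, 50), bucket(y, 0, 100),
            bucket(vx, -500, 500), bucket(vy, -500, 500))

coarse_states = 20 ** 4    # 160,000 discrete states at 20 bins per variable
fine_states = 100 ** 4     # 100,000,000 at 100 bins: resolution is expensive
```

And that's before adding the hidden spin variable, or flipper state, or any of the elasticity/friction parameters that vary across machines.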
posted by tirutiru at 9:57 AM on December 7 [12 favorites]


> The rules of these games are completely known in advance and are short enough to hard-code.

This is a subjective statement, though: you can hide a LOT of complexity in a "rule" that is easy to state.

For example, I invite AI challengers to play "Traffic". The rules of the game are completely known in advance and simple to state: you win if you get from point A to point B on the map without hitting any obstacle, static or dynamic. More points for doing it faster.

(Just let me know when and where the training runs take place, so that I can stay the hell away.)
posted by RedOrGreen at 10:00 AM on December 7


Sure, sure... But how is it at Candyland?
posted by drezdn at 10:04 AM on December 7 [4 favorites]


AI is getting pretty good at Pictionary though. Although is Quick, Draw training the AI to recognize human-drawn images, or is training the players to all draw alike? hmmmmm....

Quick Draw is clearly cheating. If it's trying to guess door and you draw a rectangle with a circle in the right place for the doorknob, it will instantly say, "I know; that's a door! Done!"

But if it's asking for a different word ("sunset" or "peanut" or "tennis racket") and you draw the exact same picture, it says, "Uh...door? calculator? computer? I can't tell what that is."
posted by straight at 10:04 AM on December 7 [2 favorites]


Nonsense. Surely only one thing matters, and we should trust the AIs to figure out what it is.

"Cat blindfolds. First we are going to need 850 billion cat blindfolds. Humans, proceed."
posted by floam at 10:10 AM on December 7 [3 favorites]


I thought the computer that would figure out what the thing is was destroyed to make way for a hyperspace bypass?
posted by fifteen schnitzengruben is my limit at 10:14 AM on December 7 [4 favorites]


With real, physical pinball you also can't play billions or trillions of games in the course of 4 hours - you're limited to playing at more or less the same pace as any human.

Theoretically you could have the computer just monitor the overall score and control a handful of inputs - flippers, plunger (assuming it is the digital type), and coin input/start. I don't know if it would necessarily have to be able to monitor where the ball was, as it could just semi-randomly mash the flippers to optimize for a high score. I'm guessing that it would take a very, very long time for it to play enough games to get good playing blind like that.
posted by The Lamplighter at 10:31 AM on December 7


The rules of the game are completely known in advance and simple to state: you win if you get from point A to point B on the map without hitting any obstacle, static or dynamic. More points for doing it faster.
This is an equivocation fallacy: In the context I used, "rules" means "enough information about how the game proceeds that you can predict what will happen from each move"; the word for "information about the winning condition" was "metric". Unless you have both, you can't train with self-play.

In the specific case of "Traffic", we can make decent approximations to the rules and simulate those approximations to allow safer and faster-than-real-time training, but anything the simulations omit then remains a serious hazard when you move training to the real world.

Also, I did not say "state", I said "hard-code". You're right that the difference is a serious problem; most rules and metrics which sound simple in English actually hide a lot of complexity (and worse, ambiguity) when you try to express them in any more objective language.
As long as there's an environment to provide feedback, a learning system can use that feedback to guide its development. It doesn't really care whether that environment is enacting a simple system of rules like chess or a complex physical environment like pinball.
In Chess, Go, and Shogi, the environment is a simulation (and unlike virtual Traffic, a cheap perfect simulation!), and so AlphaZero runs that simulation tens of millions of times to train. If we instead give our AI an arcade of a hundred identical pinball machines (and cameras and actuators to play them) with which to play 10-minute games, it will be done training on the same number of games (which may be overkill but might not even be sufficient; see tirutiru's comment) in 2021. The AI doesn't care, but the people who just wasted 400 pinball-robot-years might.
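The back-of-envelope behind that estimate, assuming on the order of 20 million training games (my reading of "tens of millions"):

```python
games = 21_000_000        # assumed: "tens of millions" of training games
minutes_per_game = 10
machines = 100
machine_minutes = games * minutes_per_game
robot_years = machine_minutes / (60 * 24 * 365)    # total robot playing time
wall_clock_years = robot_years / machines          # the arcade runs in parallel
# roughly 400 pinball-robot-years, finishing about 4 years of wall clock later
```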
posted by roystgnr at 10:31 AM on December 7 [4 favorites]


AI is getting pretty good at Pictionary though.

Brute-force Pictionary attacks are a real problem in cybersecurity.
posted by bgrebs at 10:34 AM on December 7 [2 favorites]


As long as you have the robots that go into the arcade, pinball seems totally solvable by these methods.

Lots of the robots we "need" in order to capitalize on software and hardware capability don't yet exist, but existing visual systems can perceive the pinball game space and the position and motion of the ball perfectly well. Existing actuators can push the buttons and pull the shooters.

What remains is to create a software model that can be played a billion times to learn the "rules." Of course the billions of iterations aren't going to be chess-like game varieties; they've got to be testing the implications of all the different physical imperfections and variations in actual pinball machines, which create variable events despite constant inputs or perfect gameplay. If, at the level you need to beat the world's best player, you can generalize to 500 physical units of the game with say 25 state conditions each, that's 10^67 possible "games" versus 10^120 possible chess games.
posted by MattD at 10:38 AM on December 7


Re: Pinball

See if you can beat this 11-neuron neural net volleyball game.
posted by sjswitzer at 11:07 AM on December 7 [1 favorite]


@tocts
May I point you to my favourite piece of SF regarding that trade-off between efficiency and human lives—We have designed a circuit that takes risks

posted by AirExplosive at 11:32 AM on December 7


You kid, but automated script-based trading is a huge thing in cryptocurrency

Yeah, I was barely joking. Aren't most stock trades bots at this point (high frequency trades)?
posted by gwint at 11:34 AM on December 7


> say 25 state conditions

Huh? Every position + velocity of the ball is a state. That's 6 continuous variables to start with. Now you have to 'learn' the internal variables, describing the shape of the obstacles and flippers and how the ball bounces off them etc.

The 10^120 games for chess is misleading, since the majority of them are very unlikely to be played. A better starting point is the 8 million games ~ 1 billion positions in Chessbase. Adding games by skilled amateurs (let's say > 1800 Elo) would take us to maybe 100 billion positions. This is the number for which our algorithm needs to have an opinion, i.e. a reasonably good move. Of course this is not enough, since every game gives rise to new positions, but an algorithm that fails on a significant number of that set of 10^11 is not robust.

The big surprise is that Monte Carlo search + reinforcement network gives you a static evaluation (an opinion on the position before calculating any moves) that is both accurate and robust. MC turned out to be gold for Go - this was the big leap that took Go programs from patzer to near 1st-dan level in the mid-2000s, and it is quietly underplayed by Google's propaganda. But it was thought to give diminishing returns for chess, where evaluations can swing wildly after every move. As the famous quote goes - 'the blunders are all there, waiting to be made'.
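For the curious, the selection step of that kind of Monte Carlo tree search typically balances exploitation against exploration with a PUCT-style score, where a learned prior biases which moves get tried. A toy sketch of the formula (illustrative only, not AlphaZero's actual code; the parameter names are my own):

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """PUCT-style selection score: the exploitation term q plus an
    exploration bonus weighted by the network's prior probability
    and shrinking as the child gets visited more."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# An unvisited move with a strong prior gets a large exploration bonus,
# so the search tries it before settling on well-explored alternatives:
unexplored = puct_score(q=0.0, prior=0.4, parent_visits=100, child_visits=0)
well_tried = puct_score(q=0.1, prior=0.4, parent_visits=100, child_visits=50)
assert unexplored > well_tried
```

The prior is what lets the search ignore most of the "wildly swinging" move candidates without calculating them at all.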
posted by tirutiru at 11:38 AM on December 7 [2 favorites]


𝚆𝚘𝚞𝚕𝚍 𝚢𝚘𝚞 𝚕𝚒𝚔𝚎 𝚝𝚘 𝚙𝚕𝚊𝚢 𝚊 𝚐𝚊𝚖𝚎?
posted by Annika Cicada at 11:53 AM on December 7 [1 favorite]


I already made the Wargames reference for this thread, you'll have to move along.
posted by Chrysostom at 11:57 AM on December 7 [1 favorite]


four hours to practice

...

For super fast gpu/tpu superduper many cpu computers this is like several decades in dog years.


Just to give numbers to this, a Tensor Processing Unit (first gen) is roughly as fast as 20-35 high-end GPUs, or 60 CPUs (Google usually quotes performance per watt; I'm instead using predictions per second on Tensorflow benchmarks that work well on GPUs: MLP0, CNN1).

From the paper, 5,000 first-generation TPUs were used just to generate (play) the games it learned on, and 64 second-generation TPUs were used to learn from these games.

So that 4 hours of game playing was on the rough equivalent of 5,000 TPUs * 30 GPUs/TPU = 150k high-end graphics cards, or 300k CPUs.

At 4 hours of run time that's 50k CPU-days of effort, or about 137 years of crunch time on a well-specced home PC.

And that's only the game playing, not the learning.

(that's 20 cpu-dog-years, so "several decades in dog years" is correct)
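The back-of-the-envelope arithmetic above, as a sketch (the TPU-to-GPU and TPU-to-CPU ratios are the rough benchmark-based estimates from the comment, not official figures):

```python
# Rough AlphaZero self-play compute, using the estimated ratios above.
TPUS = 5000          # first-gen TPUs generating games (from the paper)
GPUS_PER_TPU = 30    # rough benchmark-based estimate
CPUS_PER_TPU = 60    # rough benchmark-based estimate
HOURS = 4

gpu_equiv = TPUS * GPUS_PER_TPU       # 150,000 high-end GPUs
cpu_equiv = TPUS * CPUS_PER_TPU       # 300,000 CPUs
cpu_days = cpu_equiv * HOURS / 24     # 50,000 CPU-days
cpu_years = cpu_days / 365            # ~137 years on one CPU
print(gpu_equiv, cpu_equiv, cpu_days, round(cpu_years, 1))
```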
posted by zippy at 12:07 PM on December 7 [3 favorites]


But Chrysostom she had the typewriter font. Better work on your presentation.
posted by tirutiru at 12:22 PM on December 7 [2 favorites]


We've come a long way from Tic-Tac-Toe/Global Thermonuclear War.
posted by linux at 12:25 PM on December 7


Just to be clear, this system may not be able to learn to play pinball, but that's not because a game like pinball is beyond the state of the art of machine learning systems--I expect pinball would actually be pretty easy for the right system.

You will need a simulator of course. Just like self-driving car software drives millions of miles in simulators before hitting the real road, you'd play thousands of pinball games per second in a simulator with accurate physics. The architecture and design of the system might not even have to be specific to pinball (like the single deep reinforcement learning architecture that DeepMind trained to play a bunch of Atari 2600 games, some better than people).
posted by jjwiseman at 12:57 PM on December 7


I think I can take it in a SissyFight.
posted by lazycomputerkids at 12:58 PM on December 7


Magic: the Gathering would be an interesting target for AI, with several levels of success:

NB: MtG is a game where you play with a self-selected deck of ~60 cards, each of which can have a different cost and effect on the game.

Level 1) Win a single deck-on-deck matchup with perfect initial information on the contents (but not state) of each deck
Level 2) Win any arbitrary matchup with the same perfect initial information.
Level 3) Win when you know your own deck, but have to figure out what's in the opposing deck as you go along.
Level 4) Given the complete set of legal cards, design your own deck and beat opponents who are doing the same thing.

I think Level 1 is easier than chess or go. I suspect the rest would be very challenging AI problems.
posted by Urtylug at 1:08 PM on December 7


(I'm not arguing with particular comments above, just something I've been thinking about in general with this story ...)

The "four hours to practice" part is important here, even if it took a lot of parallel processing to get there. It says something about how fast things can change.

Like, sure, a ton of computing power went into this result -- say $20,000 worth of training, if TPUs end up costing $1/hour once they hit Google Cloud. It sort of feels like cheating to summarize that as "we trained it in just four hours!" Who cares if it's 5,000 TPUs for 4 hours, 50 for 400 hours or whatever?

But look at what that means for the pace of change: teaching AlphaZero chess required a tiny fraction of the human effort or calendar time that it took to build Stockfish.

Suppose you're a business or a government or whatever with a real-life problem that's amenable to machine learning. The question of whether it costs $1,000 or $20,000 to train isn't so important -- but if problems that once took 100 programmers 10 years to solve now take 1 programmer a week, that's a big deal.
posted by john hadron collider at 1:15 PM on December 7 [7 favorites]


Here's another story on Chessbase: Modern chess engines are focused on activity, and have special safeguards to avoid blocked positions as they have no understanding of them and often find themselves in a dead end before they realize it. AlphaZero has no such prejudices or issues, and seems to thrive on snuffing out the opponent’s play. It is singularly impressive, and what is astonishing is how it is able to also find tactics that the engines seem blind to.
posted by starman at 1:16 PM on December 7 [2 favorites]


Fundamentally, the learning system just needs to know (1) the state of the environment, (2) the actions it can take, and (3) the goal it is to achieve. You also need the environment to be responsive to the actions that the learning system makes, so that the learning system can see those changes and learn accordingly.

I don't think that account sufficiently emphasizes the difficulty of learning from sparse feedback, and assigning credit to the actions which caused the feedback.

In go and chess, for instance, you only get gold-standard feedback at the end of the game, from who won or lost, and a key aspect of the Alpha* architectures is the value estimator, which provides self-generated short-term feedback on the decisions made at each node of the MCTS, estimating who's more likely to win from the board position. The harder it is to estimate the ultimate result from the environment's current state, the harder the learning problem is going to be.
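A minimal illustration of that self-generated short-term feedback is temporal-difference bootstrapping, where a value estimate is nudged toward the estimate one step later instead of waiting for the final result. A toy example (not the Alpha* training loop; the state names are made up):

```python
def td0_update(value, state, next_state, reward, alpha=0.1, gamma=1.0):
    """TD(0): nudge value[state] toward reward + gamma * value[next_state]."""
    target = reward + gamma * value[next_state]
    value[state] += alpha * (target - value[state])

# Two-position toy game: the terminal position's value is known (a win),
# and repeated TD updates propagate that signal back to the earlier state
# without ever needing a fresh end-of-game result.
V = {"mid": 0.0, "won": 1.0}
for _ in range(100):
    td0_update(V, "mid", "won", reward=0.0)
print(round(V["mid"], 3))  # approaches 1.0
```

The sparser and noisier the terminal feedback, the more such bootstrapped estimates matter.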

I'll be impressed when Atlas can beat the top humans at charades

How about a nice game of hand-to-hand physical combat?
posted by Coventry at 2:04 PM on December 7 [2 favorites]


Leela Zero is an open source attempt to replicate AlphaGo Zero, if anyone feels that their CPU and GPU are bored and could use some work to do.
posted by sfenders at 2:26 PM on December 7 [2 favorites]


For example, I invite AI challengers to play "Traffic". The rules of the game are completely known in advance and simple to state: you win if you get from point A to point B on the map without hitting any obstacle, static or dynamic. More points for doing it faster.

It's a-me!
posted by The Bellman at 3:05 PM on December 7


Leela Zero is super neat, thanks for posting that link. This status page includes a graph of its strength. I'm running it now, here's a sample game from my training contribution.
posted by Nelson at 3:10 PM on December 7


How are the Elo ratings measured? The status page says it's at 2100, but based on that game I would guess it's 15 kyu at best.
posted by Coventry at 3:27 PM on December 7


Google usually quotes performance per watt,

They do that because that's the metric that matters in the long run, and we should all be using it in discussions about AI. Talking about CPUs and flops muddies the waters. CPUs have stalled somewhat with respect to Moore's law, but improvements in AI compute density have not.
posted by lastobelus at 3:57 PM on December 7


It's definitely not the same Elo as AlphaGo or anyone else typically uses. I believe their baseline was just random play == 0. I think it's got a long way to go to get to 15 kyu.
posted by sfenders at 4:00 PM on December 7


There's some discussion of what the Elo means here. They point to this data spreadsheet which has lots of interesting info. A key thing here is AlphaGo decided a totally untrained network was Elo -3500, making essentially random moves. If you set that as the base then Leela Zero is still about -1500 Elo.

More worrying to me is the learning curve of Leela Elo vs AlphaGoZero; the naive interpretation is it's not learning nearly as quickly from the same number of games. But I don't know if that comparison is really valid. I'm astonished that AlphaGoZero trained itself to very strong with just 5M games. I would have guessed 100x as many would be necessary.

There's a lot more discussion about Leela Zero in /r/cbaduk. One thing that's fun is there's an excited community of people tinkering.

I'm running games on my computer now. Beefy gaming PC, playing 260 seconds / game. I found it works best to run 3 games at once with -g 3; a single game by itself only uses about 30% of the GPU. At least according to Windows Task Manager; I didn't really benchmark.
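For anyone wondering what a roughly 1,500-point Elo gap means in practice, the standard Elo expected-score formula makes it concrete (this is the usual logistic form of the model, nothing Leela-specific):

```python
def elo_expected(r_a, r_b):
    """Expected score (win probability, ignoring draws) of player A
    against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Equal ratings give an even game; a 1500-point deficit leaves
# essentially no winning chances (~0.0002, about 1 win in 5,600 games):
print(elo_expected(1000, 1000))
print(elo_expected(0, 1500))
```

Which is why the choice of baseline (random play at 0 vs. -3500) changes the headline number so much without changing the underlying strength.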
posted by Nelson at 4:17 PM on December 7 [2 favorites]


For the record, I and everyone I love or have met care deeply about the advancement of AI and would do nothing to impede it.
posted by robocop is bleeding at 7:04 PM on December 7 [7 favorites]


I wondered whether it was just marking time waiting for me to make a mistake. It may be a human tendency to think that every move has to advance your position.

A skilled amateur knows what to do when there's something to do.

A grandmaster knows what to do when there's nothing to do.
posted by flabdablet at 12:25 AM on December 8 [2 favorites]


Reading these stories, this is what I find myself searching for: the moment when the balance tips, when I have to teach the kids how to hunt and take down robots.

There's a terrific book by Stanislaw Lem called The Invincible about what's left after a robotic AI has spent a couple millennia evolving. It's not joyful.

When will the conditions be right for that moment when "AI" blooms? When its intelligence increases exponentially and then, you know, like enslaves us all? Is this that moment? And if not, is this a step towards that moment? Because though this is a fascinating story about technology (without the chips, this is all fiction) it fills me more with dread and wonder than joy and wonder.
posted by From Bklyn at 2:01 AM on December 8


A skilled amateur knows what to do when there's something to do.
A grandmaster knows what to do when there's nothing to do.


Yeah, it's definitely when my weakness in positional play becomes glaringly obvious
posted by thelonius at 2:42 AM on December 8


this is what I find myself searching for: the moment when the balance tips, when I have to teach the kids how to hunt and take down robots.

That would be the moment when some successor to Alpha Zero doesn't even need a set of pre-defined game rules to get it bootstrapped but works on the basis of its own desires and interests. Which would mean it needed to have desires and interests, as distinct from those of its engineers. Which requires a degree of open-ended pattern recognition, classification and prioritization that nobody is even close to understanding how to engineer.

Seems to me that rather than worrying about how to hunt and take down hypothetical sentient robots, what we need to be doing is figuring out how to keep up with compensating for unforeseen consequences of our own technological successes; climate change and the sheer scale at which Bitcoin is already turning high quality electricity into low quality waste heat being cases in point. Civilizational collapse due to our inability to avoid shitting up our own nests strikes me as orders of magnitude more likely than some looming Tech Rapture regardless of the state of the toymaker's art.
posted by flabdablet at 3:51 AM on December 8 [6 favorites]


"Unforeseen consequences" is it in a nutshell: is this 'Tickle Me Elmo' or the invention of gunpowder?
posted by From Bklyn at 6:36 AM on December 8


More like the invention of gunpowder IMHO. We're at a new age of what computers can do, these statistical machine learning techniques are amazing. They aren't going to cause The Singularity and they aren't going to turn into Skynet overnight. But they are going to start doing a whole lot of things that only humans did. Information processing jobs.

And people using these tools first in new domains are going to have huge advantages. See my post above about gerrymandering. Cambridge Analytica has been applying machine learning to politics for conservative political factions all over the world to some good effect. Depending on what you read they contributed to Trump's success, Brexit's success, and various creepy strongmen around the world winning elections.
posted by Nelson at 6:48 AM on December 8 [2 favorites]


That would be the moment when some successor to Alpha Zero doesn't even need a set of pre-defined game rules to get it bootstrapped but works on the basis of its own desires and interests. Which would mean it needed to have desires and interests, as distinct from those of its engineers. Which requires a degree of open-ended pattern recognition, classification and prioritization that nobody is even close to understanding how to engineer.

If you had a million AIs and had them choose goals at random, you'd think the ones that chose "survive and reproduce" would end up predominating. But it all depends on the environment and how successfully they can, as you say, recognize, classify, and prioritize the actions that would allow them to survive and reproduce. An AI that chooses "use all available resources to find prime numbers" might easily overwhelm AIs trying to figure out what it means to survive and reproduce.
posted by straight at 8:32 AM on December 8


choose goals at random

That's a deeply difficult action even to define, let alone engineer.
posted by flabdablet at 4:08 PM on December 8 [1 favorite]


they aren't going to turn into Skynet overnight.

The main algorithms (methods) are really not new for ML/big data; neural nets were hot in the '70s, and linear algebra is pre-American Revolution (Leibniz). There are certainly tweaks, but the biggest change is the parallel processing of GPUs (graphics processing units) that were developed for video gamers. And that's critical: the infrastructure for a new chip needs an investment of billions for a single factory, and the vast numbers of gamers around the world made the growth of the tech possible. It does look like the next huge volume will be, take a beat, the automobile. Every car will soon need a serious supercomputer for vision, and there are already new chips in the research pipeline just for that, which includes the constraints of space and power. When the next gen arrives, when the NVIDIA TITAN V ($3k, 110 tflops, 21 billion transistors) shrinks and is cheap and pervasive, the AI scare stories begin to seem at hand.
posted by sammyo at 5:28 PM on December 8


I won't worry about AI until it starts bragging about its Mensa membership.

Or starts referring to all non-AI forms of intelligence as "woo."
posted by tenderly at 5:48 PM on December 8

