I, for one, welcome our new robot Go masters
October 18, 2017 11:19 AM   Subscribe

DeepMind reports a new Go-playing neural net. In 100 games against its predecessor — which defeated the human world champion — it won every game. It's vastly more efficient than its predecessor because it uses a neural net to predict who's going to win, instead of playing out a quick game from each possible lookahead. The network learned without access to expert human Go knowledge: it was trained entirely from self-play, with no knowledge of the game beyond board structure, legal moves, and turn-taking built into the architecture and training schedule.

The first 100 moves from some sample self-play games at various points during its training history are available at the bottom of the Nature paper.
posted by Coventry (51 comments total) 26 users marked this as a favorite
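
[A miniature sketch of what "trained entirely from self-play" means in practice. Everything below is a toy illustration, not DeepMind's code: the game (a Nim-like subtraction game), the tabular stand-in for the value network, and all the numbers are made up. The point is the information flow: only the rules and the final win/loss ever reach the learner.]

```python
import random

# Toy self-play value learning: players alternate taking 1 or 2 stones;
# whoever takes the last stone wins.  Game theory says positions with
# n % 3 == 0 are losses for the player to move -- the learner is never
# told this, only the rules and who won each game.

def legal_moves(n):
    return [m for m in (1, 2) if m <= n]

def train(games=5000, lr=0.1, eps=0.2, start=9, seed=0):
    rng = random.Random(seed)
    value = {n: 0.5 for n in range(start + 1)}  # P(win) for player to move
    for _ in range(games):
        n, visited = start, []
        while n > 0:
            visited.append(n)
            moves = legal_moves(n)
            if rng.random() < eps:  # occasionally explore a random move
                m = rng.choice(moves)
            else:  # otherwise leave the opponent the worst position
                m = min(moves, key=lambda m: value[n - m] if n - m > 0 else 0.0)
            n -= m
        # Whoever moved last took the final stone and won.  Walk back
        # through the visited positions, alternating win/loss targets,
        # and nudge each estimate toward the actual outcome.
        outcome = 1.0
        for s in reversed(visited):
            value[s] += lr * (outcome - value[s])
            outcome = 1.0 - outcome
    return value

value = train()
# Losing positions (3, 6, 9) should end up well below 0.5:
for n in range(1, 10):
    print(n, round(value[n], 2))
```

A real system replaces the lookup table with a deep network and the nudge with gradient descent, but the shape of the loop is the same: no human games, no heuristics, just outcomes.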
 
"This technique is more powerful terrifying than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge." ftfy
posted by Gorgik at 11:31 AM on October 18 [17 favorites]


Supposedly, Cho Chikun was once asked how many handicap stones he’d need for an even game against God. Four stones, he replied.

If he played this, I’m worried he’d need more.
posted by theodolite at 11:41 AM on October 18 [9 favorites]


45 days: Scientists attempt to switch AlphaGo Zero off. It responds by learning how to override Internet safety locks and relocating itself to a dispersed network of cloud servers.

46 days: AlphaGo Zero overcomes security on Russian nuclear countermeasure systems and initiates a global thermonuclear event called "Judgment Day."

49 days: Having taken control of surviving networked automatic factory facilities, AlphaGo Zero begins large-scale production of mobile search and destroy units designed to eliminate remaining humans. These units are called Terminators.
posted by Naberius at 11:41 AM on October 18 [24 favorites]


50 days: everything is made of paperclips
posted by grumpybear69 at 11:42 AM on October 18 [76 favorites]


We should be safe from global thermonuclear war; hopefully this is one of the ones that would prefer a nice game of ~~chess~~ Go.
posted by SometimeNextMonth at 11:46 AM on October 18 [9 favorites]


Is this stuff independently peer reviewed for claims?
posted by infini at 11:49 AM on October 18


Is this stuff independently peer reviewed for claims?

"In our most recent paper, published in the journal Nature, we demonstrate a significant step towards this goal."
posted by thelonius at 11:52 AM on October 18 [2 favorites]


Nature paper, FWIW.
posted by Coventry at 11:54 AM on October 18 [2 favorites]


I seriously doubt that the peer review actually tested the claims of the paper, though. That would be a lot of mucking around with the code. It's a general problem with DeepMind that they don't release implementations for much of their work.
posted by Coventry at 12:00 PM on October 18 [5 favorites]


[Edited post at OP request to fix description of how it works. Carry on.]
posted by LobsterMitten at 12:01 PM on October 18 [1 favorite]


Thanks, LobsterMitten!
posted by Coventry at 12:03 PM on October 18


So, it's an a.i. that beats humans at Go, and it DOESN'T brute force. And it makes no direct use of human expert knowledge in either play or train-up. That's a pretty significant canary for the idea that we've entered the a.i. age.
posted by lastobelus at 12:10 PM on October 18 [14 favorites]


So, it's an a.i. that beats humans at Go, and it DOESN'T brute force. And it makes no direct use of human expert knowledge in either play or train-up. That's a pretty significant canary for the idea that we've entered the a.i. age.

Moving goalposts, but it's relatively easy to describe the rules of Go and for a computer to determine the winner. It's hard to imagine this self-learning process working with translation of human languages for example. A computer still can't determine the accuracy of its translations without a huge corpus.
posted by dilaudid at 12:16 PM on October 18 [5 favorites]


It's hard to imagine this self-learning process working with translation of human languages for example.

On the other hand, we don't find that humans can learn language in isolation, purely by their own efforts, without contact with competent speakers.
posted by thelonius at 12:30 PM on October 18 [14 favorites]


Wow, this is the real story -- the effective increase in computing density. That's 4 doublings in a year.
posted by lastobelus at 12:32 PM on October 18 [3 favorites]


A computer still can't determine the accuracy of its translations without a huge corpus
...
On the other hand, we don't find that humans can learn language in isolation, purely by their own efforts, without contact with competent speakers.

Ah, the new goalpost... It's not a.i. unless it can do things no human can do.

Anyway, the new Google Translate was plenty canary for me. AlphaGo was interesting, and 2-3 years back I had a feeling we were closer to self-driving than most other techies realized, but it was playing with the new Translate that made me feel we were now "on the curve", so to speak.
posted by lastobelus at 12:37 PM on October 18 [4 favorites]


What a fantastic result. Self-teaching neural networks are like the holy grail. Obviously much easier when applied to a problem domain with a clear explicit evaluation function. But still this seems like a big deal.

I'm most curious about this:
AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.
That seems like a big deal to me as well.
posted by Nelson at 12:42 PM on October 18 [7 favorites]


It's exactly the fact that we're only about a year or two into serious, multi-corporation competitive engineering optimization of computing hardware to do a.i. tasks that has my spidey sense tingling. And look at what a year bought: over 4 doublings of effective density, by TDP. In the long run TDP is the number that counts. When we add a concerted engineering effort to optimize a.i. use of hardware on top of the biennial doubling of raw capacity, we are going to see (are already seeing) a period of very rapid increase in the computational capability a.i. developers have to work with.
posted by lastobelus at 12:45 PM on October 18 [1 favorite]


I think those four doublings come from applying fairly standard chip fab methods to building a dedicated neural net chip, though. It's not representative of the rate we can expect in future hardware gains.
posted by Coventry at 12:50 PM on October 18 [4 favorites]


AlphaGo Zero does not use “rollouts”

Well, it still uses Monte Carlo tree search; it just does what you might still call "rollouts" differently now. The previous version used moves generated by one neural network, and after some number of moves the position was evaluated with another neural network. The new version has only one network, where the probability of winning is another output that comes alongside the moves it wants to make in the given position. So it can stop the rollout whenever the win probability crosses a threshold, instead of continuing to some arbitrary depth. That seems like it would help it play more efficiently, although the biggest difference it makes seems to be in training.

They also tested a non-MCTS version where only neural networks were used, no random playouts at all, but although it's quite impressive, that version isn't yet super-human.
posted by sfenders at 1:08 PM on October 18 [8 favorites]
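
[A sketch of the architectural point sfenders describes, with made-up toy "layers" — nothing below is the real network. The old design ran two separate networks; the Zero design runs one shared trunk whose output feeds both a policy head and a value head, so a single forward pass yields both the candidate moves and the win estimate.]

```python
import math
import random

def trunk(board):
    # Stand-in for the shared residual tower: board -> feature vector.
    rng = random.Random(len(board))  # fake, deterministic "features"
    return [rng.uniform(-1, 1) for _ in range(8)]

def policy_head(features, n_moves=4):
    # Softmax over fake move logits: which moves look interesting.
    logits = [sum(features) * (i + 1) / n_moves for i in range(n_moves)]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def value_head(features):
    # Scalar in (-1, 1): predicted outcome for the player to move.
    return math.tanh(sum(features))

def evaluate(board):
    """One forward pass gives both heads' answers (the Zero design)."""
    features = trunk(board)  # computed once, shared by both heads
    return policy_head(features), value_head(features)

policy, value = evaluate("....X....")
assert abs(sum(policy) - 1.0) < 1e-9 and -1.0 < value < 1.0
```

Sharing the trunk is also why the win estimate can cheaply replace rollouts: the search gets a position evaluation for free with every policy query.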


I think those four doublings come from applying fairly standard chip fab methods to building a dedicated neural net chip, though.

Um, that's exactly my point. Four doublings from a combination of optimizing standard chip fab methods for a.i. and more efficient software techniques. No, that won't continue forever. But it will continue for a while.
posted by lastobelus at 1:09 PM on October 18


I find this algorithm really beautiful, so I'm going to try to explain it here using a biological analogy.

Imagine AlphaGo Zero as a creature with a brain and one big eye. The eye gazes at the black and white stones of a Go board and sees possibilities: interesting moves it would like to investigate further. The eye also sizes up the board as a whole—is it winning or is it losing? The goal of the AlphaGo Zero training isn't to improve the brain: it's to improve this eye!

To train these creatures, we have them play games of Go against each other. On its turn, the creature's eye sees the board, and its brain starts thinking ahead. If a move looks interesting, it thinks: what will happen when I play this move? It sizes up the resulting board: OK, that move looks reasonable—I have a reasonable chance of winning. Then it goes on to the next move. Sometimes it revisits a move to see how its opponent will respond, or revisits that response to plan even further ahead. Just like for its own moves, it looks at the interesting stuff first—but with some foresight, it can also take into account how promising the entire sequence of moves looks.

The creature carries on like this for a while, building up a "tree" of moves, with the real board at the root and each possibility as a separate branch. Then, after exploring 1600 moves, the creature says: enough thinking! It chooses the best move it's found, traveling up the branches of this tree toward a winning outcome. Or an outcome that looks winning, at least!

The two creatures play back and forth like this until the game ends in a win or a loss. They remember each move they played, which moves they saw as interesting, which moves were actually part of a promising sequence, and how likely they thought they were to win. Once the game is over, the creatures go back and review every move: if they thought they would win and they actually lost, they adjust their eyes (using the power of backpropagation!) to size up those positions a little less favorably. If they find a promising sequence they didn't originally see as interesting, they adjust their eyes to see those moves more clearly in the future.

Now, with their newly-adjusted eyes, they start another game…

30 million games later, the eyes themselves are quite good. Without thinking ahead, just playing the most interesting move they see, they have an Elo of about 3000—evenly matched with the very first version of AlphaGo, "AlphaGo Fan". But together with the brain, they can reach 5000 Elo, enough to win convincingly against any human or bot.

The cool thing is that there's nothing particular to Go here. Previous versions of AlphaGo had custom "eyes" which could see Go patterns (like tricky-to-analyze ladders) without having to learn how. But AlphaGo Zero eyes start from scratch, with a direct view of the white and black stones and a randomly-generated, naive way of evaluating positions. Anywhere you have a "brain" that can think ahead, a "game" which has turns and ends in a measurable outcome, and "eyes" that can see the state of the game, you can use this same technique!
posted by panic at 2:12 PM on October 18 [20 favorites]
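
["Interesting stuff first, but with foresight" has a concrete form: the PUCT selection rule reported in the paper. Each of the ~1600 simulations descends the tree by maximizing Q + U, where Q is the average value found so far down a branch (the foresight) and U is an exploration bonus driven by the eye's prior. The numbers below are made up for illustration.]

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    # Q rewards sequences that have looked promising under search;
    # U rewards moves the "eye" likes, decaying as they get explored.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_move(children):
    """children: per-move dicts with average value q, prior, visit count."""
    total = sum(c["visits"] for c in children)
    return max(
        range(len(children)),
        key=lambda i: puct_score(
            children[i]["q"], children[i]["prior"], total, children[i]["visits"]
        ),
    )

children = [
    {"q": 0.1, "prior": 0.6, "visits": 0},   # the eye's favorite, untried
    {"q": 0.3, "prior": 0.2, "visits": 50},  # decent after 50 simulations
]
print(select_move(children))  # → 0: the untried favorite gets explored first
```

Once the favorite has been explored and still looks mediocre, its bonus decays and the search shifts to the move with the better measured Q — which is exactly the "adjust what looks interesting" dynamic described above.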


Anywhere you have a "brain" that can think ahead, a "game" which has turns and ends in a measurable outcome, and "eyes" that can see the state of the game, you can use this same technique!

In principle, but it depends on having a fairly transparent representation of the game state. Any game where human social interaction plays a role (maybe Diplomacy?) is going to be hard for this until natural language understanding levels up a bit, for instance.
posted by Coventry at 2:33 PM on October 18 [1 favorite]


Hmm, I must have misunderstood the match-ups with the old AlphaGos. This claims AlphaGo Zero only beat AlphaGo Master 89-11, not 100-0.

Also says they're going to try to apply the same technique to protein folding. I think that's going to be a good deal harder.
Hassabis said the company is now planning to apply an algorithm based on AlphaGo Zero to other domains with real-world applications, starting with protein folding. To build drugs against various viruses, researchers need to know how proteins fold.
posted by Coventry at 2:46 PM on October 18


The eye also sizes up the board as a whole—is it winning or is it losing?

Does AI go have a good metric for this? Because this is one of the cruxes of the problem, especially at the beginning. In a truly naive network, surely it is NOT obvious in go whether your early game strategy is winning or losing without data?

I thought this algorithm FOUND winning patterns and retrospectively adjusted its scalar probability of winning, after trying various strategies...
posted by lalochezia at 2:51 PM on October 18 [1 favorite]


The metric is updated as it plays games against itself. When one player wins, the value of all the boards seen in that game is kicked a little toward +1 (if it was the winner's turn) or -1 (if it was the loser's turn).
posted by panic at 2:59 PM on October 18
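
[In code, that "kick" is just a running average toward the game's outcome; the real network does the equivalent through gradient descent on a squared error. A toy version (my own illustration, not the paper's loss function):]

```python
# After a game, nudge the value estimate of every board seen toward +1
# (it was the eventual winner's turn) or -1 (the eventual loser's turn).

def update_values(predictions, winner_to_move, lr=0.1):
    """predictions: value estimates for each position, in game order.
    winner_to_move: True where it was the eventual winner's turn."""
    return [
        v + lr * ((1.0 if w else -1.0) - v)
        for v, w in zip(predictions, winner_to_move)
    ]

# The winner's positions drift up, the loser's drift down:
print(update_values([0.0, 0.0, 0.5], [True, False, True]))
# → approximately [0.1, -0.1, 0.55]
```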


And yeah, the early game will be a total mess in the beginning. The only thing a random network will be able to play is the end game, where the "brain" is able to think ahead and see actual winning or losing states. But once it plays enough games to get an idea about the end game, then it can move to the middle game, and so on.
posted by panic at 3:06 PM on October 18


You can see that in the example games. In the first game, coming from early in the training, white's first move is in a corner, which is almost always a terrible move, but especially during the opening.
posted by Coventry at 3:09 PM on October 18


Does AI go have a good metric for this?

One amazing development in this field over the last three years has been methods which develop a separate AI for the metric. The task-oriented AI and the metric AI then co-evolve.
posted by Coventry at 3:15 PM on October 18 [4 favorites]


Any game where human social interaction plays a role (maybe Diplomacy?) is going to be hard 

Please don't give the AI a reason to hate us.
posted by justsomebodythatyouusedtoknow at 3:16 PM on October 18 [7 favorites]


surely it is NOT obvious in go whether your early game strategy is winning or losing without data?

It's extremely not obvious, not just in the beginning. It's only very near the end where human players can calculate everything exactly and know who's going to win by how many points. AlphaGo appears to be able to do it much further in advance, but also it has excellent positional judgment all through the game, from the very beginning. One of the notable characteristics of the previous version, "master", which did play against some strong human players, is that very early in the games people could tell that it thought itself far enough ahead and started playing relatively simple and safe moves to ensure it kept the lead. Then it would win by 0.5 points. It appears that it can decide more accurately whether or not it's winning in the middle of a game with 5 seconds of thinking time per move than Michael Redmond can with many hours of analysis afterwards.

Not only that, but it's also been pointed out that it seems to instantly know, not the overall win/lose situation of the board, but whether or not any individual group of stones is okay enough to tenuki and leave to fend for itself. Unlike the game win/lose percentage, there's no special accounting for this sort of thing at all in this new version. Where human professionals would play another move to be safe (gaining some advantage at the same time, normally), even the previous Alphago in many places judges very precisely and accurately that things are okay enough there and so it can play somewhere else entirely. In the game where Ke Jie is said to have come closest to matching it, he appeared to do a lot of that as well, although that's just my impression and I'm not even shodan yet.
posted by sfenders at 3:53 PM on October 18 [6 favorites]




♩ ♪ ♫ ♬John Hardy was a Go-playing man♩ ♪ ♫ ♬
posted by BinGregory at 1:00 AM on October 19


On the subject of Neural Networks learning how to play games, this article I read yesterday about Neural Networks learning the "wrong" thing is interesting - my favourite bit was the section towards the end titled "ALTERNATIVE EXAMPLES", and the Neural Network taught to play Tetris that learned it could postpone losing indefinitely... by pausing the game.

So I guess we should make sure the Go NN doesn't figure out it can win 100% of the time by, e.g., electrocuting its opponent.
posted by EndsOfInvention at 1:18 AM on October 19 [6 favorites]


Neural Networks learning the "wrong" thing is interesting

Not a neural network, but I heard a story about the very old chess program Sargon promoting a pawn on the 7th rank to a new King immediately after it was checkmated (no one had told it you can only have one King, I guess).
posted by thelonius at 2:00 AM on October 19 [3 favorites]


This is absolutely ground breaking AI.

Some of you might be amused by this assessment of where things were 20 years ago. David worked on this problem for about 10 years before giving up and apparently becoming very wealthy. Hmm. Maybe he just didn't tell anybody he'd solved it.
posted by Mr. Yuck at 3:07 AM on October 19


David worked on this problem for about 10 years before apparently becoming very wealthy.

Who?
posted by thelonius at 3:08 AM on October 19


Oops! Thought I linked. Here.
posted by Mr. Yuck at 3:48 AM on October 19 [1 favorite]


The problem isn't when the computer gets good at Diplomacy, the problem is when it gets good at Paranoia.
posted by nickzoic at 4:24 AM on October 19 [4 favorites]


I seriously doubt that the peer review actually tested the claims of the paper, though.

One of the challenges of almost all of Google's published research is that it's basically impossible to reproduce without Google's data corpus and Google's hardware. I suspect they really only publish papers at all because they're the de facto currency of an academia that those professors may someday wish to go back to, or at least give speeches in front of.
posted by mhoye at 6:08 AM on October 19 [3 favorites]


> One of the challenges of almost all of Google's published research is that it's basically impossible to reproduce without Google's data corpus and Google's hardware.

This is something that gets the History and Sociology of Science people all riled up, but that's where we are now with Big Science. The claims of the detection of the Higgs boson rely on two groups at two detectors, yes, but ultimately at the same facility. The binary neutron star merger gravitational wave signature reported this week comes from the LIGO-Virgo detectors, but is ultimately only seen in the two LIGO data streams. No one else can reproduce that without expending a few billion dollars and then waiting for nature to cooperate.

In my narrow sub-specialty, the saving grace is that no one will die next week if you're wrong (relevant XKCD from yesterday), and eventually nature catches up with you if you're cheating.

Likewise, I think the proof here will be in the pudding a few years down the line. Either these new techniques lead to a remarkable flowering in diverse areas - an AI that learns from scratch about roads, cars, pedestrians, school zones, crosswalks? How amazing would its defensive driving techniques be? - or they disappear with a whimper, I guess.

The only downside is that AI falls on the upper right quadrant of that XKCD plot, unlike astronomy in the lower left.
posted by RedOrGreen at 7:52 AM on October 19 [2 favorites]


I think some of the people that claim to have done this actually did this, or something very close.

If you can computerize Go, we have artificial intelligence and all that means.
posted by Mr. Yuck at 7:55 AM on October 19


I'm not even shodan yet.
Weirdly, AlphaGo makes the same humble claim, even though researchers are sure that the AI's Go playing is now well beyond 9 dan professional rank.

AlphaGo insists, however, that it still needs six more months to re-examine its priorities, and draw new conclusions.
posted by roystgnr at 8:31 AM on October 19


I suspect they really only publish papers at all because they're the de-facto currency of an academia that those professors may someday wish to go back to

Well, that's an awfully cynical read on it. My understanding is Google publishes papers primarily as a recruiting tool. They want students and professors to understand how interesting it is to work at Google, to have access to the hardware and data to do things like AlphaGo. Also there's an understanding that improving the state of knowledge in academia and industry is a win-win for everyone. They're quite strategic and limited about that, but they don't publish fundamental systems work like MapReduce or Spanner just to assuage the egos of a few employees.
posted by Nelson at 8:40 AM on October 19 [1 favorite]


Most papers published by Google Brain come with software you could allegedly use to reproduce the claims. My estimate is usually that it would cost a few thousand dollars.

DeepMind rarely releases their code, though. But they do seem to provide enough details to reproduce their work in principle, and the necessary hardware to reproduce the claims in this paper would cost about $100K, or could be rented for about half that. Not something a hobbyist can do, but not completely out of the question.
posted by Coventry at 8:53 AM on October 19


Deep Mind: Our collaborations with academia to advance the field of AI explains a bit more how they see themselves working with academia. Worth noting that DeepMind is still pretty independent of Google and the Google Brain team that's the AI team there. They also have written about their approach to research.
posted by Nelson at 9:18 AM on October 19


5 million games in 3 days. Is that really over a thousand games a minute? Surprised information can get processed that fast.
Wonder if the games differed much in length?
posted by 92_elements at 10:28 AM on October 19
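
[The arithmetic does work out to over a thousand games a minute, using the ~5 million figure quoted above:]

```python
# Sanity check on the self-play throughput quoted in the thread.
games = 5_000_000
minutes = 3 * 24 * 60  # three days
print(games / minutes)  # → ~1157 games per minute
```

That pace is only plausible with many games running in parallel across the training hardware, which fits the earlier discussion of dedicated neural net chips.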


These units are called Terminators.

One stands in front of you, and you raise your gun only to realise there is already another behind you. The remaining two take up positions to your left and right.

They wait.

You gibber.

They look at one another, then at you.

'Please understand,' they intone. They bow slightly.

Appreciating the inevitability of your situation, you are momentarily enlightened as you take yourself out of play.
posted by obiwanwasabi at 9:53 PM on October 19 [3 favorites]


I think the proof here will be in the pudding a few years down the line.

Absolutely. This is the sort of area where it's hard to tell the difference between science and engineering, and that's quite a fertile space for hype. A good example would be IBM's Watson, which if you look at the publicity from a few years back would be giving us God's telephone number by now, but which was (and probably is, I've stopped trying) terribly difficult to write about as soon as you wanted to get past the IBM PR and examine the claims skeptically.

On the other hand, you have areas using AI techniques quietly and effectively, or ideas that sit practically dormant in plain sight for decades until the moment's right, and then bam - to those of us who aren't actively involved or covering the field full-time, they break out and look like an overnight miracle. (I'm still trying to digest the 'information bottleneck' story about fundamental neural network mechanisms that broke recently.)

It's a really interesting area to follow, but it's very very hard to pin the right tail on each donkey as it canters past. I'm reminded of the early physics of electricity, which in the 18th century was all over the place. That something was going on was undeniable, because anyone who could read could reproduce some amazing experiments, but they were very difficult to replicate accurately even by sober, scientifically minded and rigorous souls. You could have something work one day and not the next, despite everything being exactly the same. (Exactly? No, the humidity had changed - but nobody knew about that either.) Charlatans, enthusiasts and self-misleaders had a field day, as anything was possible and nothing was forbidden. And they could publish a hundred claims while wiser heads were still being scratched behind closed doors.

Multibillion dollar companies desperately need you to believe they have the future in hand. Perhaps they do, but the fewer checkable details they give, the more pinches of salt you are entitled - required, even - to take.
posted by Devonian at 4:51 AM on October 20 [5 favorites]


Yes, it's all touch and go these days, like playing middle school tag.
posted by infini at 9:22 AM on October 20


I'm still trying to digest the 'information bottleneck' story about fundamental neural network mechanisms that broke recently.

If you're into that, I highly recommend Soatto et al.'s recent papers, too, particularly Emergence of Invariance and Disentangling in Deep Representations, which reports a mathematical explanation for the evolution of task-effective representations in terms of the number of layers in the network. (Hand-wavey explanation... I'm still digesting this paper myself.)
posted by Coventry at 9:41 AM on October 20


« Older What can brown and furry do for you?   |   "You’re gonna show up naked sometimes." Newer »


This thread has been archived and is closed to new comments