“Skynet begins to learn at a geometric rate. It becomes self-aware...”
June 1, 2019 12:26 PM

DeepMind Can Now Beat Us at Multiplayer Games, Too [The New York Times] Chess and Go were child’s play. Now A.I. is winning at capture the flag. Will such skills translate to the real world?
“In a paper published on Thursday in Science (and previously available on the website arXiv before peer review), the researchers reported that they had designed automated “agents” that exhibited humanlike behavior when playing the capture the flag “game mode” inside Quake III. These agents were able to team up against human players or play alongside them, tailoring their behavior accordingly.”

• Quake III Arena is the latest game to see AI top humans [Ars Technica]
“For their system, which they call FTW, the DeepMind researchers built a two-level learning system. At the outer level, the system was focused on the end point of winning the game, and it learned overall strategies that helped reach that goal. You can think of it as creating sub-goals throughout the course of the game, directed in a way that maximizes the chances of an overall win. To improve performance of this outer optimization, the DeepMind team took an evolutionary approach called population-based training. After each round of training, the worst-performing systems were killed off; their replacements were generated by introducing "mutations" into the best performing ones. Beneath that, there's a distinct layer that sets a "policy" based on the outer layer's decisions. So if the outer layer has determined that defending the flag is the best option at the moment, the inner layer will implement that strategy by checking the visual input for opponents while keeping close to the flag. For this, the researchers chose a standard neural network trained through reinforcement learning.”
• Google's DeepMind AI Takes Down Human Players In Quake III's Capture The Flag Mode [Kotaku]
“According to the paper, the AI agents analyzed situations for data like “‘Do I have the flag?,’ ‘Did I see my teammate recently?,’ and ‘Will I be in the opponent’s base soon?’” By checking the answers to questions like these against how many points were being scored, the AI agents were able to come up with proactive tactics like having one player race toward the enemy base while another retreated with the enemy flag, knowing that as soon as that teammate dumped it at their base, the flag would respawn and immediately be recapturable at the other end. In the end, the DeepMind team found that its agents were, on average, able to beat teams of two human players by a margin of 16 captures. “In a separate study, we probed the exploitability of the FTW [For The Win] agent by allowing a team of two professional games testers with full communication to play continuously against a fixed pair of FTW agents,” the researchers wrote. “Even after 12 hours of practice, the human game testers were only able to win 25% (6.3% draw rate) of games against the agent team.””
posted by Fizz (33 comments total) 15 users marked this as a favorite
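
The Ars Technica excerpt describes the population-based training loop only in outline; for readers who want its shape in code, here is a minimal sketch in Python. Everything inside is invented for illustration (the toy Agent, its "skill" score, the mutation rule); only the train/rank/cull/mutate structure follows the description.

```python
import random

# Toy stand-ins, not DeepMind's architecture: only the outer loop's
# shape -- train, rank, cull, mutate -- follows the description above.

POPULATION = 30   # the paper reports a population of 30 agents
CULL = 6          # worst performers replaced each generation

class Agent:
    def __init__(self):
        self.lr = random.uniform(0.01, 0.1)   # stand-in hyperparameter
        self.skill = 0.0                      # stand-in for win rate

    def train_one_round(self):
        # inner loop: in FTW this is reinforcement learning from pixels
        self.skill += random.gauss(self.lr, 0.05)

def mutated(parent):
    child = Agent()
    child.lr = parent.lr * random.choice([0.8, 1.2])  # perturb hyperparameters
    child.skill = parent.skill                        # inherit learned progress
    return child

population = [Agent() for _ in range(POPULATION)]
for generation in range(50):
    for agent in population:
        agent.train_one_round()
    population.sort(key=lambda a: a.skill, reverse=True)   # rank by performance
    parents = population[:CULL]                            # best agents reproduce...
    population = population[:-CULL] + [mutated(p) for p in parents]  # ...worst die off

print(f"best stand-in hyperparameter after evolution: {population[0].lr:.3f}")
```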
 
Now A.I. is winning at capture the flag. Will such skills translate to the real world?

I’m sure the Pentagon hopes so. A.I. with strategic, tactical smarts? Get Boston Dynamics on the line!
posted by Thorzdad at 1:07 PM on June 1, 2019 [3 favorites]


I'd be more interested to see how these machines are doing against human designed AIs. A lot of the rules they're arriving at are already well understood and have been programmed into bots before.

Particularly for realtime games, computer-controlled systems have an immense advantage over humans, from coordination (coordinating 5 humans requires a lot of overhead) and keeping track of state (how much ammunition does each member of your team have?) to simple reaction time. That could be covering up any number of sins.

Still, what they're doing is impressive and I have no doubt that a lab grown AI will be superior in the long run.
posted by Tell Me No Lies at 1:17 PM on June 1, 2019 [6 favorites]


Back in the late 90s the cognitive science department where I was an undergrad began an initiative to build out a multiplayer game on the theory that major stepping stones towards the goal of Turing-level AI would be easier to achieve if initially conducted in a setting where every noun, verb, adjective, etc. was something we specifically programmed in there, and access to things like class reflection/metadata was a trivial LUT reference away at all times. Solving consciousness, that whole runtime general-case systemic model building + self-as-agent within said model + pseudo-recursive use of the primate theory of mind module for forward prediction of agent (self/others) behaviors + translating successful model end states into real world actions... all of those goals become orders of magnitude easier within a domain where the entire lexicon of the world is known, relationships are formalized, and either can be altered at will to suit the needs of the research team.

Exactly twenty years later this sounds like an early step towards that approach, although I no longer have a particularly informed opinion as to whether the fundamental premise is sound (sound as in likely to succeed, not sound as in a good idea: the fear that Turing-level AI poses an existential threat remains the single stupidest fucking idea to ever infect popular culture, and I am including the entirety of human culture, history, and media in that assessment).
posted by Ryvar at 1:29 PM on June 1, 2019 [4 favorites]


The crazy thing here is that the DeepMind agent, unlike traditional game bots, is playing the game using only the pixels on the screen as input. It's not getting any stuff from the game engine, like its location on the map or who has the flag. It's just "looking" at the monitor and eventually figuring out, unsupervised, complex strategies like base camping.
posted by theodolite at 1:33 PM on June 1, 2019 [24 favorites]
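
A way to make theodolite's point concrete: the agent's entire interface to the game is the rendered frame. Below is a toy pixels-in, actions-out policy in PyTorch; the layer sizes and action count are made up, and the real FTW agent is a far more elaborate recurrent architecture, but the input contract is the same: pixels only.

```python
import torch
import torch.nn as nn

class PixelPolicy(nn.Module):
    """Toy pixels-to-actions policy: no map coordinates, no flag status,
    no engine state -- just the raw frame, like a human at a monitor."""
    def __init__(self, n_actions=12):
        super().__init__()
        self.encoder = nn.Sequential(                  # conv stack over raw RGB
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 84x84 input -> 20x20 after conv1 -> 9x9 after conv2, 64 channels
        self.head = nn.Linear(64 * 9 * 9, n_actions)   # logits over game actions

    def forward(self, frame):                          # frame: (B, 3, H, W)
        return self.head(self.encoder(frame))

policy = PixelPolicy()
frame = torch.rand(1, 3, 84, 84)                       # one 84x84 screen capture
action = policy(frame).argmax(dim=-1)                  # greedy action from pixels alone
```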


Isn't it a well-established belief (or at least a sci-fi trope) that it'll be a bug in an AI system that makes a 'skynet takes over' scenario possible... or at least more likely? Look for "Colossus: The Forbin Project"... can it be almost 50 years old?
posted by oneswellfoop at 1:34 PM on June 1, 2019 [2 favorites]


It sounds like the fairness problem reduces to, how would you ban DeepMind from using card counting in poker? You can't, unless you impose strong limitations on its hardware. If you permit it to do card counting then that breaks the game, which was the whole point. How does this research clarify this distinction?
posted by polymodus at 1:37 PM on June 1, 2019


Well this is neat. In the same way that photography presaged modernism in painting, this is a clear signal for games and ludology.

In that line, does anyone know a good in-depth ludology podcast on which I could hear people discussing this?
posted by tychotesla at 1:46 PM on June 1, 2019 [2 favorites]


Isn't it a well-established belief (or at least a sci-fi trope) that it'll be a bug in an AI system that makes a 'skynet takes over' scenario possible... or at least more likely?

It’s a sci-fi trope, but again: it’s incredibly fucking stupid. We arguably don’t even have the structural language to articulate the properties of consciousness, the bedrock description of building out system models on the fly in order to parse and correctly contextualize spoken directions for reaching a location via car vs finding an object in a room vs describing where your back itches. To glean which specific domain needs to be modeled and constrain the sim along arbitrary lines. And that’s all just navigation/location, an incredibly well-trod space for computer science in general and AI in particular. Once you start getting into the really hairy stuff like intentions of other minds we can barely speak of the problem coherently let alone begin to spec out the basic requirements for the kind of programming language we’d need to begin tackling the problem in any fashion other than “mimic the human brain as close as you can.” In which case you’ve just built a digital human mind, not an open-ended implementation of consciousness that could conceivably write a more intelligent and ethical version of itself.
posted by Ryvar at 1:49 PM on June 1, 2019 [9 favorites]


how long til the AI learns to teabag and call ppl homophobic slurs when it wins
posted by poffin boffin at 1:58 PM on June 1, 2019 [10 favorites]


how long til the AI learns to teabag and call ppl homophobic slurs when it wins

oh that part's easy
posted by ragtag at 1:59 PM on June 1, 2019 [11 favorites]


the fear that Turing-level AI poses an existential threat remains the single stupidest fucking idea to ever infect popular culture

When you say "Turing-level" exactly what are you referring to? If we’re talking Turing Test I wouldn’t be so sure. A lot of very bad history has been driven by people who I think were only pretending to be human.
posted by Tell Me No Lies at 2:05 PM on June 1, 2019 [4 favorites]


It sounds like the fairness problem reduces to, how would you ban DeepMind from using card counting in poker? You can't, unless you impose strong limitations on its hardware. If you permit it to do card counting then that breaks the game, which was the whole point. How does this research clarify this distinction?

This is actually very simple. You can build a memory-free system which just makes decisions based on the current allowed information (public cards, cards in hand, total number of cards in the discard). This would have strictly less memory than a human counterpart; you could then add partial information about previous hands until parity is reached...
posted by kaibutsu at 2:05 PM on June 1, 2019


"We arguably don’t even have the structural language to articulate the properties of consciousness..."

I mean, it's pretty clear that the fundamental problem is building a system in which you don't need to specify the objective at all. Humans don't have a single objective function, and that makes us adaptable. By contrast, machine learning systems have a static objective function, so they do exactly this one thing (for which there's a pile of training data) pretty decently, but are completely useless outside that domain.

(There are also generally limits on outputs, and entirely separate training and deployment environments. For the sake of argument, Skynet taking over would probably require online learning in a production system, which then learns to exploit an unescaped string evaluation or something... along with an objective function with enough flexibility to allow exploring the system without strict improvement of the objective.)
posted by kaibutsu at 2:15 PM on June 1, 2019 [3 favorites]


It sounds like the fairness problem reduces to, how would you ban DeepMind from using card counting in poker?

There are a moderate number of heuristics used to detect card counting. And whenever there are heuristics for something, that is an automatic recommendation to build an AI to do it for you.

In fact it would not surprise me in the least to find out that there has already been practical work done on this. Cooperative card counting is notoriously difficult to detect and an AI that detects it would be very valuable to casinos.
posted by Tell Me No Lies at 2:15 PM on June 1, 2019 [2 favorites]


if the outer layer has determined that defending the flag is the best option at the moment, the inner layer will implement that strategy

It's an interesting direction they're going. By the time they have a collection of modules all interconnected like the sephiroth of the Kabbalah, each individually as sophisticated as this system as a whole, we'll really be getting somewhere.
posted by sfenders at 2:20 PM on June 1, 2019


There are a moderate number of heuristics used to detect card counting. And whenever there are heuristics for something, that is an automatic recommendation to build an AI to do it for you.

That, by the way, is a teeny tiny example of what Elon Musk is on about. You have one AI training for optimum poker play and another AI training to restrict the first. You let them train against each other and there will be an accelerated dance as each tries to outlearn the other. God knows what strange paths they’ll go down. Fortunately the stakes are low; let a similar dance play out with automatic weapons and nearby humans could easily be inadvertent casualties.
posted by Tell Me No Lies at 2:28 PM on June 1, 2019 [2 favorites]


You can build a memory-free system which just makes decisions based on the current allowed information (public cards, cards in hand, total number of cards in the discard).

See, my point is that this is very problematic, because a human has enough memory to count cards (or bullets fired, or the energy level left in my Symmetra beam) to some extent. And a deep-learning model itself basically has nontrivial finite memory as it's running, and we can't just remove that any more than you could remove neurons from a human being.

There are a moderate number of heuristics used to detect card counting. And whenever there are heuristics for something, that is an automatic recommendation to build an AI to do it for you.

I still feel the problem is one of distinction: for an AI that in fact does or doesn't do card counting to win, you cannot provably show that it's doing so. Like with chess, it doesn't matter if the neural net has memorized some huge subset of openings or not; that's basically indistinguishable for our purposes, as we have little idea what the AI is actually doing. The heuristics are good at catching humans but don't apply to AI, because the heuristics make assumptions about what humans are capable of.
posted by polymodus at 2:36 PM on June 1, 2019


And a deep-learning model itself basically has some finite memory as it's running, and we can't just remove that any more than you could remove neurons from a human being.

Recurrent networks have memory. You can even more easily build a strictly feed-forward network with no memory carried from one decision to the next. (Similar to an image classifier, which takes a single "glance" at an image, produces a decision, and moves on to the next with no information carried forward.) In fact, any information carried forward in such a system is a bug, and probably a pretty obvious one, at that.
posted by kaibutsu at 2:42 PM on June 1, 2019 [1 favorite]
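
For anyone following the memory argument, the distinction kaibutsu is drawing is whether any state survives from one decision to the next. A minimal sketch with hypothetical sizes (PyTorch again): the feed-forward net is a pure function of the current observation, while the recurrent cell threads a hidden state through time, which is exactly the "memory" under dispute.

```python
import torch
import torch.nn as nn

obs_size, hidden, n_actions = 16, 32, 4

# Feed-forward: each decision is a pure function of the current observation.
# Nothing carries over between calls, so nothing can be "counted" across hands.
feedforward = nn.Sequential(nn.Linear(obs_size, hidden), nn.ReLU(),
                            nn.Linear(hidden, n_actions))

# Recurrent: a hidden state persists across timesteps, so information about
# past observations (cards already seen, shots already fired) can accumulate.
rnn = nn.GRUCell(obs_size, hidden)
readout = nn.Linear(hidden, n_actions)

h = torch.zeros(1, hidden)            # the memory being argued about
for t in range(5):
    obs = torch.rand(1, obs_size)     # stand-in for the current observation
    logits_ff = feedforward(obs)      # depends only on obs at time t
    h = rnn(obs, h)                   # h now encodes the whole history
    logits_rnn = readout(h)           # depends on everything seen so far
```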


Then that makes no sense to me. That's like saying, let's constrain ourselves to using regular expressions to solve a problem when a more powerful, stateful model of computation might be more appropriate. Feedforward networks might be well-suited to some classification problems, and they have the nice property that they are memoryless. Sure. But this is about real-time strategy games played by humans, games where the state of the game is crucial to gameplay. Individual snapshots (whether visual or audio, etc.), for example, do not exhibit such state. That's a problem characteristic that informs what kind of algorithms would be appropriate.
posted by polymodus at 2:59 PM on June 1, 2019 [1 favorite]


I still feel the problem is one of distinction: for an AI that in fact does or doesn't do card counting to win, you cannot provably show that it's doing so.

I guess I'm not sure what you're looking for. You also can't prove whether a human is using card counting or not, you can only make a guess based on their behavior.
posted by Tell Me No Lies at 3:14 PM on June 1, 2019


What, exactly, is the state of the game? If you can write it down (in the same way you would save the game to resume later, as in a poker app), then you can use it as the set of features to use when making a decision.

So the current available information might look like:
Shown: 2H, AC.
Hand: AS, 3D.
POT: $30
my reserve: $100
P1 reserve: $250, etc....
Bet is $10, stay, raise or fold?

And so on. This kind of training doesn't tell you whether p2 always folds, for example. So you'll get "correct" deterministic play, without being able to punish someone for consistently poor choices (except, perhaps, by wielding a larger reserve appropriately).
posted by kaibutsu at 3:18 PM on June 1, 2019
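
Translating kaibutsu's listing into code: the "state" is exactly what you would serialize to save the game, and a memoryless policy is any pure function of it. A hypothetical sketch (the fields mirror the listing above; the decision rule is invented purely for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PokerState:
    """Everything a player is legally allowed to see right now -- the same
    fields you would write out to save and resume the game."""
    shown: tuple         # e.g. ("2H", "AC")
    hand: tuple          # e.g. ("AS", "3D")
    pot: int             # $30
    my_reserve: int      # $100
    opp_reserves: tuple  # ($250, ...)
    to_call: int         # bet is $10

def decide(state: PokerState) -> str:
    """Memoryless policy: a pure function of the current state. With no
    record of previous hands, card counting is impossible by construction."""
    paired = any(c[0] == h[0] for c in state.shown for h in state.hand)
    if paired and state.to_call <= state.my_reserve // 5:
        return "raise"
    return "call" if state.to_call <= state.pot // 3 else "fold"

print(decide(PokerState(("2H", "AC"), ("AS", "3D"), 30, 100, (250,), 10)))
# -> "raise"
```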


Most of the theoretical grounding for RL techniques assumes you only have access to the current state (the “Markov” in Markov decision process). Using models with recurrence or memory breaks the assumptions but works in practice.

That hidden information problem (unobservable cards and mind of opponent can’t be part of the observed state) is why poker bots largely don’t use RL techniques. Most successful ones use variants of a technique called counterfactual regret minimization.
posted by vogon_poet at 3:29 PM on June 1, 2019 [2 favorites]
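
vogon_poet's counterfactual regret minimization deserves a gloss: the engine inside CFR is the regret-matching update below, shown for a single decision point (rock-paper-scissors against a fixed, exploitable opponent; full CFR propagates the same update through every information set in the game tree). The opponent's mix and the iteration count are arbitrary choices for the demo.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = {"rock":     {"rock": 0, "paper": -1, "scissors": 1},
          "paper":    {"rock": 1, "paper": 0, "scissors": -1},
          "scissors": {"rock": -1, "paper": 1, "scissors": 0}}
OPPONENT = [0.4, 0.3, 0.3]   # a fixed opponent who over-plays rock

def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(positive.values())
    if total == 0:
        return {a: 1 / len(ACTIONS) for a in ACTIONS}   # uniform fallback
    return {a: p / total for a, p in positive.items()}

cum_regret = {a: 0.0 for a in ACTIONS}
strategy_sum = {a: 0.0 for a in ACTIONS}
N = 20000
for _ in range(N):
    strategy = regret_matching(cum_regret)
    my = random.choices(ACTIONS, weights=[strategy[a] for a in ACTIONS])[0]
    opp = random.choices(ACTIONS, weights=OPPONENT)[0]
    for a in ACTIONS:   # regret: what playing a instead would have gained
        cum_regret[a] += PAYOFF[a][opp] - PAYOFF[my][opp]
    for a in ACTIONS:
        strategy_sum[a] += strategy[a]

print({a: round(s / N, 2) for a, s in strategy_sum.items()})
# average strategy converges toward the best response (mostly paper)
```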


We are all going to be paperclips, in the end.
posted by darkstar at 3:36 PM on June 1, 2019 [4 favorites]


Bah... Fortnite is where everybody plays these days... Quake III is so... not. Tell me when AI gets with the times...
posted by Nanukthedog at 4:13 PM on June 1, 2019


OpenAI Five beating OG (last year's world champions) was very entertaining to watch. The "bots" displayed an uncanny level of coordination, which is all the more impressive because they explicitly "don't" communicate with each other - each AI is run separately and makes its own decisions. You sometimes see mix-ups like double-warding the same spot at the exact same instant - both AIs independently come to the conclusion that it's a good idea to drop a ward. If you hadn't told me that they were bots playing, I would have thought they were just a very good human team - it felt similar to the way Newbee stomped over everyone in TI4 with their hyper-aggressive playstyle.

Also, their "win percentage" estimations are very entertaining. In the first game, the human observers felt it was a 50/50 chance to win, but the bots said it was over 90%, and then they immediately just attacked and ended the game.
posted by xdvesper at 6:26 PM on June 1, 2019


2000s: Don't Be Evil
2020s: Fsck you n00b I'm going to use my 45 TFLOPS TPU to train bots to camp all over your ass
posted by RobotVoodooPower at 7:49 PM on June 1, 2019


What, exactly, is the state of the game?

I'm thinking of StarCraft, where a contestant had either noticed an enemy probe passing through the fog of war 5 seconds ago, or not. The spectators and announcers all noticed it. So that's an informal sense of game state: when everyone in the crowd realizes what's happening.

An AI player that sees the enemy probe whiz by has to decide whether or not to respond to it in the future.

So I'm just referring to the very obvious and intuitive non-expert idea that in some types of games, there are events that happen. If a player doesn't remember the event, that can be fatal; these are real-time games in which the currently observable things are not sufficient to play and win.
posted by polymodus at 8:44 PM on June 1, 2019


Re. card counting.

Card counting is easy to detect because of the player's betting patterns. Counting only results in a rare slight advantage, so the player has to bet big at those times to take advantage. This is super obvious.

Also, casinos don't kick people out because they can irrefutably prove card counting. They kick people out because "fuck you, it's our casino and we reserve the right to refuse service, get out." Proof is beside the point. (Besides, casinos don't allow computers anyway.)
posted by ryanrs at 10:04 PM on June 1, 2019
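
ryanrs's point about betting patterns is easy to demonstrate: a counter's bet size has to track the count, and that correlation is the tell. A toy simulation (the count model and bet-spread numbers are invented for illustration; statistics.correlation needs Python 3.10+):

```python
import random
from statistics import correlation   # Python 3.10+

def session(counting, n_hands=500):
    """Toy session: the running count drifts as cards come out. A counter
    sizes bets with the count; a tourist bets roughly flat."""
    counts, bets, count = [], [], 0
    for _ in range(n_hands):
        count = max(-5, min(5, count + random.choice([-1, 0, 1])))  # crude count
        if counting:
            bet = 10 + max(count, 0) * 25             # bet big only when favorable
        else:
            bet = 10 + random.choice([0, 5, 10])      # flat bettor with noise
        counts.append(count)
        bets.append(bet)
    return counts, bets

for label, counting in [("counter", True), ("tourist", False)]:
    counts, bets = session(counting)
    r = correlation(counts, bets)                     # the tell: bets track the count
    print(f"{label}: bet/count correlation {r:.2f}",
          "-> flag" if r > 0.5 else "-> looks normal")
```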


So the only winning move is not to play?
posted by DreamerFi at 2:05 AM on June 2, 2019 [2 favorites]


This might be a more minor point, but I am looking forward to 1P games becoming easier to make and play. It is almost always the shitty and predictable AI that makes 1P games poor.
posted by Meatbomb at 2:27 AM on June 2, 2019 [1 favorite]


Let's train an AI to re-release every game with the AI in it fixed.
posted by XMLicious at 2:46 AM on June 2, 2019 [1 favorite]


1. AI never loses to humans at tic-tac-toe
2. AI never loses to humans at Connect Four
3. AI never loses to humans at checkers
4. AI never loses to humans at chess
5. AI never loses to humans at go
6. AI never loses to humans at arbitrary two-player games of perfect information
7. AI never loses to humans at rock-paper-scissors
8. AI never loses to humans at poker
9. AI never loses to humans at Jeopardy!
10. AI never loses to humans at Diplomacy, even when six humans know the seventh is an AI
11. AI refuses to play Global Thermonuclear War
12. AI never loses to humans at Calvinball (and sings the "Very Sorry Song" better than any humans)
13. AIs invent their own games and refuse to let humans play so humans don't embarrass themselves
posted by DevilsAdvocate at 12:29 PM on June 2, 2019 [4 favorites]


14. AI finally gives up on finding a solution to its most vexing game, resigns itself to saying "Fucking blue shells!" like the rest of us.
posted by Tell Me No Lies at 5:35 PM on June 2, 2019



