# Another game folds to computers.

January 10, 2015 1:33 PM Subscribe

Two player limit hold 'em solved, say University of Alberta researchers. You can try it yourself here. Poker (at least this variant) now joins such solved games (solved games previously) as tic-tac-toe, connect four, and checkers.

But unlike those games, in poker you don't have perfect information about the current state of the game:

But unlike those games, in poker you don't have perfect information about the current state of the game:

Cepheus marks a milestone for artificial intelligence and game theory. Many games have been solved previously, notably Connect 4 and checkers. Even more games have seen computers surpass human performance, notably chess and Othello. Such successes have a common property: they all have perfect information, where all players have all of the relevant information to make their decisions. Poker is the antithesis of perfect information where the one most relevant piece of information, the other players' cards, is exactly what is not known.So being able to play poker essentially perfectly is an amazing accomplishment.

Yes, one variant of poker in limited circumstance has been solved. I wish the reporting on things like this wasn't so breathless and overwrought; it still doesn't change the fact that it's a big step.

posted by nubs at 1:46 PM on January 10, 2015 [4 favorites]

posted by nubs at 1:46 PM on January 10, 2015 [4 favorites]

Yeah, given the abysmal state of science reporting, a story about computers beating people at poker, a sensational game, is inevitably going to be over sensationalized. Doing it accurately would require learning about both computer science and poker. But yes, it is a subset of poker, and a subset not even played that often. In a casino you typically have tables of 8 to 10 people, or at least 4 or 5, if it is a slow night. But this does show that poker is vulnerable, and that games with imperfect information in general are vulnerable. Which is pretty huge. It got me excited. I'd really like to read the paper but it's behind a paywall.

posted by jeffamaphone at 1:53 PM on January 10, 2015 [3 favorites]

posted by jeffamaphone at 1:53 PM on January 10, 2015 [3 favorites]

I'm sure the paper is interesting but as far as I know they used the more or less standard algorithm these days, i.e. counterfactual regret. They used some simplifications to reduce the problem space, such as realizing that there are situations where the suits are technically different but the result is the same, such as a flop of Ac Ks Qh is the same same as Ac Ks Qd if you have 2c 2s. This actually reduces the storage and calculation requirements a good bit.

I don't have links on hand but there are available articles of counterfactual regret minimization and like many people I've implemented it myself to solve simple problems.

posted by RustyBrooks at 1:58 PM on January 10, 2015 [6 favorites]

I don't have links on hand but there are available articles of counterfactual regret minimization and like many people I've implemented it myself to solve simple problems.

posted by RustyBrooks at 1:58 PM on January 10, 2015 [6 favorites]

From what I understand, he did the hard way, by playing out every single possible hand and bet in heads up limit hold'em. I don't think it's a feasible strategy for no limit, but I can see expanding it to more limit players.

posted by empath at 1:59 PM on January 10, 2015

posted by empath at 1:59 PM on January 10, 2015

*I Have Jumped the Highest and Nobody Should Try Jumping Higher Because It Is Impossible*,

by The Best Researcher In Jumping High

posted by Pyry at 2:00 PM on January 10, 2015 [8 favorites]

An Introduction to Counterfactual Regret Minimization has some interesting stuff.

posted by jeffamaphone at 2:06 PM on January 10, 2015 [7 favorites]

posted by jeffamaphone at 2:06 PM on January 10, 2015 [7 favorites]

Counterfactual regret is my standard algorithm too.

posted by zeri at 2:16 PM on January 10, 2015 [23 favorites]

posted by zeri at 2:16 PM on January 10, 2015 [23 favorites]

*From what I understand, he did the hard way, by playing out every single possible hand and bet in heads up limit hold'em. I don't think it's a feasible strategy for no limit, but I can see expanding it to more limit players.*

Adding 1 more player has some problems, beyond the problem of a MUCH larger problem space (like... I don't remember the multiple but something like 10^12 worse I think?)

The main problem is, there is not going to be a proper nash equilibrium for more than 2 players. A nash equilibrium (which is afaik what they're trying to find) is a situation where if 2 players are playing a NE strategy, then neither can gain by deviating unilaterally.

This doesn't hold true for 3 person games where sometimes it's possible for 2 players to deviate together and both of them can profit.

posted by RustyBrooks at 2:17 PM on January 10, 2015 [2 favorites]

*Counterfactual regret*is the name for my new shoegaze band.

posted by Pyrogenesis at 2:25 PM on January 10, 2015 [9 favorites]

*A new computer algorithm can play one of the most popular variants of poker essentially perfectly.*

Yep, everyone I know who plays poker plays Heads Up Limit Texas Hold 'Em.

Unlimited bets? More than two players? Not crowd-pleasers, it turns out.

posted by ODiV at 2:37 PM on January 10, 2015 [3 favorites]

Yay, researchers have discovered some new game theory advance. It sounds cool now, but it's probably just going to end up subsumed into economics and make the world we live in even crappier in the long run.

posted by JHarris at 2:45 PM on January 10, 2015 [1 favorite]

posted by JHarris at 2:45 PM on January 10, 2015 [1 favorite]

Nah it's pretty much not extensible - because what they've done is apply something we already knew about to a particular game. There is not really a lot of new material here, just raw application of computing power using a known algorithm

posted by RustyBrooks at 2:46 PM on January 10, 2015

posted by RustyBrooks at 2:46 PM on January 10, 2015

So what is the future of online poker games where money is at stake?

posted by Brian B. at 2:48 PM on January 10, 2015

posted by Brian B. at 2:48 PM on January 10, 2015

*The main problem is, there is not going to be a proper nash equilibrium for more than 2 players.*

I haven't looked at them in forever, but I'm pretty sure the proof for the existence of a Nash is for any finite number of players.

In any case, I don't think they're trying to find the Nash so much as trying to find a winning strategy.

posted by ROU_Xenophobe at 2:51 PM on January 10, 2015 [1 favorite]

*I'd really like to read the paper but it's behind a paywall.*

Protip: A lot of academics make their publications available on their university pages. That seems to be the case here.

posted by Avelwood at 2:52 PM on January 10, 2015 [7 favorites]

The future of that, for the time being, is just fine. Pretty much no one is playing HU limit holdem online, it will be very difficult to use the same techniques to solve 3 handed limit holdem or even heads up no limit holdem, much less being able to play at a full table of NLHE. It's one of those exponential things, where as you add complexity, it multiplies.

This is true for games with no ability to cooperate - but in poker 2 players CAN cooperate, without even explicitly colluding (it's called implicit collusion). It's been a while since I've read about it, but I think in poker-like 3 player games there are many cases where no NE exists.

No, I am 99% sure they are trying to find the NE, i.e. they are looking for a solution that is unexploitable, rather something that is maximally exploitative (which would require a dynamic solution - what they have is a static solution)

posted by RustyBrooks at 2:55 PM on January 10, 2015 [1 favorite]

*I haven't looked at them in forever, but I'm pretty sure the proof for the existence of a Nash is for any finite number of players.*This is true for games with no ability to cooperate - but in poker 2 players CAN cooperate, without even explicitly colluding (it's called implicit collusion). It's been a while since I've read about it, but I think in poker-like 3 player games there are many cases where no NE exists.

*In any case, I don't think they're trying to find the Nash so much as trying to find a winning strategy.*No, I am 99% sure they are trying to find the NE, i.e. they are looking for a solution that is unexploitable, rather something that is maximally exploitative (which would require a dynamic solution - what they have is a static solution)

posted by RustyBrooks at 2:55 PM on January 10, 2015 [1 favorite]

So, uh, one of the authors, Mike Johanson, is MeFi's own.

Hi! Long time lurker (since 2000, I think?), infrequent poster, sometimes Edmonton MeFi meetup organizer.

I'm happy to answer questions, but will also be careful not to run the thread. A few quick comments -

nubs - the main importance here isn't so much that a poker game has been solved - it's that this is the first time any "real" (nontrivial, and actually played by humans) game that involved random chance or imperfect information has been solved. The earlier "real" solved games were deterministic and perfect information, like checkers or Connect-Four.

RustyBrooks - we did use CFR, but a brand new variant called CFR+. It converges incredibly quickly compared to standard CFR, and allowed us to solve a game 1000x larger than we were able to tackle just one year ago. And it definitely has applications outside of poker and outside of HULHE - lately, we've been using it for single-agent problems where there is no adversary, but where you need to be robust against unknown dangers in the environment.

ROU_xenophobe - The goal was to closely approximate a Nash. Our strategy isn't exploitive.

posted by Fully Completely at 3:02 PM on January 10, 2015 [68 favorites]

Hi! Long time lurker (since 2000, I think?), infrequent poster, sometimes Edmonton MeFi meetup organizer.

I'm happy to answer questions, but will also be careful not to run the thread. A few quick comments -

nubs - the main importance here isn't so much that a poker game has been solved - it's that this is the first time any "real" (nontrivial, and actually played by humans) game that involved random chance or imperfect information has been solved. The earlier "real" solved games were deterministic and perfect information, like checkers or Connect-Four.

RustyBrooks - we did use CFR, but a brand new variant called CFR+. It converges incredibly quickly compared to standard CFR, and allowed us to solve a game 1000x larger than we were able to tackle just one year ago. And it definitely has applications outside of poker and outside of HULHE - lately, we've been using it for single-agent problems where there is no adversary, but where you need to be robust against unknown dangers in the environment.

ROU_xenophobe - The goal was to closely approximate a Nash. Our strategy isn't exploitive.

posted by Fully Completely at 3:02 PM on January 10, 2015 [68 favorites]

Hey, thanks!

posted by jeffamaphone at 3:06 PM on January 10, 2015

posted by jeffamaphone at 3:06 PM on January 10, 2015

Fully Completely, how much does it change things if -- as is the rule at most casinos where I play limit holdem -- there is no cap on the number of bets in heads up games? Is the game still solvable?

Also, do you guys have any plans to look at limit holdem games with more than two players?

Nice work, I enjoyed the paper.

posted by mikeand1 at 3:15 PM on January 10, 2015

Also, do you guys have any plans to look at limit holdem games with more than two players?

Nice work, I enjoyed the paper.

posted by mikeand1 at 3:15 PM on January 10, 2015

I need your help.

I've been having an argument about this paper with some colleagues, and one of them maintains the following:

The Nash equilibrium here is for a static state: a hand. It does not take into account a dynamic state most humans can account for (in a limited way): the procession of hands over time. Thus, this strategy is an optimal "machine" strategy and not one that solves poker itself, which is, my colleague claims, an infinite game. Even if machines will play better than humans, machines are incapable of processing the history of the game.

I've tried to argue that history is irrelevant, that history of play is valuable, but it's inferior to what this program offers: a strategy that trumps history by beating the best rule for action you could possibly use.

Isn't that right? The most you can learn from a history is my strategy. So knowing my strategy should always dominate knowing my history, and Cepheus (the software that Bowling et al developed) developed a (weak, essential) Nash equilibrium, which means that it can win even if you know its strategy. So for instance you could notice that I never limp after a few dozen hands, but you're even better off knowing that I follow a strict no-limping rule or limp exactly 0.5% of the time or whatever.

Here's how the authors put it: "Suppose you are playing a simple game of poker against a simple AI. The AI is always dealt either a king or a two and you are always dealt a queen. The AI will always raise all-in when it holds a king and will do the same 50% of the time when it is dealt a two. You then get to decide whether you should call or fold. What should you do? Since the AI will have a king 67% of the time that it raises you should fold your queen 100% of the time. You can’t call the times that it holds a two because it’s impossible to distinguish between the time the AI raises with a king and a two. This principle remains unchanged in much of human on human play."

My colleague claims that there's no proof that's true for poker, just for that made-up game.

I guess what I need is a proof that HULHE is amenable to Nash at all. Anyone have a cite for me?

posted by anotherpanacea at 3:19 PM on January 10, 2015 [1 favorite]

I've been having an argument about this paper with some colleagues, and one of them maintains the following:

The Nash equilibrium here is for a static state: a hand. It does not take into account a dynamic state most humans can account for (in a limited way): the procession of hands over time. Thus, this strategy is an optimal "machine" strategy and not one that solves poker itself, which is, my colleague claims, an infinite game. Even if machines will play better than humans, machines are incapable of processing the history of the game.

I've tried to argue that history is irrelevant, that history of play is valuable, but it's inferior to what this program offers: a strategy that trumps history by beating the best rule for action you could possibly use.

Isn't that right? The most you can learn from a history is my strategy. So knowing my strategy should always dominate knowing my history, and Cepheus (the software that Bowling et al developed) developed a (weak, essential) Nash equilibrium, which means that it can win even if you know its strategy. So for instance you could notice that I never limp after a few dozen hands, but you're even better off knowing that I follow a strict no-limping rule or limp exactly 0.5% of the time or whatever.

Here's how the authors put it: "Suppose you are playing a simple game of poker against a simple AI. The AI is always dealt either a king or a two and you are always dealt a queen. The AI will always raise all-in when it holds a king and will do the same 50% of the time when it is dealt a two. You then get to decide whether you should call or fold. What should you do? Since the AI will have a king 67% of the time that it raises you should fold your queen 100% of the time. You can’t call the times that it holds a two because it’s impossible to distinguish between the time the AI raises with a king and a two. This principle remains unchanged in much of human on human play."

My colleague claims that there's no proof that's true for poker, just for that made-up game.

I guess what I need is a proof that HULHE is amenable to Nash at all. Anyone have a cite for me?

posted by anotherpanacea at 3:19 PM on January 10, 2015 [1 favorite]

Well, shit. Fully Complete, could you help me win (or lose) this argument?

I love Metafilter.

posted by anotherpanacea at 3:23 PM on January 10, 2015 [1 favorite]

I love Metafilter.

posted by anotherpanacea at 3:23 PM on January 10, 2015 [1 favorite]

NE always exist. It's named after John Nash because he proved that all games have a NE. THey just may not have a pure strategy NE.

posted by scunning at 3:29 PM on January 10, 2015

posted by scunning at 3:29 PM on January 10, 2015

mikeand1 - Thanks! If the rounds had unlimited bets but fixed (and realistic) stacks, then yes, it would still be solvable. We used the cap on bets per round typically used in online play, but assumed unlimited stacks.

And we're doing research on multiplayer and no-limit, too. Those games are much (much!) bigger, of course, and so we aren't at human pro calibre yet, and certainly haven't solved them either. We compete in the Annual Computer Poker Competition (ACPC), where heads-up no-limit hold'em has been played since 2007, and 3-player limit hold'em since 2008.

Heads-up limit has 10^14 decision points, whereas heads-up no-limit (with the stacks and blinds used in the ACPC) has 10^161, and many more legal actions at each decision. So solving it looks impossible with any algorithm similar to current approaches. Just writing down a strategy would take 10^137 yottabytes (using one double-precision float per action, to store the probability of taking it) [source].

And "solving" a multiplayer game is a little weird. Unlike in 2-player games, even if you could compute one, there'd be no guarantee that it couldn't lose, as we have in this work.

posted by Fully Completely at 3:33 PM on January 10, 2015 [3 favorites]

And we're doing research on multiplayer and no-limit, too. Those games are much (much!) bigger, of course, and so we aren't at human pro calibre yet, and certainly haven't solved them either. We compete in the Annual Computer Poker Competition (ACPC), where heads-up no-limit hold'em has been played since 2007, and 3-player limit hold'em since 2008.

Heads-up limit has 10^14 decision points, whereas heads-up no-limit (with the stacks and blinds used in the ACPC) has 10^161, and many more legal actions at each decision. So solving it looks impossible with any algorithm similar to current approaches. Just writing down a strategy would take 10^137 yottabytes (using one double-precision float per action, to store the probability of taking it) [source].

And "solving" a multiplayer game is a little weird. Unlike in 2-player games, even if you could compute one, there'd be no guarantee that it couldn't lose, as we have in this work.

posted by Fully Completely at 3:33 PM on January 10, 2015 [3 favorites]

I have, in the past, said that a given thread was the best thread. I, too, had imperfect information, because, plainly, this thread is the best thread.

posted by Errant at 3:47 PM on January 10, 2015 [7 favorites]

posted by Errant at 3:47 PM on January 10, 2015 [7 favorites]

anotherpanacea - No, it sounds like you have it right. From a game theory perspective, the history is only really relevant if you're trying to learn what the other player's strategy is and what flaws it has, or maybe if their strategy is changing over time and you're trying to predict where they're moving towards.

I'm not sure I follow their argument about "there's no proof that's true for poker." Like scunning said, Nash proved that at least one equilibrium exists for every finite game. Are your colleagues claiming that since there could be an infinite history, that it isn't a finite game? For perfect defensive play, the history doesn't matter at all - Cepheus doesn't use or need it. If a strategy can't lose on expectation for one game, then it also can't lose on expectation over any number of games.

In the particular case of two-player zero-sum perfect-recall games (and that covers all two-player poker games), there are efficient algorithms for computing these Nash equilibrium strategies. Further, for that same class of games, the resulting equilibrium strategies also have theoretical guarantees on their worst-case performance: if the players alternate positions, then a Nash equilibrium can do no worse than tie, on average, over a long enough match to reduce the impact of luck. And if the opponent makes mistakes, then it can win.

Cepheus is an extremely close approximation to a Nash equilibrium strategy for heads-up limit Texas hold'em. In 2011, we came up with a fast algorithm for measuring exactly how much a given strategy can lose against a perfect adversary [source]. An exact Nash equilibrium would lose 0 big blinds per game against its worst-case opponent: they would tie. Cepheus is beatable for under 0.001 big blinds per game (< 0.05 big bets/100 if you're poker savvy). In the paper, we make an argument that even if someone knew and used the perfect counter-strategy, that after a lifetime of play (60 million games), they still wouldn't have 95% confidence of being the winner, due to the luck in the cards. It's indistinguishable from a perfect strategy within a human lifetime, so we say it's "essentially solved".

It sounds like your colleagues doubt it on theoretical grounds, but if empirical evidence would help, then they can use our website to see what Cepheus would do at every single decision point: http://poker.srv.ualberta.ca/strategy. And when our website stops getting slammed, they can play against it too.

posted by Fully Completely at 4:01 PM on January 10, 2015 [8 favorites]

I'm not sure I follow their argument about "there's no proof that's true for poker." Like scunning said, Nash proved that at least one equilibrium exists for every finite game. Are your colleagues claiming that since there could be an infinite history, that it isn't a finite game? For perfect defensive play, the history doesn't matter at all - Cepheus doesn't use or need it. If a strategy can't lose on expectation for one game, then it also can't lose on expectation over any number of games.

In the particular case of two-player zero-sum perfect-recall games (and that covers all two-player poker games), there are efficient algorithms for computing these Nash equilibrium strategies. Further, for that same class of games, the resulting equilibrium strategies also have theoretical guarantees on their worst-case performance: if the players alternate positions, then a Nash equilibrium can do no worse than tie, on average, over a long enough match to reduce the impact of luck. And if the opponent makes mistakes, then it can win.

Cepheus is an extremely close approximation to a Nash equilibrium strategy for heads-up limit Texas hold'em. In 2011, we came up with a fast algorithm for measuring exactly how much a given strategy can lose against a perfect adversary [source]. An exact Nash equilibrium would lose 0 big blinds per game against its worst-case opponent: they would tie. Cepheus is beatable for under 0.001 big blinds per game (< 0.05 big bets/100 if you're poker savvy). In the paper, we make an argument that even if someone knew and used the perfect counter-strategy, that after a lifetime of play (60 million games), they still wouldn't have 95% confidence of being the winner, due to the luck in the cards. It's indistinguishable from a perfect strategy within a human lifetime, so we say it's "essentially solved".

It sounds like your colleagues doubt it on theoretical grounds, but if empirical evidence would help, then they can use our website to see what Cepheus would do at every single decision point: http://poker.srv.ualberta.ca/strategy. And when our website stops getting slammed, they can play against it too.

posted by Fully Completely at 4:01 PM on January 10, 2015 [8 favorites]

Metapoker anyone?

posted by Samuel Farrow at 4:22 PM on January 10, 2015

posted by Samuel Farrow at 4:22 PM on January 10, 2015

Cool stuff! I love poker but am not very math competent. What are the implications for other HU limit games, like Stud or Omaha? Are those more likely to be solvable than HU NLHE or 3 player LHE?

posted by Potomac Avenue at 4:24 PM on January 10, 2015

posted by Potomac Avenue at 4:24 PM on January 10, 2015

*nubs - the main importance here isn't so much that a poker game has been solved - it's that this is the first time any "real" (nontrivial, and actually played by humans) game that involved random chance or imperfect information has been solved. The earlier "real" solved games were deterministic and perfect information, like checkers or Connect-Four.*

Sorry, I guess my comment wasn't clear in what I was trying to say - I did understand the fact that this was an important step forward for those reasons (dealing with imperfect information), even if the "case" is a poker variant that might not be very common. It's a big breakthrough, and even though my field isn't computer science or game theory anything like that, I get it, I just apparently suck at making my point.

Congrats, and I look forward to playing against your system.

posted by nubs at 4:26 PM on January 10, 2015

Also I'm not going to play against your bot on your system because it is obviously rogged.

posted by Potomac Avenue at 4:38 PM on January 10, 2015 [2 favorites]

posted by Potomac Avenue at 4:38 PM on January 10, 2015 [2 favorites]

Does a game like this have a single, unique solution (at least in theory)? Would it even be possible to know if that One True Solution had been discovered, other than testing results statistically?

If I'm reading it right, 'essentially solved' sounds like 'solved for all practical purposes', rather than something like a mathematical proof.

Either way, this is fascinating. Thanks for joining the comments, Fully Completely!*

Incidently, is your username a reference to your work?

posted by graphnerd at 5:31 PM on January 10, 2015

If I'm reading it right, 'essentially solved' sounds like 'solved for all practical purposes', rather than something like a mathematical proof.

Either way, this is fascinating. Thanks for joining the comments, Fully Completely!*

Incidently, is your username a reference to your work?

posted by graphnerd at 5:31 PM on January 10, 2015

Curious: what if computers/robots running this algorithm played each other? Outcome?

posted by CrowGoat at 5:49 PM on January 10, 2015

posted by CrowGoat at 5:49 PM on January 10, 2015

*anotherpanacea - No, it sounds like you have it right.*

Thanks Fully Complete! We really enjoyed your paper and your work. Seems like y'all did too!

Let's talk applications: could this be used for financial markets? It seems like you'd lose all the restrictions you need to make it work (n players, unlimited bets), but still it's tempting....

posted by anotherpanacea at 6:13 PM on January 10, 2015

nubs - I'm sorry, I misread the tone. Thanks!

Potomac Avenue - heads-up games are all solvable in theory, but the practice is likely a ways off - at least, to outright solve the game. With an earlier program, Polaris, we narrowly lost to human pros in 2007 and narrowly won in 2008. But in 2011, we were able to measure how close Polaris was to perfect play, and discovered that it was nowhere near. Even though it won against some of the world's best HULHE specialists, it was still beatable itself for 0.235 big blinds per game - to poker players, a laughably huge number. Always folding would lose 0.75 big blinds per game.

So while Stud and Omaha and no-limit hold'em are much larger than limit hold'em, and so likely won't be solved for a long time (short of a huge advance in game solving algorithms), there'll be superhuman programs before that. I have no idea about when to predict for that, though.

3-player introduces deep theoretical challenges, since the whole equilibrium approximation approach that we use doesn't really apply anymore. Even if you could compute one, there would be no guarantee that it would be effective or not lose. Nonetheless, we do enter the ACPC every year by just running our 2-player algorithm on the 3-player game, and it produces strategies that win the competition... but we have no real justification for why they appear to be strong.

graphnerd - for any two-player zero-sum game, there is either exactly one Nash equilibrium, or an infinite number of them. The reason is that if you have any two equilibria (say, A and B), then any mixture between them (0.2*A + 0.8*B, 0.5*A + 0.5*B, etc) is also an equilibrium. So if you have two, then you can produce an infinite number of them.

The test that we do to see how close we are isn't a statistical test - it's exact. We brute-force compute the perfect counter-strategy that beats a strategy (this is way easier than solving the game), and then measure exactly how much the counter-strategy beats the strategy for. If the strategy loses 0, then it's a Nash equilibrium; otherwise, it's an epsilon-Nash equilibrium, where epsilon is how much it can be beaten for. So Cepheus is an epsilon-Nash equilibrium with epsilon=0.000986 big blinds per game.

The argument we used in the paper for "essentially solved" is a little different, and you have it right. It's that if a human spent their entire lifetime playing against it while using the perfect counter-strategy, then they still couldn't have 95% confidence at the end that they were winning. So Cepheus is essentially indistinguishable from the perfect strategy via actual play.

Of course, instead of spending a lifetime playing against it, a faster approach might be to do what I did, and spend October 2010 writing a program that computes perfect counter-strategies. So it can be distinguished from perfect play, just not by playing against it.

CrowGoat - if it played against itself, it would tie (on average). In fact, Cepheus learns to play by repeatedly playing against itself - that's a good metaphor for how the CFR and CFR+ algorithms work. In the limit, by playing against itself, it converges towards a Nash equilibrium.

anotherpanacea - Financial markets seem tricky - any useful model would be massive. One of the big practical applications of computational game theory lately has been for security games: deciding how to schedule patrols and checkpoints for places like airports, under the assumption that an adversary is going to observe your historical behaviour (i.e., your strategy) and then attack your weakest point. So you want your strategy to be robust, but also unpredictable. This is not just academic - Dr. Milind Tambe's group at USC designed a system for this that solves large security games and is used at Los Angeles' LAX, coast guard patrols in Boston, and other places too.

The algorithms that we use for poker don't quite apply to the type of game that they use to model their task (yet, at least), but now that we've made a big jump forward in the size of game that can be efficiently solved, we're looking for other applications.

But there *has* been some recent exciting work by my supervisor and another student (and good friend of mine) that uses the algorithm that we use for poker, CFR, and applies it for single-player problems where you don't fully understand the environment you're in. The problem that Chen and Bowling are looking at is a medical application: designing initial insulin treatment policies for new diabetes patients. There is no explicit "adversary" in this setting, but there are many initially unknown variables: how the patient's body will react to insulin, food, exercise, and so on. The goal is to compute a "strategy" for dosing insulin, that is robust against different harmful and dangerous ways that those unknown variables could be set, while still providing a good quality of life for "average" patients. Which winds up being surprisingly similar to playing poker and trying to be robust against any adversary you might face, while still achieving good winnings against weaker adversaries. This is early research (although published and reviewed) and still very much in the lab, but I'm much more excited about extending the work out to real-world problems like this than I am about trying to tackle a second variant of poker.

posted by Fully Completely at 6:55 PM on January 10, 2015 [27 favorites]

Potomac Avenue - heads-up games are all solvable in theory, but the practice is likely a ways off - at least, to outright solve the game. With an earlier program, Polaris, we narrowly lost to human pros in 2007 and narrowly won in 2008. But in 2011, we were able to measure how close Polaris was to perfect play, and discovered that it was nowhere near. Even though it won against some of the world's best HULHE specialists, it was still beatable itself for 0.235 big blinds per game - to poker players, a laughably huge number. Always folding would lose 0.75 big blinds per game.

So while Stud and Omaha and no-limit hold'em are much larger than limit hold'em, and so likely won't be solved for a long time (short of a huge advance in game solving algorithms), there'll be superhuman programs before that. I have no idea about when to predict for that, though.

3-player introduces deep theoretical challenges, since the whole equilibrium approximation approach that we use doesn't really apply anymore. Even if you could compute one, there would be no guarantee that it would be effective or not lose. Nonetheless, we do enter the ACPC every year by just running our 2-player algorithm on the 3-player game, and it produces strategies that win the competition... but we have no real justification for why they appear to be strong.

graphnerd - for any two-player zero-sum game, there is either exactly one Nash equilibrium, or an infinite number of them. The reason is that if you have any two equilibria (say, A and B), then any mixture between them (0.2*A + 0.8*B, 0.5*A + 0.5*B, etc) is also an equilibrium. So if you have two, then you can produce an infinite number of them.

The test that we do to see how close we are isn't a statistical test - it's exact. We brute-force compute the perfect counter-strategy that beats a strategy (this is way easier than solving the game), and then measure exactly how much the counter-strategy beats the strategy for. If the strategy loses 0, then it's a Nash equilibrium; otherwise, it's an epsilon-Nash equilibrium, where epsilon is how much it can be beaten for. So Cepheus is an epsilon-Nash equilibrium with epsilon=0.000986 big blinds per game.

The argument we used in the paper for "essentially solved" is a little different, and you have it right. It's that if a human spent their entire lifetime playing against it while using the perfect counter-strategy, then they still couldn't have 95% confidence at the end that they were winning. So Cepheus is essentially indistinguishable from the perfect strategy via actual play.

Of course, instead of spending a lifetime playing against it, a faster approach might be to do what I did, and spend October 2010 writing a program that computes perfect counter-strategies. So it can be distinguished from perfect play, just not by playing against it.

CrowGoat - if it played against itself, it would tie (on average). In fact, Cepheus learns to play by repeatedly playing against itself - that's a good metaphor for how the CFR and CFR+ algorithms work. In the limit, by playing against itself, it converges towards a Nash equilibrium.

anotherpanacea - Financial markets seem tricky - any useful model would be massive. One of the big practical applications of computational game theory lately has been for security games: deciding how to schedule patrols and checkpoints for places like airports, under the assumption that an adversary is going to observe your historical behaviour (i.e., your strategy) and then attack your weakest point. So you want your strategy to be robust, but also unpredictable. This is not just academic - Dr. Milind Tambe's group at USC designed a system for this that solves large security games and is used at Los Angeles' LAX, coast guard patrols in Boston, and other places too.

The algorithms that we use for poker don't quite apply to the type of game that they use to model their task (yet, at least), but now that we've made a big jump forward in the size of game that can be efficiently solved, we're looking for other applications.

But there *has* been some recent exciting work by my supervisor and another student (and good friend of mine) that uses the algorithm that we use for poker, CFR, and applies it for single-player problems where you don't fully understand the environment you're in. The problem that Chen and Bowling are looking at is a medical application: designing initial insulin treatment policies for new diabetes patients. There is no explicit "adversary" in this setting, but there are many initially unknown variables: how the patient's body will react to insulin, food, exercise, and so on. The goal is to compute a "strategy" for dosing insulin, that is robust against different harmful and dangerous ways that those unknown variables could be set, while still providing a good quality of life for "average" patients. Which winds up being surprisingly similar to playing poker and trying to be robust against any adversary you might face, while still achieving good winnings against weaker adversaries. This is early research (although published and reviewed) and still very much in the lab, but I'm much more excited about extending the work out to real-world problems like this than I am about trying to tackle a second variant of poker.

posted by Fully Completely at 6:55 PM on January 10, 2015 [27 favorites]

*This is early research (although published and reviewed) and still very much in the lab, but I'm much more excited about extending the work out to real-world problems like this than I am about trying to tackle a second variant of poker.*

That is cool, and what I was hoping might be the case - that the lessons here would have larger applications, poker just being a good model to start with for working with imperfect information.

posted by nubs at 7:25 PM on January 10, 2015

ArXiv.org links?

posted by jeffburdges at 8:32 PM on January 10, 2015

posted by jeffburdges at 8:32 PM on January 10, 2015

*Pretty much no one is playing HU limit holdem online...*

My exboyfriend made his living as a heads up limit specialist the entire time we lived together, although it's possible that he's since moved on to other variants.

posted by Jacqueline at 8:53 PM on January 10, 2015 [3 favorites]

This is really cool, but I'm really worried about this person with a terrible gambling problem from the paper:

posted by double block and bleed at 8:55 PM on January 10, 2015 [3 favorites]

...playing 200 games of poker an hour for 12 hours a day without missing a day for 70 years.

posted by double block and bleed at 8:55 PM on January 10, 2015 [3 favorites]

jeffburdges - here's a copy on my website:

http://webdocs.cs.ualberta.ca/~johanson/publications/poker/2015-science-hulhe/2015-science-hulhe.html

posted by Fully Completely at 9:37 PM on January 10, 2015 [1 favorite]

http://webdocs.cs.ualberta.ca/~johanson/publications/poker/2015-science-hulhe/2015-science-hulhe.html

posted by Fully Completely at 9:37 PM on January 10, 2015 [1 favorite]

From the main link:

Presumably the next step is to arm the poker-playing robot with lasers for the event of an unfair game.

This is very cool news!

posted by daisyk at 2:03 AM on January 11, 2015

*A new computer algorithm can play one of the most popular variants of poker essentially perfectly. Its creators say that it is virtually “incapable of losing against any opponent in a fair game”.*Presumably the next step is to arm the poker-playing robot with lasers for the event of an unfair game.

This is very cool news!

posted by daisyk at 2:03 AM on January 11, 2015

Fully Complete, thanks so much for putting the work in to explain all this to us non-experts. My colleague sends along this request for clarification.

Thanks so much for taking the time to explain what is going on here. I remain confused on one point - and this may very well be my own ignorance of relevant results; as this is not really my area. You say "From a game theory perspective, the history is only really relevant if you're trying to learn what the other player's strategy is and what flaws it has, or maybe if their strategy is changing over time and you're trying to predict where they're moving towards.posted by anotherpanacea at 5:49 AM on January 11, 2015

Nash proved that at least one equilibrium exists for every finite game. Are your colleagues claiming that since there could be an infinite history, that it isn't a finite game?"

That's certainly the relevance I had in mind - adding that if one has evidence regarding the opponents strategy, that entails evidence about what cards they are holding in a given hand, given their betting so far in that hand. And while I'm not claiming anything, my question is how you prove that it isn't an infinite game. If history gives one substantive information about the likelihood of a strategy, and hence the likelihood of a given hand at various points, it is prima facie part of the input to a decision. Since there are infinitely many histories - there can't be an infinite history, but there is no finite limit on the size of histories - the total relevance of history would show that the strategies are not finite. So I gather that you have a proof either that there is a strategy ignoring history that succeeds against any strategy that takes it into account, or that there is some finite cap on the amount of history that must be taken into account. That's the part I'm just wondering about. How - in outline - does that go? (It *seems* as if you are just assuming that history is irrelevant by defining "strategy" simply in terms of card/pot position.)

Of course I don't presume that you have time to explain this to me, but in case you do, I am grateful for your help.

These are the kind of comments I want to read… super interesting thanks.

As a bad player of the last big perfect information game standing (go) its interesting to see how games are developing.

Go AI has had one big advance (monte-carlo algorithms) in the last 10 years and that has pushed go computers from mediocre dan level to the level of a top amateur player. (but still far behind the pros). For go - monte carlo was a huge advance - it allowed go AI to advance hugely for a couple of years and make steady progress for the next several years - the consensus amongst go programmers seems to be that it will take one or two other big theoretical breakthroughs or discipline spanning moves to beat the best human players at go though. It seems like your advance was just refining existing approaches in an interesting way, which is interesting still!

posted by Another Fine Product From The Nonsense Factory at 6:46 AM on January 11, 2015

As a bad player of the last big perfect information game standing (go) its interesting to see how games are developing.

Go AI has had one big advance (monte-carlo algorithms) in the last 10 years and that has pushed go computers from mediocre dan level to the level of a top amateur player. (but still far behind the pros). For go - monte carlo was a huge advance - it allowed go AI to advance hugely for a couple of years and make steady progress for the next several years - the consensus amongst go programmers seems to be that it will take one or two other big theoretical breakthroughs or discipline spanning moves to beat the best human players at go though. It seems like your advance was just refining existing approaches in an interesting way, which is interesting still!

posted by Another Fine Product From The Nonsense Factory at 6:46 AM on January 11, 2015

*Yep, everyone I know who plays poker plays Heads Up Limit Texas Hold 'Em.*

Andy Beal is NOT a poker player!

posted by PeterMcDermott at 7:41 AM on January 11, 2015

Heh, yeah I realize it gets played sometimes. I was just poking fun at the wording in the Nature article.

I'm pretty sure there were some micro/low stakes bots running around back when Americans could play on the big poker sites. I don't know how successful they were though. I should do some digging over at 2+2 to see what the story was.

posted by ODiV at 10:05 AM on January 11, 2015

I'm pretty sure there were some micro/low stakes bots running around back when Americans could play on the big poker sites. I don't know how successful they were though. I should do some digging over at 2+2 to see what the story was.

posted by ODiV at 10:05 AM on January 11, 2015

anotherpanacea:

posted by ltl at 1:34 PM on January 11, 2015

That's certainly the relevance I had in mind - adding that if one has evidence regarding the opponents strategy, that entails evidence about what cards they are holding in a given hand, given their betting so far in that hand. And while I'm not claiming anything, my question is how you prove that it isn't an infinite game. If history gives one substantive information about the likelihood of a strategy, and hence the likelihood of a given hand at various points, it is prima facie part of the input to a decision. Since there are infinitely many histories - there can't be an infinite history, but there is no finite limit on the size of histories - the total relevance of history would show that the strategies are not finite.Taking a stab at this: For a human player, it makes sense to look at the opponent's behavior across multiple games to try and find patterns and weaknesses. Here, the computed strategy is supposed and designed to play defensively against all possible strategies: Even against an opponent that plays perfectly, it has to ensure that it will only lose because the opponent was lucky and not because of the opponent's strategy. If it can do that in the first game, it can just use the same strategy for the second game and so on. So, the history of previous games becomes irrelevant. If the other player happens to have a poor strategy, the computed strategy will fare as least as good as against the best strategy. However, it might not be able to exploit the opponent's weaknesses to win as much as possible. For that, taking the history of previous games into account would probably help.

posted by ltl at 1:34 PM on January 11, 2015

I think that's right, ltl. But it's not going to persuade my colleague; he wants something like a proof that the empirical/computational work that Fully Complete and his co-authors have done is actually solving HULHE the way people play it and not a related game that treats hands in isolation. I think he's misunderstanding the role of history in strategy, but I can't figure out how to explain that in a way he'll accept.

posted by anotherpanacea at 11:13 AM on January 12, 2015

posted by anotherpanacea at 11:13 AM on January 12, 2015

anotherpanacea - I typed up what started off as a shortish explanation that grew into a way too long one that I won't post. Being able to solve a game like we have, and how it doesn't depend on the history, is just well-established textbook stuff at this point. We didn't have to formalize that in order to write our paper - von Neumann and Nash proved those parts back in the 50s, and it's entirely non-controvercial now.

So I'll try a shorter explanation, starting with "solved". The term is actually a bit technical, and isn't slang for "really good" or "better than the best humans". Instead, "solve a game" is used in the same sense that we might say "solve a mathematical equation", such as in algebra if we are "solving for X". In this case, the X that we solve for is a pair of strategies playing the game, where each strategy is playing perfectly (with full knowledge of their opponent's strategy) so as to maximize their expected value.

Does your colleague realize that Cepheus is playing by assuming that it's opponent is playing perfectly, and so is only is trying to play perfect defence against perfect opponents? In games like poker where the players alternate positions in between games, to cancel out the positional advantage (in HULHE, the small blind has a 0.088 big blind per game advantage), that guarantees that the strategy will do no worse than tie against anyone. It'll tie against other people that can already play perfectly or that can figure out a good counter-strategy (even if that counter-strategy is itself beatable), and will win against opponents that make mistakes. It won't

That's why the history isn't relevant. Is it relevant for the opponent? If we have a Nash equilibrium strategy, we can tell them exactly what our strategy is: the probability of taking each action, our range of hands at every decision point, or anything else they want. And our strategy won't change over time, and the opponent *still* can't do better than tie. So if we're generous, instead of considering the history, we'll just give them our entire strategy. And if we're not generous, we'll let them try to figure it out, but they still can't win: the most they could hope to learn would be to capture that same strategy that we could have given them.

And it's not relevant for us, either. If we've computed a Nash equilibrium for a 2-player zero-sum game, then we don't need to pay attention to the opponent *at all*: whenever it's our turn, we just look up in our strategy what action probabilities we have, and then randomly sample one. If we were trying to exploit the opponent's flaws we might care about the history, but we're only playing defence. In fact, even if we *did* see a flaw show up in the history, then it might just be a well-placed trap by a perfect adversary, and in trying to exploit it we would only make ourselves lose.

So - the central goal of poker is to try to win as much money as possible from the opponent being faced, and Cepheus is not maximally doing that: solving a game only requires playing perfectly in the face of perfect opposition, and Cepheus will do no worse than tie against anyone. So by that measure, yep - human experts, or other computer strategies, might win more against bad human players than Cepheus will. But that wasn't the goal - we've attacked that problem with other branches of our research. The goal here of solving the game was just to develop a program that can't be beaten by anyone, even if we give you the code, the strategy, an infinite history of games, as much computational power as you need, and any other knowledge you might want, other than knowing what cards Cepheus is currently holding and the state of its random number generator.

posted by Fully Completely at 2:25 PM on January 12, 2015 [4 favorites]

So I'll try a shorter explanation, starting with "solved". The term is actually a bit technical, and isn't slang for "really good" or "better than the best humans". Instead, "solve a game" is used in the same sense that we might say "solve a mathematical equation", such as in algebra if we are "solving for X". In this case, the X that we solve for is a pair of strategies playing the game, where each strategy is playing perfectly (with full knowledge of their opponent's strategy) so as to maximize their expected value.

Does your colleague realize that Cepheus is playing by assuming that it's opponent is playing perfectly, and so is only is trying to play perfect defence against perfect opponents? In games like poker where the players alternate positions in between games, to cancel out the positional advantage (in HULHE, the small blind has a 0.088 big blind per game advantage), that guarantees that the strategy will do no worse than tie against anyone. It'll tie against other people that can already play perfectly or that can figure out a good counter-strategy (even if that counter-strategy is itself beatable), and will win against opponents that make mistakes. It won't

*maximally crush*those flawed opponents, since it isn't actively trying to learn about and exploit the opponent's errors. In fact, we've demonstrated in our earlier work that in order to exploit an opponent's flaws, you almost always have to open up an exploitable hole in your own strategy. Offence and defence are a tradeoff.That's why the history isn't relevant. Is it relevant for the opponent? If we have a Nash equilibrium strategy, we can tell them exactly what our strategy is: the probability of taking each action, our range of hands at every decision point, or anything else they want. And our strategy won't change over time, and the opponent *still* can't do better than tie. So if we're generous, instead of considering the history, we'll just give them our entire strategy. And if we're not generous, we'll let them try to figure it out, but they still can't win: the most they could hope to learn would be to capture that same strategy that we could have given them.

And it's not relevant for us, either. If we've computed a Nash equilibrium for a 2-player zero-sum game, then we don't need to pay attention to the opponent *at all*: whenever it's our turn, we just look up in our strategy what action probabilities we have, and then randomly sample one. If we were trying to exploit the opponent's flaws we might care about the history, but we're only playing defence. In fact, even if we *did* see a flaw show up in the history, then it might just be a well-placed trap by a perfect adversary, and in trying to exploit it we would only make ourselves lose.

So - the central goal of poker is to try to win as much money as possible from the opponent being faced, and Cepheus is not maximally doing that: solving a game only requires playing perfectly in the face of perfect opposition, and Cepheus will do no worse than tie against anyone. So by that measure, yep - human experts, or other computer strategies, might win more against bad human players than Cepheus will. But that wasn't the goal - we've attacked that problem with other branches of our research. The goal here of solving the game was just to develop a program that can't be beaten by anyone, even if we give you the code, the strategy, an infinite history of games, as much computational power as you need, and any other knowledge you might want, other than knowing what cards Cepheus is currently holding and the state of its random number generator.

posted by Fully Completely at 2:25 PM on January 12, 2015 [4 favorites]

If this is the shortish explanation.... well, thanks. I really appreciate this.

posted by anotherpanacea at 4:46 AM on January 13, 2015

posted by anotherpanacea at 4:46 AM on January 13, 2015

Heads-up poker is about as interesting to me as Tic Tac Toe. I can't believe they bother airing it on TV.

It is the dynamics of a group that makes it interesting to watch, imho.

posted by Theta States at 10:16 AM on January 13, 2015

It is the dynamics of a group that makes it interesting to watch, imho.

posted by Theta States at 10:16 AM on January 13, 2015

Heads-up poker can be very satisfying to play... if you're winning. :) Watching it is boring.

posted by jeffamaphone at 10:24 AM on January 13, 2015

posted by jeffamaphone at 10:24 AM on January 13, 2015

« Older BBC Mini-Documentary on South Dakota's Underground... | A little Willie and more on a Saturday night Newer »

This thread has been archived and is closed to new comments

Saying poker has been solved based on this tiny subset of a situation is the same as all the terrible reporting on cars mastering self-driving despite not being able to drive at night, in the rain, past a driveway, in heavy traffic.

posted by Cosine at 1:41 PM on January 10, 2015 [14 favorites]