DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

TL;DR

DeepStack targets expert-level play in imperfect-information games, specifically heads-up no-limit Texas Hold'em. It combines recursive CFR-based reasoning with continual re-solving, and it uses a learned counterfactual-value function to limit lookahead depth. Instead of abstracting the full game offline, it solves each situation online as it arises, aided by deep neural networks (flop, turn, and auxiliary) that estimate subgame values. The resulting strategies approximate a Nash equilibrium: DeepStack beat professional players with statistical significance, and Local Best Response analyses show it is hard to exploit. Coupling online solving with learned value approximations makes play practical in this game and suggests the approach may carry over to other large, sequential imperfect-information problems.

Abstract

Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated with statistical significance professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce more difficult to exploit strategies than prior approaches.

Paper Structure

This paper contains 2 sections, 9 theorems, 13 equations, 6 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

If the values returned by the value function used when the depth limit is reached have error less than $\epsilon$, and $T$ iterations of CFR are used to re-solve, then the resulting strategy's exploitability is less than $k_1\epsilon + k_2 / \sqrt{T}$, where $k_1$ and $k_2$ are game-specific constants.
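
To make the trade-off in Theorem 1 concrete, the sketch below evaluates the bound $k_1\epsilon + k_2/\sqrt{T}$ and inverts it to find how many CFR iterations a target exploitability would require. The function names and the constant values in the usage note are illustrative assumptions; the theorem says only that $k_1$ and $k_2$ are game-specific and gives no numeric values here.

```python
import math


def exploitability_bound(eps, T, k1, k2):
    """Evaluate the Theorem 1 bound k1*eps + k2/sqrt(T).

    eps: error of the value function at the depth limit.
    T:   number of CFR iterations used in the re-solve.
    k1, k2: game-specific constants (hypothetical inputs, not known values).
    """
    if T <= 0:
        raise ValueError("T must be a positive number of CFR iterations")
    return k1 * eps + k2 / math.sqrt(T)


def iterations_for_target(target, eps, k1, k2):
    """Smallest T for which the bound is at most `target`.

    Returns None when k1*eps alone already meets or exceeds the target:
    no number of iterations can compensate for value-function error.
    """
    slack = target - k1 * eps
    if slack <= 0:
        return None
    return math.ceil((k2 / slack) ** 2)
```

For example, with the made-up values `k1 = 1.0`, `k2 = 2.0`, and `eps = 0.25`, `iterations_for_target(0.5, 0.25, 1.0, 2.0)` yields 64, and `exploitability_bound(0.25, 64, 1.0, 2.0)` evaluates to 0.5. The `None` case shows the floor set by the first term: more iterations shrink only the $k_2/\sqrt{T}$ part.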

Figures (6)

  • Figure 1: A portion of the public tree in HUNL. Nodes represent public states, whereas edges represent actions: red and turquoise showing player betting actions, and green representing public cards revealed by chance. The game ends at terminal nodes, shown as a chip with an associated value. For terminal nodes where no player folded, the player whose private cards form a stronger poker hand receives the value of the state.
  • Figure 2: DeepStack overview. (A) DeepStack reasons in the public tree, always producing action probabilities for all cards it can hold in a public state. It maintains two vectors while it plays: its own range and its opponent's counterfactual values. As the game proceeds, its own range is updated via Bayes' rule using its computed action probabilities after it takes an action. Opponent counterfactual values are updated as discussed under "Continual re-solving". To compute action probabilities when it must act, it performs a re-solve using its range and the opponent counterfactual values. To make the re-solve tractable, it restricts the available actions of the players and limits lookahead to the end of the round. During the re-solve, counterfactual values for public states beyond its lookahead are approximated using DeepStack's learned evaluation function. (B) The evaluation function is represented with a neural network that takes the public state and ranges from the current iteration as input and outputs counterfactual values for both players (Fig. 3). (C) The neural network is trained prior to play by generating random poker situations (pot size, board cards, and ranges) and solving them to produce training examples. Complete pseudocode can be found in Algorithm S1 of the SOM.
  • Figure 3: Deep counterfactual value network. The inputs to the network are the pot size, public cards, and the player ranges, which are first processed into hand clusters. The output from the seven fully connected hidden layers is post-processed to guarantee the values satisfy the zero-sum constraint, and then mapped back into a vector of counterfactual values.
  • Figure 4: Performance of professional poker players against DeepStack. Performance estimated with AIVAT along with a 95% confidence interval. The solid bars at the bottom show the number of games the participant completed.
  • Figure 5: Huber loss with different numbers of hidden layers in the neural network.
  • ...and 1 more figure
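
Two mechanisms mentioned in the captions above can be sketched in a few lines: the Bayes' rule update of DeepStack's own range after it acts (Figure 2A), and a post-processing step that forces value estimates to satisfy a zero-sum constraint (Figure 3). This is a minimal sketch, not the authors' implementation. The function names, the plain-list representation, and the specific zero-sum correction (subtracting half of the total range-weighted value from every entry, assuming each range sums to 1) are assumptions made for illustration.

```python
def update_range(own_range, action_probs):
    """Bayes' rule update of a range after an action is taken.

    own_range[h]:    prior probability of holding hand h.
    action_probs[h]: probability that the computed strategy takes the
                     chosen action when holding hand h.
    Returns the posterior range, normalized to sum to 1.
    """
    joint = [r * p for r, p in zip(own_range, action_probs)]
    total = sum(joint)
    if total == 0:
        raise ValueError("chosen action has zero probability under this range")
    return [j / total for j in joint]


def zero_sum_adjust(range1, values1, range2, values2):
    """Shift both value vectors so the range-weighted values sum to zero.

    Assumes each range sums to 1. The correction used here (an assumed,
    illustrative choice) splits the zero-sum violation evenly between
    the two players.
    """
    total = (sum(r * v for r, v in zip(range1, values1))
             + sum(r * v for r, v in zip(range2, values2)))
    half = total / 2.0
    return [v - half for v in values1], [v - half for v in values2]
```

With a uniform two-hand range `[0.5, 0.5]` and action probabilities `[1.0, 0.0]`, `update_range` returns `[1.0, 0.0]`: taking an action that only the first hand ever takes reveals that hand. After `zero_sum_adjust`, the dot product of each range with its adjusted values sums to zero across the two players.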

Theorems & Definitions (9)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Theorem 2