Adapting Beyond the Depth Limit: Counter Strategies in Large Imperfect Information Games

David Milec; Vojtěch Kovařík; Viliam Lisý

TL;DR

This work tackles robust adaptation in large, two-player zero-sum imperfect-information games by enabling exploitation of opponent mistakes beyond the depth limit without sacrificing safety against rational play. It introduces Adapting Beyond Depth-limit (ABD), which uses matrix-valued states and a portfolio of strategies to simulate the sub-rational opponent’s behavior beyond the lookahead horizon. ABD can compute opponent-specific values with sampling instead of neural networks, and provides theoretical guarantees that, with portfolios containing all pure undominated strategies, it converges to a p-restricted Nash response in the idealized setting. Empirically, ABD substantially outperforms prior methods like Continual Depth-limited Restricted Nash Response (CDRNR), especially against opponents whose suboptimal behavior manifests late in the game, and demonstrates strong performance in Battleships and Leduc Hold’em, including a large-scale 5x5 Battleships scenario.
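The idea of a matrix-valued state filled in by sampling can be sketched on a toy version of the simplified Battleships setup (a 2x2 board with one 1x1 ship). This is only an illustration under stated assumptions, not the authors' implementation: the portfolio contents, the single-shot payoff of +1 for a miss and -1 for a hit, and all names such as `estimate_leaf_matrix` are hypothetical.

```python
import random

CELLS = ["top_left", "top_right", "bottom_left", "bottom_right"]


def uniform(cells):
    """Uniform distribution over the given cells."""
    return {c: 1.0 / len(cells) for c in cells}


# Hypothetical portfolios: each entry is a continuation strategy that a player
# may commit to once the depth limit is reached.
OUR_PORTFOLIO = {
    "hide_top_left": {"top_left": 1.0},
    "hide_uniform": uniform(CELLS),
}
OPP_PORTFOLIO = {
    "model_avoids_top_left": uniform(CELLS[1:]),  # the sub-rational opponent model
    "shoot_uniform": uniform(CELLS),
}


def sample(dist, rng):
    """Draw one cell from a {cell: probability} distribution."""
    cells = list(dist)
    return rng.choices(cells, weights=[dist[c] for c in cells])[0]


def rollout(placement, shooting, rng):
    """Toy payoff for the hiding player: +1 if the single shot misses, -1 if it hits."""
    ship = sample(placement, rng)
    shot = sample(shooting, rng)
    return -1.0 if ship == shot else 1.0


def estimate_leaf_matrix(ours, theirs, samples=2000, seed=0):
    """Monte Carlo estimate of the payoff for every pair of portfolio strategies."""
    rng = random.Random(seed)
    return {
        (a, b): sum(rollout(pa, pb, rng) for _ in range(samples)) / samples
        for a, pa in ours.items()
        for b, pb in theirs.items()
    }


leaf_matrix = estimate_leaf_matrix(OUR_PORTFOLIO, OPP_PORTFOLIO)
for pair, value in leaf_matrix.items():
    print(pair, round(value, 2))
```

In this toy, the entry for hiding in the top-left corner against the corner-avoiding model is the only one where the ship is never hit, which is the kind of opponent-specific value a search that assumes rational play beyond the depth limit would not see.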

Abstract

We study the problem of adapting to a known sub-rational opponent during online play while remaining robust to rational opponents. We focus on large imperfect-information (zero-sum) games, where it is impossible to inspect the whole game tree at once and depth-limited search is therefore necessary. However, all existing methods assume rational play beyond the depth limit, which allows them to adapt to only a very limited portion of the opponent's behaviour. We propose an algorithm, Adapting Beyond Depth-limit (ABD), that uses a strategy-portfolio approach, which we refer to as matrix-valued states, for depth-limited search. This allows the algorithm to fully utilise all information about the opponent model, making it the first robust-adaptation method able to do so in large imperfect-information games. As an additional benefit, the use of matrix-valued states makes the algorithm simpler than traditional methods based on optimal value functions. Our experimental results in poker and Battleships show that ABD yields more than a twofold increase in utility when facing opponents who make mistakes beyond the depth limit, and also delivers significant improvements in utility and safety against randomly generated opponents.
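The trade-off between exploiting a known opponent and staying robust to a rational one is what a restricted Nash response captures: the opponent is assumed to follow the fixed model with probability p and to play freely otherwise. A minimal sketch on rock-paper-scissors follows. It uses brute-force grid search rather than any method from the paper, and the game, the always-rock opponent model, and the function names are illustrative assumptions.

```python
from fractions import Fraction

# Row player's payoffs in rock-paper-scissors (rows and columns: rock, paper, scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def expected_payoff(x, y):
    """Expected payoff of mixed strategy x against mixed strategy y."""
    return sum(x[i] * PAYOFF[i][j] * y[j] for i in range(3) for j in range(3))


def restricted_nash_response(fixed, p, grid=30):
    """Grid-search the strategy maximising
    p * (payoff vs. the fixed model) + (1 - p) * (worst-case payoff)."""
    pure = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    best_value, best_x = None, None
    for a in range(grid + 1):
        for b in range(grid + 1 - a):
            x = (Fraction(a, grid), Fraction(b, grid), Fraction(grid - a - b, grid))
            vs_model = expected_payoff(x, fixed)
            # A worst-case opponent can always be taken to play a pure strategy.
            vs_rational = min(expected_payoff(x, y) for y in pure)
            value = p * vs_model + (1 - p) * vs_rational
            if best_value is None or value > best_value:
                best_value, best_x = value, x
    return best_x, best_value


always_rock = (1, 0, 0)  # hypothetical sub-rational opponent model
for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    x, value = restricted_nash_response(always_rock, p)
    print(f"p={p}: strategy={[str(v) for v in x]}, value={value}")
```

With p = 0 the search returns the uniform equilibrium strategy, with p = 1 it returns the pure best response (paper), and intermediate values of p shift weight towards paper while keeping some protection against a rational opponent.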

Paper Structure (27 sections, 1 theorem, 6 figures, 3 tables)

Introduction
Background
Robust Adaptation: Restricted Nash Response
Online Play in Large Games: Continual Resolving
Restarting Search Mid-Game: Re-solving Gadget
Depth-Limited Solving via Value Functions
Depth-Limited Solving via Matrix-Valued States
Related Work
Adapting to Opponents in Smaller Games
Adapting to Opponents in Perfect Information Games
Adaptation in Large Imperfect Information Games
Safe Online Adaptation
Continual Depth-limited Responses
Motivation
Opponent Exploitation Beyond the Depth-limit
...and 12 more sections

Key Result

Proposition 1

Let ${\mathcal{G}}$ be a two-player zero-sum EFG, $\sigma_2^\textnormal{fix} \in \Sigma_2$ be a fixed strategy of the opponent, and $p \in [0, 1]$. When $\mathbb P_i \supseteq \left\{ s_i \in \Sigma_i \mid s_i \textnormal{ is pure undominated} \right\}$, the strategy produced by the ABD algorithm is

Figures (6)

Figure 1: Top: Calculating restricted Nash response. Left: Starting search mid-play and a re-solving gadget. Right: Depth-limited solving using matrix-valued states.
Figure 2: Setup for the simplified Battleships example: a 2x2 board with one 1x1 ship. The fixed opponent never fires at the top-left corner, so our best response is to place the ship there.
Figure 3: An illustration of the game used by the Adapting Beyond Depth-limit (ABD) algorithm. The upper part is the gadget, which is needed whenever the algorithm is not run from the root. Left: fixed opponent. Right: rational opponent.
Figure 4: Example situations in Tic-tac-toe. The adapting player plays circles. Left: An opening with one of the optimal strategies, answered with the only optimal move. Middle: An opening with another optimal strategy, answered with a losing move. Right: An optimal opening by the opponent and a suboptimal reaction by the adapting player, who knows the opponent will then lose.
Figure 5: Results of ABD in Battleships against an opponent who does not shoot at the top-left corner. ABD coincides with RNR, whereas CDBR is not able to exploit the strategy with depth = 2.
...and 1 more figure

Theorems & Definitions (3)

Definition 1: Extensive-form game (EFG)
Definition 2: Matrix-valued depth-limited game
Proposition 1
