Adapting Beyond the Depth Limit: Counter Strategies in Large Imperfect Information Games
David Milec, Vojtěch Kovařík, Viliam Lisý
TL;DR
This work tackles robust adaptation in large two-player zero-sum imperfect-information games: exploiting opponent mistakes that occur beyond the depth limit without sacrificing safety against rational play. It introduces Adapting Beyond Depth-limit (ABD), which uses matrix-valued states and a portfolio of strategies to simulate the sub-rational opponent’s behavior beyond the lookahead horizon. ABD can compute opponent-specific values by sampling rather than with neural networks. In the idealized setting, when the portfolios contain all pure undominated strategies, it provably converges to a p-restricted Nash response. Empirically, ABD substantially outperforms prior methods such as Continual Depth-limited Restricted Nash Response (CDRNR), especially against opponents whose suboptimal behavior manifests late in the game. It performs strongly in Battleships and Leduc Hold’em, including a large-scale 5x5 Battleships scenario.
Abstract
We study the problem of adapting to a known sub-rational opponent during online play while remaining robust to rational opponents. We focus on large imperfect-information zero-sum games, whose size makes it impossible to inspect the whole game tree at once and necessitates depth-limited search. However, all existing methods assume rational play beyond the depth limit, which allows them to adapt to only a very limited portion of the opponent's behaviour. We propose an algorithm, Adapting Beyond Depth-limit (ABD), that uses a strategy-portfolio approach, which we refer to as matrix-valued states, for depth-limited search. This allows the algorithm to fully utilise all information about the opponent model, making it the first robust-adaptation method able to do so in large imperfect-information games. As an additional benefit, the use of matrix-valued states makes the algorithm simpler than traditional methods based on optimal value functions. Our experimental results in poker and battleship show that ABD yields more than a twofold increase in utility when facing opponents who make mistakes beyond the depth limit, and also delivers significant improvements in utility and safety against randomly generated opponents.
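To make the trade-off between adaptation and robustness concrete, the following is a minimal sketch of a p-restricted objective on a single payoff matrix. It is an illustration under stated assumptions, not the ABD algorithm itself. We assume that a matrix-valued state can be pictured as a matrix `A` whose rows index the searching player's portfolio strategies and whose columns index the opponent's portfolio strategies, and that the opponent model can be written as a mixture `model` over those columns. The function names, the two-row restriction, and the grid search are all hypothetical choices made for brevity.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def column_values(x: Sequence[float], A: Matrix) -> List[float]:
    """Expected payoff to the row player against each opponent column, given row mix x."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def restricted_value(x: Sequence[float], A: Matrix,
                     model: Sequence[float], p: float) -> float:
    """Value of row mix x when the opponent follows `model` with probability p
    and plays a worst-case (best-responding) column with probability 1 - p."""
    cols = column_values(x, A)
    against_model = sum(m * v for m, v in zip(model, cols))
    against_worst_case = min(cols)
    return p * against_model + (1.0 - p) * against_worst_case


def best_row_mix(A: Matrix, model: Sequence[float], p: float,
                 steps: int = 100) -> Tuple[List[float], float]:
    """Grid search over mixes of exactly two row strategies (illustrative only;
    a real solver would use a linear program or regret minimisation)."""
    assert len(A) == 2, "this sketch only handles two row strategies"
    best_x: List[float] = []
    best_v = float("-inf")
    for k in range(steps + 1):
        x = [k / steps, 1.0 - k / steps]
        v = restricted_value(x, A, model, p)
        if v > best_v:
            best_x, best_v = x, v
    return best_x, best_v
```

For example, with the matching-pennies matrix `[[1, -1], [-1, 1]]` and a model that always plays the first column, `p = 0` recovers the safe uniform mix, while `p = 1` fully exploits the model by always playing the first row. Intermediate values of `p` interpolate between the two objectives.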
