Look-ahead Reasoning with a Learned Model in Imperfect Information Games

Ondřej Kubíček, Viliam Lisý

TL;DR

This work tackles the lack of explicit environment models in imperfect information games by introducing LAMIR, a MuZero-inspired framework that learns a compact abstract model directly from interaction to enable test-time look-ahead planning. LAMIR comprises representations $\Lambda^{I}_\theta$, dynamics $\Upsilon_\theta$, and legal-action predictors $\Gamma_\theta$, trained from full trajectories with a combined loss $\mathcal{L}_\theta^{M}$. It further introduces a domain-independent abstraction that clusters the information sets within each public state using a learned clustering function $\kappa$, governed by losses $\mathcal{L}_\theta^{A}$ and $\mathcal{L}_\theta^{S}$, and it decouples abstraction learning from dynamics learning. At test time, LAMIR performs depth-limited look-ahead with CFR+ on the abstract subgames, using continual resolving and an approximate value function $v_\theta$ to evaluate positions beyond the depth limit. Empirical results show that, with sufficient capacity, LAMIR recovers the underlying game structure, and even with a constrained abstraction it substantially improves performance over the RNaD baseline, achieving up to an 80% win rate in large-scale games.
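CFR+ relies on regret matching+, which keeps cumulative regrets clipped at zero and plays each action in proportion to its positive regret. The Python sketch below illustrates only that update rule, with alternating player updates and linearly weighted averaging, on a one-shot zero-sum matrix game. It is not LAMIR's solver and does not operate on abstract subgames; the function name, the payoff matrix, and the iteration count are illustrative assumptions that do not come from the paper.

```python
def regret_matching_plus(payoff, iterations=5000):
    """Approximate an equilibrium of a zero-sum matrix game.

    payoff[a][b] is the row player's utility; the column player receives
    the negative. Returns the linearly averaged strategies (row, column).
    Illustrative sketch only, not the paper's implementation.
    """
    sizes = (len(payoff), len(payoff[0]))
    regrets = [[0.0] * sizes[0], [0.0] * sizes[1]]
    averages = [[0.0] * sizes[0], [0.0] * sizes[1]]

    def strategy(regret):
        # Play proportionally to (already non-negative) regrets, else uniformly.
        total = sum(regret)
        if total <= 0.0:
            return [1.0 / len(regret)] * len(regret)
        return [r / total for r in regret]

    for t in range(1, iterations + 1):
        for player in (0, 1):  # alternating updates
            x, y = strategy(regrets[0]), strategy(regrets[1])
            if player == 0:
                own = x
                values = [sum(payoff[a][b] * y[b] for b in range(sizes[1]))
                          for a in range(sizes[0])]
            else:
                own = y
                values = [-sum(payoff[a][b] * x[a] for a in range(sizes[0]))
                          for b in range(sizes[1])]
            expected = sum(p * v for p, v in zip(own, values))
            # Regret matching+: clip cumulative regrets at zero.
            regrets[player] = [max(0.0, r + v - expected)
                               for r, v in zip(regrets[player], values)]
            # Linear averaging: iteration t contributes with weight t.
            averages[player] = [s + t * p
                                for s, p in zip(averages[player], own)]

    return tuple([s / sum(avg) for s in avg] for avg in averages)


if __name__ == "__main__":
    # Hypothetical 2x2 game whose unique equilibrium mixes 0.4 / 0.6 for both players.
    row, col = regret_matching_plus([[2.0, -1.0], [-1.0, 1.0]])
    print(row, col)
```

In a full CFR+ implementation the same update would run at every information set of the subgame, with action values computed by traversing the tree instead of reading a payoff matrix.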

Abstract

Test-time reasoning significantly enhances the performance of pre-trained AI agents. However, it requires an explicit environment model, which is often unavailable or overly complex in real-world scenarios. While MuZero enables effective model learning for search in perfect information games, extending this paradigm to imperfect information games presents substantial challenges due to more nuanced look-ahead reasoning techniques and the large number of states relevant for individual decisions. This paper introduces LAMIR, an algorithm that learns an abstracted model of an imperfect information game directly from the agent-environment interaction. At test time, this trained model is used to perform look-ahead reasoning. The learned abstraction limits each subgame to a manageable size, making theoretically principled look-ahead reasoning tractable even in games where previous methods could not scale. We empirically demonstrate that with sufficient capacity, LAMIR learns the exact underlying game structure, and with limited capacity, it still learns a valuable abstraction, which improves the game-playing performance of the pre-trained agents even in large games.

Paper Structure

This paper contains 33 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The public state and information set representation functions of player $i$. First, $\Lambda_{i, \theta}$ predicts 4 abstract information sets, and then $\Lambda^{I}_{i, \theta}$ predicts the probability distribution over those abstractions.
  • Figure 2: Exploitability of LAMIR in different games when using continual resolving with depth limit 1 in each subgame, for different choices of the abstraction limit $L$ and $\kappa$. The largest public state in II Goofspiel 5 contains 30 information sets, and in Oshi-Zumo 3,5 it contains 625 information sets.
  • Figure 3: Exploitability in different games when performing tabular K-means to get abstraction using different property functions $\kappa$.
  • Figure 4: Exploitability of different LAMIR runs with different $\kappa$, obtained by mapping the information abstraction onto the original game tree, by constructing the whole game tree from the dynamics network, or by using LAMIR with depth limit 1.
  • Figure 5: Exploitability in Leduc Hold'em when the subgame is constructed either from the rules of the game or from the dynamics network.
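
The tabular K-means abstraction named in the Figure 3 caption can be pictured with a small sketch. The Python code below groups the information sets of a single public state into at most `limit` clusters by running Lloyd's algorithm on per-information-set feature vectors. It assumes that the property function $\kappa$ yields one numeric vector per information set; the function name, the farthest-point initialization, and the toy feature values are hypothetical choices, not details from the paper.

```python
def kmeans_abstraction(features, limit, iterations=20):
    """Assign each information set of one public state to a cluster.

    features: one list of floats per information set (assumed to be the
              output of a property function kappa; hypothetical here).
    limit:    maximum number of abstract information sets.
    Returns a list with one cluster index per information set.
    """
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [list(features[0])]
    while len(centroids) < min(limit, len(features)):
        farthest = max(features,
                       key=lambda f: min(sq_dist(f, c) for c in centroids))
        if min(sq_dist(farthest, c) for c in centroids) == 0.0:
            break  # fewer distinct feature vectors than allowed clusters
        centroids.append(list(farthest))

    assignment = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each information set joins its nearest centroid.
        assignment = [min(range(len(centroids)),
                          key=lambda k: sq_dist(f, centroids[k]))
                      for f in features]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [f for f, a in zip(features, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment


if __name__ == "__main__":
    # Toy feature vectors for five information sets in one public state.
    toy = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.05, 0.05]]
    print(kmeans_abstraction(toy, limit=2))
```

When `limit` is at least the number of distinct feature vectors, every distinct vector ends up in its own cluster in this sketch, so no information sets with different features are merged.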