Learning in Games with Progressive Hiding

Benjamin Heymann, Marc Lanctot

TL;DR

This work addresses learning in imperfect-information games by relaxing information constraints through an information-relaxation framework called progressive hiding. It combines penalty-based relaxations with no-regret learning to create an auxiliary game in which CFR can be applied even when perfect recall fails, and it proves that, under suitable conditions, the CFR guarantees extend to the auxiliary setting. The main contribution is a formal theorem linking the progressive-hiding auxiliary game to CFR performance, along with a practical algorithm and theoretical guarantees. Empirically, progressive hiding yields notable improvements across several game settings, including Trade Comm, Cooperative Matching Pennies, and Abstracted Tiny Bridge, often outperforming baseline CFR-like methods within limited episode budgets. The approach offers a principled bridge between stochastic programming techniques (scenario decomposition, progressive hedging) and learning in games, with potential extensions to broader competitive, multi-agent, or large-scale settings.

Abstract

When learning to play an imperfect information game, it is often easier to start with the basic mechanics of the game rules. For example, one can play several example rounds with private cards revealed to all players to better understand the basic actions and their effects. Building on this intuition, this paper introduces progressive hiding, an algorithm that balances learning the basic mechanics of an imperfect information game and satisfying the information constraints. Progressive hiding is inspired by methods from stochastic multistage optimization, such as scenario decomposition and progressive hedging. We prove that it enables the adaptation of counterfactual regret minimization to games where perfect recall is not satisfied. Numerical experiments illustrate that progressive hiding produces notable improvements in several settings.

Paper Structure

This paper contains 26 sections, 6 theorems, 11 equations, 3 figures, and 2 algorithms.

Key Result

Proposition 1

$\forall \mu\in\bar{\Lambda}, \mu\in\Lambda\iff \mathrm{Proj}_{\mu_0}(\mu)=\mu$.
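
Proposition 1 characterizes membership in $\Lambda$ as a fixed-point condition under a projection. The definitions of $\bar{\Lambda}$, $\Lambda$, and $\mathrm{Proj}_{\mu_0}$ are not reproduced in this summary, so the following is only a minimal sketch under assumptions: it treats $\bar{\Lambda}$ as policies that may depend on the hidden state, $\Lambda$ as policies that respect the information constraints, and the projection as a weighted average over states that the player cannot tell apart (the averaging step used in progressive hedging). The function names, the `weight` argument, and the toy policy are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def project(policy, info_of, weight):
    """Average a state-dependent policy over each information set.

    policy:  dict state -> list of action probabilities
    info_of: dict state -> information-set label
             (states sharing a label are indistinguishable to the player)
    weight:  dict state -> positive weight (assumed stand-in for a
             reference measure; not taken from the paper)
    """
    total = defaultdict(float)
    summed = {}
    for state, probs in policy.items():
        label = info_of[state]
        total[label] += weight[state]
        acc = summed.setdefault(label, [0.0] * len(probs))
        for action, p in enumerate(probs):
            acc[action] += weight[state] * p
    # Every state in an information set receives the same averaged policy.
    return {
        state: [v / total[info_of[state]] for v in summed[info_of[state]]]
        for state in policy
    }


def is_fixed_point(policy, info_of, weight, tol=1e-12):
    """Check whether projecting the policy leaves it unchanged."""
    projected = project(policy, info_of, weight)
    return all(
        abs(p - q) <= tol
        for state in policy
        for p, q in zip(policy[state], projected[state])
    )


# Toy setting in the spirit of Cooperative Matching Pennies: Bob sees
# Alice's choice (HEAD) but not the state of nature (SAME / DIFFERENT).
# Bob's actions are ordered as [TAIL, HEAD, PASS].
info_of = {("SAME", "HEAD"): "HEAD", ("DIFFERENT", "HEAD"): "HEAD"}
weight = {("SAME", "HEAD"): 0.5, ("DIFFERENT", "HEAD"): 0.5}

# A "cheating" policy that reacts to the hidden state.
cheating = {
    ("SAME", "HEAD"): [0.0, 1.0, 0.0],
    ("DIFFERENT", "HEAD"): [1.0, 0.0, 0.0],
}
admissible = project(cheating, info_of, weight)
```

In this toy setting, `is_fixed_point(cheating, info_of, weight)` is false because the policy uses information Bob does not have, while the projected policy `admissible` is identical across the two indistinguishable states and is therefore unchanged by a second projection.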

Figures (3)

  • Figure 1: Distribution of learning outcomes ($\mathbb{E}_{\gamma^{t}}[r_t(\textbf{h})]$) for the three information maps (baseline, recall, and cheated) on Trade Comm with parameters $(m,n)=(2,2)$ (left) and $(m,n)=(3,2)$ (right).
  • Figure 2: Tree representation of Cooperative Matching Pennies, introduced in Section \ref{sec:matching_pennies}. First, a random state is sampled from SAME and DIFFERENT. Alice, the first player, observes the outcome of this random event and then chooses between TAIL and HEAD. Bob, the second player, knows Alice's choice but does not know the state of nature. Bob then makes his decision, choosing either TAIL, HEAD, or PASS. The payoff, indicated in the leaves, is the same for both players.
  • Figure 3: Distribution of the maximal payoff obtained during learning for each of the 50 training runs on Abstracted Tiny Bridge. According to Sokota et al. (2021), 20.32 is the performance of the best joint policy that does not require coordination.

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Proposition 3
  • Theorem 2
  • Proposition 4