Consistent Opponent Modeling in Imperfect-Information Games

Sam Ganzfried

TL;DR

The paper tackles the problem of consistency in opponent modeling for two-player imperfect-information games in which the opponent is drawn from a known prior. It shows that Bayesian Best Response is not consistent and introduces FMAP, an algorithm that works on the sequence-form representation with Dirichlet priors and is guaranteed to be consistent under standard identifiability and visitation assumptions. FMAP computes the posterior mode, rather than the posterior mean used by mean-based approaches, by solving a convex optimization problem with projected gradient descent, and it provably converges to the opponent's true strategy. In Kuhn poker experiments, FMAP outperforms sampling-based methods and performs close to the full best response benchmark, suggesting strong practical potential for data-driven exploitation in imperfect-information domains.
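To make the mean-versus-mode distinction concrete: at a single information set where the opponent's action counts are fully observed, the Dirichlet prior is conjugate, so both the posterior mean and the posterior mode (the MAP estimate) have closed forms. The sketch below is illustrative only; the function names, the Dirichlet(2, 2) prior, and the counts are hypothetical and not taken from the paper.

```python
from typing import List


def dirichlet_posterior_mean(alpha: List[float], counts: List[int]) -> List[float]:
    """Posterior mean of a Dirichlet(alpha) prior after observing action counts."""
    post = [a + n for a, n in zip(alpha, counts)]
    total = sum(post)
    return [p / total for p in post]


def dirichlet_posterior_mode(alpha: List[float], counts: List[int]) -> List[float]:
    """Posterior mode (MAP estimate) of the same Dirichlet posterior.

    The closed form (a_i - 1) / (sum(a) - K) is valid when every posterior
    parameter a_i = alpha_i + n_i is at least 1 and sum(a) > K.
    """
    post = [a + n for a, n in zip(alpha, counts)]
    k = len(post)
    if min(post) < 1 or sum(post) <= k:
        raise ValueError("closed-form mode needs alpha_i + n_i >= 1 and sum > K")
    denom = sum(post) - k
    return [(p - 1) / denom for p in post]


# Hypothetical information set with actions ordered (fold, call),
# a Dirichlet(2, 2) prior, and 3 observed folds versus 7 observed calls.
alpha = [2.0, 2.0]
counts = [3, 7]
mean = dirichlet_posterior_mean(alpha, counts)  # [5/14, 9/14]
mode = dirichlet_posterior_mode(alpha, counts)  # [4/12, 8/12]
print(mean, mode)
```

With many observations both estimates approach the empirical action frequencies, so this fully observed case does not by itself reproduce the paper's inconsistency result for Bayesian Best Response; it only contrasts the two estimators being compared.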

Abstract

The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may achieve strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents using available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that achieves this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy under standard Bayesian identifiability and visitation assumptions, given observations from gameplay and any additional historical data that is available.
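The convex-minimization step can be illustrated in a simplified setting. The paper's algorithm works over the sequence-form strategy polytope; the sketch below instead assumes a single information set, where the feasible set is a probability simplex and the objective is the negative log of a Dirichlet-multinomial posterior. All names, the `floor` that keeps logarithms finite, and the step size are assumptions chosen for this example, not the paper's implementation.

```python
from typing import List


def project_to_simplex(v: List[float], floor: float = 1e-9) -> List[float]:
    """Euclidean projection onto {x : sum(x) = 1, x_i >= floor}.

    Shifts by `floor`, applies the sort-based projection onto the simplex
    of mass 1 - K*floor, then shifts back so log(x_i) stays finite.
    """
    k = len(v)
    mass = 1.0 - k * floor
    shifted = [vi - floor for vi in v]
    u = sorted(shifted, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - mass) / j
        if uj - t > 0:  # holds for j = 1..rho; the last hit defines theta
            theta = t
    return [max(s - theta, 0.0) + floor for s in shifted]


def map_by_projected_gradient(alpha: List[float], counts: List[int],
                              step: float = 0.01, iters: int = 2000) -> List[float]:
    """Minimize -sum_i c_i log x_i over the simplex, with c_i = alpha_i + n_i - 1.

    This is the negative log posterior (up to a constant) of a Dirichlet prior
    updated with multinomial counts; it is convex when every c_i >= 0.
    """
    c = [a + n - 1.0 for a, n in zip(alpha, counts)]
    if min(c) < 0:
        raise ValueError("requires alpha_i + n_i >= 1 so the objective is convex")
    k = len(c)
    x = [1.0 / k] * k  # start from the uniform strategy
    for _ in range(iters):
        grad = [-ci / xi for ci, xi in zip(c, x)]
        x = project_to_simplex([xi - step * gi for xi, gi in zip(x, grad)])
    return x


# Hypothetical 3-action information set: Dirichlet(2, 2, 2) prior, counts (5, 1, 0).
x_hat = map_by_projected_gradient([2.0, 2.0, 2.0], [5, 1, 0])
print(x_hat)  # approaches the closed-form mode (6/9, 2/9, 1/9)
```

Here the projected iterates can be checked against the closed-form Dirichlet mode; in the sequence-form setting no such closed form is generally available, which is where an iterative solver such as projected gradient descent becomes useful.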

Paper Structure

This paper contains 6 sections, 6 theorems, 21 equations, 2 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

Theorem 1 states that the payoff against the mean of a strategy distribution equals the expected payoff against the full distribution (Ganzfried18b:Bayesian).
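This property follows from the linearity of expected payoff in the opponent's strategy, assuming strategies are represented as mixed strategies or sequence-form realization plans so that payoff is bilinear. The sketch below checks it exactly on a toy 2x2 zero-sum matrix with a two-point prior; the matrix, prior, and function names are illustrative assumptions, not the Kuhn poker setup from the paper.

```python
from fractions import Fraction
from typing import List

Vector = List[Fraction]
Matrix = List[List[Fraction]]


def bilinear_payoff(x: Vector, A: Matrix, y: Vector) -> Fraction:
    """Expected payoff x^T A y for our strategy x against opponent strategy y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def payoff_vs_distribution(x: Vector, A: Matrix,
                           opponents: List[Vector], probs: Vector) -> Fraction:
    """Expected payoff when the opponent's strategy is drawn from a finite prior."""
    return sum(p * bilinear_payoff(x, A, y) for y, p in zip(opponents, probs))


def mean_strategy(opponents: List[Vector], probs: Vector) -> Vector:
    """Prior-weighted mean of the opponent strategies, component by component."""
    n = len(opponents[0])
    return [sum(p * y[j] for y, p in zip(opponents, probs)) for j in range(n)]


# Toy example: matching-pennies payoffs, a fixed strategy x, two candidate opponents.
A = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(1)]]
x = [Fraction(2, 3), Fraction(1, 3)]
opponents = [[Fraction(1), Fraction(0)], [Fraction(1, 4), Fraction(3, 4)]]
probs = [Fraction(1, 2), Fraction(1, 2)]

lhs = payoff_vs_distribution(x, A, opponents, probs)
rhs = bilinear_payoff(x, A, mean_strategy(opponents, probs))
print(lhs, rhs)  # both 1/12
```

Exact `Fraction` arithmetic is used so the two sides can be compared for equality rather than within a floating-point tolerance.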

Figures (2)

  • Figure 1: Game tree for Kuhn poker.
  • Figure 2: Profit as a function of game iteration for several opponent modeling algorithms and benchmark strategies. The results are averaged over 100 opponents generated randomly from the prior distribution. The sampling algorithms all use 10 samples.
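Figure 1 shows the Kuhn poker game tree. As a textual companion, the sketch below encodes the terminal payoffs under the standard rules: a three-card deck with J < Q < K, each player antes 1 chip and receives one card, and a single betting round in which a bet or a call adds 1 chip. The history strings ('p' for pass, meaning check or fold; 'b' for bet or call) and the function name are hypothetical conventions, not notation from the paper.

```python
CARD_RANK = {"J": 0, "Q": 1, "K": 2}

# Terminal betting histories, from player 1's first action onward.
TERMINAL_HISTORIES = {"pp", "bp", "bb", "pbp", "pbb"}


def kuhn_payoff_p1(card1: str, card2: str, history: str) -> int:
    """Net chips won by player 1 at a terminal history (zero-sum, so P2 gets the negation)."""
    if history not in TERMINAL_HISTORIES:
        raise ValueError(f"not a terminal history: {history!r}")
    if history == "bp":   # P1 bets, P2 folds: P1 wins P2's ante
        return 1
    if history == "pbp":  # P1 checks, P2 bets, P1 folds: P1 loses the ante
        return -1
    # Showdown: both check (stake 1) or a bet is called (stake 2).
    stake = 1 if history == "pp" else 2
    return stake if CARD_RANK[card1] > CARD_RANK[card2] else -stake


print(kuhn_payoff_p1("K", "J", "pp"))   # 1
print(kuhn_payoff_p1("J", "Q", "pbb"))  # -2
```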

Theorems & Definitions (11)

  • Theorem 1
  • Corollary 1
  • Definition 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4: Consistency under standard assumptions
  • ...and 1 more