Approximate State Abstraction for Markov Games

Hiroki Ishibashi, Kenshi Abe, Atsushi Iwasaki

TL;DR

This work tackles state-space explosion in two-player zero-sum Markov games (TZMGs) by introducing approximate state abstraction based on minimax values, extending approximate state abstraction from single-agent MDPs to TZMGs. It defines an abstract TZMG via state aggregation with a weight function and derives bounds on the duality gap, measured in the ground game, of equilibria computed in the abstract game; when aggregation is performed on minimax values, the loss relative to a ground equilibrium is bounded by a term that depends on ε and γ. For the minimax-based aggregation φ^{Q*}, the authors prove GAP ≤ 12ε/(1−γ)^3, and they validate the approach experimentally on Markov Soccer, demonstrating substantial state-space reduction and near-equilibrium behavior for modest ε while highlighting limitations at larger ε due to potential deadlocks. The paper also outlines extensions to additional similarity criteria (Model, Boltzmann, Multinomial) with corresponding bounds, and discusses future work on larger-scale games and cross-domain transferability of abstractions.

Abstract

This paper introduces state abstraction for two-player zero-sum Markov games (TZMGs), where the payoffs for the two players are determined by the state representing the environment and their respective actions, with state transitions following Markov decision processes. For example, in games like soccer, the value of actions changes according to the state of play, and thus such games should be described as Markov games. In TZMGs, as the number of states increases, computing equilibria becomes more difficult. Therefore, we consider state abstraction, which reduces the number of states by treating multiple different states as a single state. There is a substantial body of research on finding optimal policies for Markov decision processes using state abstraction. However, in the multi-player setting, the game with state abstraction may yield different equilibrium solutions from those of the ground game. To evaluate the equilibrium solutions of the game with state abstraction, we derived bounds on the duality gap, which represents the distance from the equilibrium solutions of the ground game. Finally, we demonstrate our state abstraction with Markov Soccer, compute equilibrium policies, and examine the results.
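As a rough illustration of "treating multiple different states as a single state", the sketch below groups ground states whose minimax Q-values fall into the same ε-wide buckets. The bucketing rule, the function name `aggregate_by_q`, and the array layout are assumptions made for this example, not the paper's construction; the paper's aggregation also involves a weight function, which is omitted here.

```python
import numpy as np


def aggregate_by_q(q_star: np.ndarray, eps: float) -> np.ndarray:
    """Return an abstract-state id for every ground state.

    q_star is assumed to have shape (num_states, num_actions_p1, num_actions_p2)
    and to hold (an estimate of) the minimax Q-values of the ground game.
    Two states share an id only if every Q entry lands in the same bucket of
    width eps, so their Q-values differ by less than eps entry-wise.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    num_states = q_star.shape[0]
    flat = q_star.reshape(num_states, -1)
    # With eps == 0 only states with identical Q-values are merged.
    keys = np.floor(flat / eps).astype(np.int64) if eps > 0 else flat
    # Rows with identical keys receive the same abstract-state id.
    _, phi = np.unique(keys, axis=0, return_inverse=True)
    return phi.reshape(-1)


# Hypothetical Q-values: four ground states, two actions per player.
q = np.stack([np.full((2, 2), v) for v in (0.00, 0.04, 1.01, 1.04)])
phi = aggregate_by_q(q, eps=0.1)
num_abstract_states = len(set(phi.tolist()))
```

Bucketing is only one way to obtain a valid grouping; it can keep apart states that are within ε of each other but straddle a bucket boundary, so it may produce more abstract states than necessary.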

Paper Structure

This paper contains 30 sections, 9 theorems, 85 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

When the ground states are aggregated by the aggregation function $\phi^{Q^{\ast}}$ satisfying Assumption (asmp:minimax_Q_abstraction) with $\epsilon \geq 0$, then $\boldsymbol{\pi}_{GA}^{\ast}$ satisfies the bound quoted in the TL;DR: $\mathrm{GAP} \leq 12\epsilon/(1-\gamma)^{3}$.
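To get a feel for the scale of the bound GAP ≤ 12ε/(1−γ)^3 quoted in the TL;DR, the following sketch simply evaluates its right-hand side. The function name `gap_bound` is hypothetical, and the snippet assumes the bound takes exactly the form stated in the TL;DR.

```python
def gap_bound(eps: float, gamma: float) -> float:
    """Evaluate 12 * eps / (1 - gamma)**3, the bound quoted in the TL;DR."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    if not 0 <= gamma < 1:
        raise ValueError("gamma must lie in [0, 1)")
    return 12.0 * eps / (1.0 - gamma) ** 3


# Example values chosen for illustration only.
bound_small_discount = gap_bound(eps=0.01, gamma=0.5)
bound_large_discount = gap_bound(eps=0.01, gamma=0.9)
```

With ε = 0.01, the bound is about 0.96 for γ = 0.5 and about 120 for γ = 0.9. The cubic dependence on 1/(1−γ) means the guarantee becomes loose quickly as the discount factor approaches 1.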

Figures (3)

  • Figure 1: An initial state of the Markov soccer game in which player 1 has the ball.
  • Figure 2: Number of states in the abstract Markov soccer games with respect to $\epsilon$.
  • Figure 3: Duality gap at each iteration in minimax Q-learning. Note that the policies are trained in the abstract game, and their duality gap values are computed in the ground game.
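The duality gap reported in Figure 3 measures how much each player could gain by deviating to a best response. A minimal sketch for the single-state special case (a zero-sum matrix game) is given below; the function name and the matching-pennies payoff matrix are assumptions for illustration, and the Markov-game version in the paper is defined over discounted values rather than one-shot payoffs.

```python
import numpy as np


def matrix_game_duality_gap(payoff, x, y) -> float:
    """Duality gap of strategies (x, y) in a zero-sum matrix game.

    payoff[i, j] is the row player's payoff (the column player receives its
    negative); x and y are mixed strategies of the row and column player.
    The gap is zero exactly when (x, y) is an equilibrium.
    """
    payoff = np.asarray(payoff, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_row_value = np.max(payoff @ y)  # row player's best response to y
    best_col_value = np.min(x @ payoff)  # column player's best response to x
    return float(best_row_value - best_col_value)


# Matching pennies: the uniform strategies form the equilibrium.
pennies = [[1.0, -1.0], [-1.0, 1.0]]
gap_at_equilibrium = matrix_game_duality_gap(pennies, [0.5, 0.5], [0.5, 0.5])
gap_off_equilibrium = matrix_game_duality_gap(pennies, [1.0, 0.0], [0.5, 0.5])
```

At the uniform strategies the gap is 0, while a row player who always plays the first action can be exploited and the gap rises to 1.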

Theorems & Definitions (15)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proof of Lemma (lem:exploitability_bound_minimax_value)
  • Lemma 4
  • Proof of Lemma (lem:v_to_q)
  • ...and 5 more