Computing Perfect Bayesian Equilibria, with Application to Empirical Game-Theoretic Analysis

Christine Konicki, Mithun Chakraborty, Michael P. Wellman

TL;DR

The paper tackles computing Perfect Bayesian Equilibria (PBE) in two-player imperfect-information extensive-form games by introducing PBE-CFR, a scalable Counterfactual Regret Minimization (CFR)–based method that enforces AGM-consistency between strategies and beliefs. It achieves this by computing believed utilities $U^B$ with consistent beliefs and by updating beliefs via UpdateBeliefs using a plausibility order, with theoretical guarantees of polynomial-space/time complexity and convergence to PBE in two-player zero-sum settings. Empirical evaluation on general-sum game classes and a TE-PSRO framework shows PBE-CFR yields high-quality equilibria and improves TE-PSRO model quality compared to unrefined Nash baselines, though performance depends on the game’s information structure. The work demonstrates a practical path to using refined equilibria for strategy exploration in empirical game-theoretic analysis and highlights contexts where PBE provides advantages over NE-based MSSs.

Abstract

Perfect Bayesian Equilibrium (PBE) is a refinement of the Nash equilibrium for imperfect-information extensive-form games (EFGs) that enforces consistency between the two components of a solution: agents' strategy profile describing their decisions at information sets and the belief system quantifying their uncertainty over histories within an information set. We present a scalable approach for computing a PBE of an arbitrary two-player EFG. We adopt the definition of PBE enunciated by Bonanno in 2011 using a consistency concept based on the theory of belief revision due to Alchourrón, Gärdenfors, and Makinson. Our algorithm for finding a PBE is an adaptation of Counterfactual Regret Minimization (CFR) that minimizes the expected regret at each information set given a belief system, while maintaining the necessary consistency criteria. We prove that our algorithm is correct for two-player zero-sum games and has a reasonable slowdown in time-complexity relative to classical CFR given the additional computation needed for refinement. We also experimentally demonstrate the competent performance of PBE-CFR in terms of equilibrium quality and running time on medium-to-large non-zero-sum EFGs. Finally, we investigate the effectiveness of using PBE for strategy exploration in empirical game-theoretic analysis. Specifically, we compute PBE as a meta-strategy solver (MSS) in a tree-exploiting variant of Policy Space Response Oracles (TE-PSRO). Our experiments show that PBE as an MSS leads to higher-quality empirical EFG models with complex imperfect information structures compared to MSSs based on an unrefined Nash equilibrium.
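The abstract describes minimizing regret at each information set given a belief system. As a rough illustration of that idea only, the sketch below runs one step of standard regret matching at a single information set, with action values weighted by a belief over the histories in that set. It is not the paper's PBE-CFR: the function names, the payoff table, and the fixed belief are all hypothetical, and the belief-update and AGM-consistency steps are omitted entirely.

```python
# Illustrative sketch only: standard regret matching at ONE information set,
# using a fixed belief over histories. All names and numbers are hypothetical.

def believed_action_values(belief, payoff):
    """Expected payoff of each action under a belief over histories.

    belief[h] is the probability assigned to history h in the information set;
    payoff[h][a] is the (assumed known) payoff of action a at history h.
    """
    n_actions = len(payoff[0])
    return [
        sum(belief[h] * payoff[h][a] for h in range(len(belief)))
        for a in range(n_actions)
    ]

def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(cum_regret)] * len(cum_regret)

def update_regret(cum_regret, strategy, action_values):
    """Add each action's regret relative to the current strategy's value."""
    node_value = sum(s * v for s, v in zip(strategy, action_values))
    return [r + v - node_value for r, v in zip(cum_regret, action_values)]

# Two histories, two actions, uniform belief (all values made up).
belief = [0.5, 0.5]
payoff = [[1.0, 0.0], [0.0, 3.0]]

cum_regret = [0.0, 0.0]
strategy = regret_matching(cum_regret)            # uniform: [0.5, 0.5]
values = believed_action_values(belief, payoff)   # [0.5, 1.5]
cum_regret = update_regret(cum_regret, strategy, values)
print(regret_matching(cum_regret))                # [0.0, 1.0]
```

After one update, all probability shifts to the second action, which has the higher belief-weighted value. In a full algorithm the belief itself would change as strategies change, which is the part this sketch leaves out.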

Paper Structure

This paper contains 33 sections, 6 theorems, 29 equations, 14 figures, 1 table, 9 algorithms.

Key Result

theorem 1

The worst-case space and time complexities of PBE-CFR are $O(\lvert H \rvert \cdot \lvert A_{max} \rvert^2)$ and $O(T \cdot \lvert H \rvert \cdot \lvert A_{max} \rvert^2)$ respectively, where $A_{max}$ is the largest action set across all players' information sets.
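To get a feel for how these bounds scale, the sketch below evaluates the dominant terms $\lvert H \rvert \cdot \lvert A_{max} \rvert^2$ and $T \cdot \lvert H \rvert \cdot \lvert A_{max} \rvert^2$ for concrete sizes. The function name and the example sizes are hypothetical, and big-O notation hides constant factors, so the results are growth terms rather than memory or operation counts.

```python
# Illustrative sketch only: evaluates the growth terms from Theorem 1.
# Constant factors hidden by big-O are ignored; all inputs are made up.

def pbe_cfr_bound_terms(num_histories, max_actions, iterations):
    """Return (space_term, time_term) for |H|, |A_max|, and T."""
    space_term = num_histories * max_actions ** 2
    time_term = iterations * space_term
    return space_term, time_term

# Hypothetical game: 10,000 histories, at most 4 actions, T = 1000 iterations.
space_term, time_term = pbe_cfr_bound_terms(10_000, 4, 1_000)
print(space_term, time_term)  # 160000 160000000
```

Doubling $\lvert A_{max} \rvert$ quadruples both terms, while doubling $\lvert H \rvert$ or $T$ only doubles the terms they appear in.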

Figures (14)

  • Figure 1: Example of an imperfect-information EFG from [agm11], augmented with leaf utilities. There is one non-singleton information set (for Player $2$), represented by the orange box. The equilibrium path induced by the AGM-consistent assessment $(\bm{\sigma}^*, \mu^*)$ described in the paper's AGM-consistency example is highlighted in green.
  • Figure 2: TE-PSRO schematic. The empirical game is extensive-form, so PBE may be used as the MSS and/or EVAL.
  • Figure 3: Time required by CFR and PBE-CFR for games generated from $\textsc{PrivateGenGoof}_{4}$ (top) and $\textsc{PrivateGenGoof}_{5}$ (bottom) with $T = 1000$.
  • Figure 4: Average regret of $\bm{\sigma}^*$ evaluated in $\textsc{Bargain}$ over the course of TE-PSRO's runtime, using NE or PBE as the MSS.
  • Figure 5: Average regret of $\bm{\sigma}^*$ evaluated in $\textsc{GenGoof}_{4}$ over the course of TE-PSRO's runtime, using NE or PBE as the MSS.
  • ...and 9 more figures

Theorems & Definitions (11)

  • definition 1: Sequential Rationality
  • definition 2: Plausibility Order
  • definition 3: AGM-consistency [agm11]
  • definition 4: Perfect Bayesian Equilibrium [agm11]
  • theorem 1
  • definition 5: One-shot deviation
  • theorem 2
  • lemma 1
  • lemma 2
  • lemma 3
  • ...and 1 more