Efficient Last-iterate Convergence Algorithms in Solving Games

Linjian Meng, Youzhi Zhang, Zhenxing Ge, Shangdong Yang, Tianyu Ding, Wenbin Li, Tianpei Yang, Bo An, Yang Gao

TL;DR

This work addresses the challenge of achieving last-iterate convergence when learning Nash equilibria (NEs) in extensive-form games (EFGs). It uses the Reward Transformation (RT) framework to recast NE learning as a sequence of perturbed regularized EFGs and introduces RTCFR$^+$, which solves these perturbed problems with CFR$^+$, a parameter-free Regret Matching (RM)-based CFR algorithm. The authors prove last-iterate convergence of CFR$^+$ on perturbed regularized EFGs in both the non-parameter-free and the parameter-free setting, which yields end-to-end last-iterate convergence for the original game, and they report superior empirical performance on standard benchmarks. The result is a practical, tuning-free approach to fast NE computation in large sequential games, supported by convergence theory and extensive experiments.

Abstract

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Consequently, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, the empirical convergence rates of the algorithms in these studies are suboptimal, since they do not solve the perturbed EFGs with Regret Matching (RM)-based CFR algorithms, which are known for their exceptionally fast empirical convergence rates. Additionally, since multiple perturbed regularized EFGs must be solved, fine-tuning across all such games is infeasible, making parameter-free algorithms highly desirable. In this paper, we prove that CFR$^+$, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. Leveraging CFR$^+$ to solve perturbed regularized EFGs, we obtain Reward Transformation CFR$^+$ (RTCFR$^+$). Importantly, we extend prior work on the parameter-free property of CFR$^+$, enhancing its stability, which is crucial for the empirical convergence of RTCFR$^+$. Experiments show that RTCFR$^+$ significantly outperforms existing algorithms with theoretical last-iterate convergence guarantees.
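The reduction described in the abstract can be illustrated on a two-player zero-sum matrix game instead of an EFG. The sketch below is a simplified stand-in, not the paper's algorithm: it assumes a quadratic regularizer $\frac{\mu}{2}\|\cdot - \text{ref}\|^2$ toward a reference strategy, assumes the reference is replaced by the last iterate of each inner solve, and uses plain RM$^+$ at a single decision point as the inner solver. The names `normalize`, `rm_plus`, `solve_regularized`, and `reward_transformation`, and the values of `mu`, `outer`, and `inner`, are hypothetical.

```python
def normalize(theta):
    """Turn non-negative accumulated regrets into a strategy (uniform if all zero)."""
    total = sum(theta)
    if total <= 0.0:
        return [1.0 / len(theta)] * len(theta)
    return [v / total for v in theta]


def rm_plus(theta, loss):
    """One RM+ step: add instantaneous regrets, then clip at zero."""
    x = normalize(theta)
    expected = sum(p * l for p, l in zip(x, loss))
    return [max(0.0, th + expected - l) for th, l in zip(theta, loss)]


def solve_regularized(A, mu, ref_x, ref_y, iters):
    """Inner solver (assumed form): last iterate of RM+ on the regularized game.

    The row player minimizes x^T A y + mu/2 * ||x - ref_x||^2,
    the column player minimizes -x^T A y + mu/2 * ||y - ref_y||^2.
    """
    n, m = len(A), len(A[0])
    theta_x, theta_y = [0.0] * n, [0.0] * m  # zero regrets: start from uniform
    for _ in range(iters):
        x, y = normalize(theta_x), normalize(theta_y)
        loss_x = [sum(A[i][j] * y[j] for j in range(m)) + mu * (x[i] - ref_x[i])
                  for i in range(n)]
        loss_y = [-sum(A[i][j] * x[i] for i in range(n)) + mu * (y[j] - ref_y[j])
                  for j in range(m)]
        theta_x, theta_y = rm_plus(theta_x, loss_x), rm_plus(theta_y, loss_y)
    return normalize(theta_x), normalize(theta_y)


def reward_transformation(A, mu=0.1, outer=20, inner=500):
    """Outer loop (assumed form): solve a sequence of regularized games,
    moving the reference strategy to the last iterate of each inner solve."""
    n, m = len(A), len(A[0])
    ref_x, ref_y = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(outer):
        ref_x, ref_y = solve_regularized(A, mu, ref_x, ref_y, inner)
    return ref_x, ref_y
```

Calling `reward_transformation(A)` on a small loss matrix `A` returns one strategy per player. No parameter is tuned per inner game here, which is the practical point the abstract makes about parameter-free solvers.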

Paper Structure

This paper contains 17 sections, 10 theorems, 85 equations, 12 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Assume all players follow the update rule of CFR$^+$ with any $\bm{\theta}^{1}_I \in \mathbb{R}^{|A(I)|}_{\geq 0}$ and any $\eta > 0$. Then the strategy profile $\hat{\bm{x}}^{t}$ converges to the set of NEs of the perturbed regularized EFGs defined in (eq:BSPP-perturbed regularized), for any $\gamma > 0$.
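To make the roles of $\bm{\theta}^{1}_I$ and $\eta$ concrete, here is a minimal sketch of an RM$^+$-style update at a single decision point, driven by a fixed list of loss vectors. It is an illustration under assumptions, not the paper's CFR$^+$ update: the step size `eta` simply scales the instantaneous regret, the losses are made up, the parameter $\gamma$ is not modeled, and the function names are hypothetical. With zero initial regrets the played strategies do not depend on `eta`, which is the usual sense in which RM$^+$ is parameter-free. The theorem covers the more general case of any non-negative $\bm{\theta}^{1}_I$.

```python
def strategy_from(theta):
    """Normalize non-negative regrets into a strategy (uniform if all zero)."""
    total = sum(theta)
    if total <= 0.0:
        return [1.0 / len(theta)] * len(theta)
    return [v / total for v in theta]


def rm_plus_update(theta, loss, eta):
    """theta <- max(0, theta + eta * (expected loss - loss of each action))."""
    x = strategy_from(theta)
    expected = sum(p * l for p, l in zip(x, loss))
    return [max(0.0, th + eta * (expected - l)) for th, l in zip(theta, loss)]


def run(theta1, losses, eta):
    """Return the strategy played at each step, starting from regrets theta1."""
    theta = list(theta1)
    played = []
    for loss in losses:
        played.append(strategy_from(theta))
        theta = rm_plus_update(theta, loss, eta)
    return played


# Made-up loss vectors for three actions (illustrative only).
losses = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.25, 0.75, 0.0]]

zero_init_a = run([0.0, 0.0, 0.0], losses, eta=1.0)
zero_init_b = run([0.0, 0.0, 0.0], losses, eta=2.0)  # same strategies as eta=1.0
biased_init = run([1.0, 0.0, 0.0], losses, eta=1.0)  # non-zero theta^1
```

With a non-zero starting vector such as `[1.0, 0.0, 0.0]`, the first strategy puts all mass on the first action and the trajectory can depend on `eta`. This is the setting where a guarantee for arbitrary $\bm{\theta}^{1}_I$ and $\eta$ matters.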

Figures (12)

  • Figure 1: Last-iterate convergence rates of different algorithms. Each algorithm runs for 20,000 iterations. In all plots, the x-axis is the number of iterations and the y-axis is exploitability, displayed on a logarithmic scale. Liar's Dice ($x$) means that every player is given a die with $x$ sides, Goofspiel ($x$) that each player is dealt $x$ cards, and Battleship ($x$) that the grid size is $x$. The numbers of infosets of Kuhn Poker, Leduc Poker, Battleship (3), Liar's Dice (3), Liar's Dice (4), Liar's Dice (5), Goofspiel (4), Goofspiel (5), and Goofspiel (6) are 12, 936, 81027, 1024, 5120, 24576, 162, 2124, and 34482, respectively.
  • Figure 2: Last-iterate convergence rates of more RM-based algorithms.
  • Figure 3: Last-iterate convergence rates over the first 1000 iterations.
  • Figure 4: Last-iterate convergence rates of RTCFR$^+$ with $\mu = 0.0001$.
  • Figure 5: Last-iterate convergence rates of RTCFR$^+$ with $\mu = 0.0005$.
  • ...and 7 more figures
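The figures report exploitability on the y-axis. As a rough illustration of that metric, the sketch below computes it for a two-player zero-sum matrix game rather than an EFG. It assumes `A[i][j]` is the row player's payoff and defines exploitability as the sum of both players' best-response gains. Some papers divide this sum by two, and the normalization used in the figures is not stated here. The function name and the rock-paper-scissors example are illustrative.

```python
def exploitability(A, x, y):
    """Sum of best-response gains against the profile (x, y).

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    The value is zero exactly when (x, y) is a Nash equilibrium.
    """
    n, m = len(A), len(A[0])
    # Payoff of each pure row action against the column strategy y.
    row_values = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    # Row player's payoff for each pure column action against x.
    col_values = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
    # Row's best response maximizes, column's best response minimizes.
    return max(row_values) - min(col_values)


# Rock-paper-scissors payoffs for the row player (rows/columns: R, P, S).
RPS = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]
uniform = [1.0 / 3.0] * 3

at_equilibrium = exploitability(RPS, uniform, uniform)
always_rock = exploitability(RPS, [1.0, 0.0, 0.0], uniform)
```

The uniform profile is the equilibrium of rock-paper-scissors, so its exploitability is zero, while a row player who always plays rock can be exploited by a column player who always plays paper.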

Theorems & Definitions (17)

  • Theorem 4.1: Proof is in \ref{sec:prf:thm:convergence results of our algorithm}
  • Lemma 4.2: Adapted from the proof of Lemma 4 in farina2021faster
  • Lemma 4.3: Proof is in \ref{subsec:proof:lem:sum of counterfactual regret}
  • Lemma 4.4: Proof is in \ref{subsec:proof:lem:add term is positive}
  • proof
  • Lemma B.1: Adapted from Lemma D.4 in sokota2022unified
  • Lemma B.2: Proof is in \ref{subsec:proof:lem:maximum value of counterfactual value}
  • Lemma B.3: Proof is in \ref{subsec:proof:lem:smoothness of counterfactual value}
  • Lemma B.4: Proof is in \ref{subsec:proof:lem:relationship between behavioral strategy and sequence-form strategy}
  • proof
  • ...and 7 more