Faster Game Solving via Asymmetry of Step Sizes

Linjian Meng, Tianpei Yang, Youzhi Zhang, Zhenxing Ge, Yang Gao

TL;DR

The paper addresses the loss of robustness in Predictive CFR$^+$ (PCFR$^+$) when its predictions diverge from the observed counterfactual regrets in two-player zero-sum imperfect-information extensive-form games. It introduces Asymmetric PCFR$^+$ (APCFR$^+$), which uses an adaptive asymmetry of step sizes between the implicit and explicit regret updates to dampen the influence of prediction errors, and Simple APCFR$^+$ (SAPCFR$^+$), a variant that is easier to implement. A theoretical regret bound shows that, under APCFR$^+$, the impact of prediction inaccuracy is reduced by a factor of $1+\alpha^t$, and a mechanism for learning $\alpha^t$ automatically is proposed; SAPCFR$^+$ instead fixes $\alpha^t$ to 2, which amounts to a single-line modification of PCFR$^+$. Empirical evaluation on standard IIG benchmarks and HUNL subgames shows convergence that is superior to or competitive with PCFR$^+$ and related CFR variants: APCFR$^+$ and SAPCFR$^+$ outperform PCFR$^+$ in most games, and APDCFR$^+$ offers substantial gains when the asymmetry is combined with discounting. The work suggests that asymmetric step sizes are broadly applicable to the CFR family, offering a practical route to more robust, fast-converging solvers for complex strategic games.

Abstract

Counterfactual Regret Minimization (CFR) algorithms are widely used to compute a Nash equilibrium (NE) in two-player zero-sum imperfect-information extensive-form games (IIGs). Among them, Predictive CFR$^+$ (PCFR$^+$) is particularly powerful, achieving an exceptionally fast empirical convergence rate via the prediction in many games. However, the empirical convergence rate of PCFR$^+$ degrades significantly if the prediction is inaccurate, leading to unstable performance on certain IIGs. To enhance the robustness of PCFR$^+$, we propose Asymmetric PCFR$^+$ (APCFR$^+$), which employs an adaptive asymmetry of step sizes between the updates of implicit and explicit accumulated counterfactual regrets to mitigate the impact of the prediction inaccuracy on convergence. We present a theoretical analysis demonstrating why APCFR$^+$ can enhance the robustness. To the best of our knowledge, we are the first to propose the asymmetry of step sizes, a simple yet novel technique that effectively improves the robustness of PCFR$^+$. Then, to reduce the difficulty of implementing APCFR$^+$ caused by the adaptive asymmetry, we propose a simplified version of APCFR$^+$ called Simple APCFR$^+$ (SAPCFR$^+$), which uses a fixed asymmetry of step sizes and requires only a single-line modification of the original PCFR$^+$. Experimental results on five standard IIG benchmarks and two heads-up no-limit Texas Hold'em (HUNL) subgames show that (i) both APCFR$^+$ and SAPCFR$^+$ outperform PCFR$^+$ in most of the tested games, (ii) SAPCFR$^+$ achieves an empirical convergence rate comparable to that of APCFR$^+$, and (iii) our approach can be generalized to improve other CFR algorithms, e.g., Discount CFR (DCFR).
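To make the idea of asymmetric step sizes concrete, the following is a minimal sketch, not the authors' implementation. It runs a predictive regret-matching$^+$ update on a small zero-sum matrix game rather than on an extensive-form game. It assumes that the asymmetry is realized by scaling the prediction in the implicit update by $1/(1+\alpha)$ while the explicit update keeps step size 1, so that `alpha = 0` corresponds to the symmetric (PCFR$^+$-style) update and `alpha = 2` to the fixed setting described for SAPCFR$^+$. The function names `solve` and `exploitability`, the use of the previous instantaneous regret as the prediction, and the uniform averaging of strategies are all assumptions made for illustration.

```python
def positive_part(v):
    """Clip every entry at zero (the "+" in regret matching+)."""
    return [max(x, 0.0) for x in v]


def normalize(v):
    """Scale a non-negative vector to a distribution; uniform if all zero."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else [1.0 / len(v)] * len(v)


def solve(A, iterations, alpha):
    """Predictive regret matching+ with a scaled prediction (illustrative).

    A is the row player's payoff matrix; the column player receives -A.
    ASSUMPTION: the implicit update uses step size 1 / (1 + alpha) on the
    prediction, while the explicit update uses step size 1.
    """
    n, m = len(A), len(A[0])
    explicit = [[0.0] * n, [0.0] * m]    # explicit accumulated regrets
    prediction = [[0.0] * n, [0.0] * m]  # last observed instantaneous regrets
    average = [[0.0] * n, [0.0] * m]     # running sums of played strategies
    scale = 1.0 / (1.0 + alpha)

    for _ in range(iterations):
        # Implicit update: explicit regrets plus the down-weighted prediction.
        strategies = []
        for p in range(2):
            implicit = positive_part(
                [r + scale * g for r, g in zip(explicit[p], prediction[p])]
            )
            strategies.append(normalize(implicit))
        x, y = strategies

        # Expected utility of each pure action against the opponent's strategy.
        u_row = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]

        # Explicit update with the observed instantaneous regrets.
        for p, (s, u) in enumerate(((x, u_row), (y, u_col))):
            expected = sum(si * ui for si, ui in zip(s, u))
            regret = [ui - expected for ui in u]
            explicit[p] = positive_part(
                [r + g for r, g in zip(explicit[p], regret)]
            )
            prediction[p] = regret
            average[p] = [a + si for a, si in zip(average[p], s)]

    return normalize(average[0]), normalize(average[1])


def exploitability(A, x, y):
    """Sum of both players' best-response gains against (x, y)."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


if __name__ == "__main__":
    game = [[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
    for alpha in (0.0, 2.0):
        x_avg, y_avg = solve(game, 2000, alpha)
        print(alpha, exploitability(game, x_avg, y_avg))
```

In this sketch the only difference between the two settings is the value of `scale`, which mirrors the abstract's remark that the fixed-asymmetry variant needs only a single-line change. The adaptive choice of $\alpha^t_I$ per infoset and iteration is not reproduced here.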

Paper Structure

This paper contains 13 sections, 4 theorems, 40 equations, 13 figures, 6 tables.

Key Result

Theorem 4.1

[Proof is in the appendix.] Assuming that $T$ iterations of APCFR$^+$ with any $\alpha^t_I \geq 0$ are conducted, the counterfactual regret at any infoset $I \in \mathcal{I}$ is bounded by

Figures (13)

  • Figure 1: Comparison between PCFR$^+$ and APCFR$^+$, with differences highlighted in red. Note that the notation $t$ in $\alpha^t_I$ denotes iteration $t$, rather than an exponent.
  • Figure 2: Dynamics of the inaccuracy in PCFR$^+$ between predicted and observed instantaneous counterfactual regrets in Leduc Poker and Battleship (3,2,3). This inaccuracy is related to the theoretical convergence rate of PCFR$^+$. The values on the y-axis are normalized to the range [0, 1] and displayed on a logarithmic scale.
  • Figure 3: Empirical convergence rates of the tested algorithms on commonly used standard IIG benchmarks. In all plots, the x-axis is the number of iterations, and the y-axis is exploitability, displayed on a logarithmic scale. Liar’s Dice ($x$) denotes that every player is given a die with $x$ sides. Goofspiel ($x$) denotes that each player is dealt $x$ cards. Battleship ($x,y,z$) denotes that the size of the grid is $x\times y$ and the number of shots is $z$.
  • Figure 4: Dynamics of $\alpha^t_I$ on commonly used standard IIG benchmarks. Note that, contrary to the figures in the main text, the x-axis in this figure is on a logarithmic scale, while the y-axis is not.
  • Figure 5: Dynamics of $\alpha^t_I$ in HUNL Subgames.
  • ...and 8 more figures

Theorems & Definitions (6)

  • Theorem 4.1
  • Lemma 4.2
  • proof
  • Lemma A.1
  • Theorem B.1
  • proof