Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

TL;DR

This work advances regret-minimization methods for solving imperfect-information games (IIGs). It introduces PDCFR+, an optimistic CFR variant that combines PCFR+ and Discounted CFR (DCFR) within a weighted-regret framework built on optimistic Online Mirror Descent, which speeds up convergence and quickly mitigates the negative effects of dominated actions. Theoretical results show convergence to a Nash equilibrium, including under distinct weighting schemes for regrets and average strategies, and experiments across diverse IIGs demonstrate substantial speedups over prior CFR variants, especially in non-poker settings. The approach is relevant to scalable equilibrium computation and opens avenues for combining weighting and prediction-based updates with function approximation and adaptive discounting.

Abstract

Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets, utilizing local regret minimization algorithms, such as Regret Matching (RM) or RM+, to minimize them. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimistic variant PRM+ and its extension PCFR+. However, PCFR+ assigns uniform weights for each iteration when determining regrets, leading to substantial regrets when facing dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating negative effects of dominated actions and consistently leveraging predictions to accelerate convergence. Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. The code is available at https://github.com/rpSebastian/PDCFRPlus.
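The local regret minimizers named in the abstract can be illustrated with a minimal Python sketch of RM+ and its predictive variant PRM+ at a single decision point. This is an illustration under stated assumptions, not the paper's exact PDCFR+ update: the function names, the `discount` callback (a hypothetical per-iteration factor applied to the cumulative regret, in the spirit of discounting), and the choice of the previous instantaneous regret as the prediction are all assumptions made here.

```python
def normalize(positive_regrets):
    """Turn non-negative regrets into a strategy (uniform if all are zero)."""
    total = sum(positive_regrets)
    n = len(positive_regrets)
    if total <= 0:
        return [1.0 / n] * n
    return [r / total for r in positive_regrets]


def instantaneous_regret(utilities, strategy):
    """Regret of each action relative to the strategy's expected utility."""
    expected = sum(u * p for u, p in zip(utilities, strategy))
    return [u - expected for u in utilities]


def run_prm_plus(utility_sequence, discount=lambda t: 1.0, predictive=True):
    """Run (predictive) RM+ on a fixed sequence of action-utility vectors.

    discount(t) is a hypothetical weight applied to the cumulative regret
    before adding the new regret; the default 1.0 gives uniform weighting.
    """
    n = len(utility_sequence[0])
    cumulative = [0.0] * n   # thresholded cumulative regrets
    prediction = [0.0] * n   # guess of the next instantaneous regret
    strategies = []
    for t, utilities in enumerate(utility_sequence, start=1):
        guess = prediction if predictive else [0.0] * n
        # Optimistic step: act as if the predicted regret had already arrived.
        strategy = normalize([max(c + m, 0.0) for c, m in zip(cumulative, guess)])
        strategies.append(strategy)
        regret = instantaneous_regret(utilities, strategy)
        w = discount(t)
        # RM+ step: accumulate, then clip negative totals to zero.
        cumulative = [max(w * c + r, 0.0) for c, r in zip(cumulative, regret)]
        prediction = regret  # assumption: predict with the last observed regret
    return strategies


if __name__ == "__main__":
    # Action 0 strictly dominates; the strategy should concentrate on it.
    for s in run_prm_plus([[1.0, 0.0, 0.0]] * 3):
        print(s)
```

Setting `predictive=False` recovers plain RM+, and passing a `discount` that stays below 1 shrinks older regrets, which is the general mechanism by which discounting limits the influence of early iterations.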

Paper Structure

This paper contains 20 sections, 7 theorems, 47 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

For all $\eta>0$, OMD and optimistic OMD with $\psi=\frac{1}{2} \left\Vert \cdot \right\Vert_2^2$, when employed as the algorithm $\mathbb{A}$, reduce to WCFR+ and PWCFR+, respectively.
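The "for all $\eta>0$" part of the statement can be checked numerically in a simplified, unweighted single-decision-point setting. The sketch below assumes the feasible set is the non-negative orthant and that the loss fed to OMD is the negated instantaneous regret, as in the OMD and RM+ connection cited in the abstract; the function names are hypothetical. With the squared Euclidean regularizer, the OMD step has a closed form (subtract, then clip at zero), and the normalized strategy does not depend on the step size.

```python
import math


def omd_step(z, loss, eta):
    """One OMD step with psi = 0.5 * ||.||_2^2 over the non-negative orthant.

    Solves argmin_{z' >= 0} <loss, z'> + (1 / (2 * eta)) * ||z' - z||_2^2,
    whose closed form is a gradient step followed by clipping at zero.
    """
    return [max(zi - eta * li, 0.0) for zi, li in zip(z, loss)]


def strategy_from(z):
    """Normalize the OMD iterate into a strategy (uniform if it is all zeros)."""
    total = sum(z)
    n = len(z)
    return [zi / total for zi in z] if total > 0 else [1.0 / n] * n


def run_omd(utility_sequence, eta):
    """Play the normalized OMD iterate against a fixed utility sequence."""
    z = [0.0] * len(utility_sequence[0])
    strategies = []
    for utilities in utility_sequence:
        x = strategy_from(z)
        strategies.append(x)
        expected = sum(u * p for u, p in zip(utilities, x))
        loss = [expected - u for u in utilities]  # negated instantaneous regret
        z = omd_step(z, loss, eta)
    return strategies


if __name__ == "__main__":
    utils = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 4.0]]
    small, large = run_omd(utils, 0.1), run_omd(utils, 7.0)
    same = all(
        math.isclose(a, b, abs_tol=1e-9)
        for xs, ys in zip(small, large)
        for a, b in zip(xs, ys)
    )
    print("strategies independent of eta:", same)
```

Starting from a zero iterate, scaling the step size only rescales the iterate, so the normalization cancels it; this is why an RM+-style update needs no step-size tuning.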

Figures (6)

  • Figure 1: Convergence results of four CFR variants on two games.
  • Figure 2: Convergence results of seven CFR variants on twelve testing games. Each algorithm runs for 20,000 iterations to show long-term behavior. In all plots, the x-axis is the number of iterations, and the y-axis is exploitability, displayed on a logarithmic scale.
  • Figure 3: The top plots illustrate the convergence results of four CFR variants on two games. The bottom plots show that PDCFR+ quickly learns stable cumulative regrets in NFG (3).
  • Figure 4: The complete game tree of Kuhn Poker.
  • Figure 5: The sequential decision process for player 1 in the game of Kuhn Poker.
  • ...and 1 more figure

Theorems & Definitions (11)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • proof
  • proof
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 1 more