Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
TL;DR
This work addresses solving imperfect-information games (IIGs) with regret minimization. It introduces PDCFR+, an optimistic CFR variant that combines PCFR+ and Discounted CFR (DCFR) by minimizing weighted counterfactual regret with optimistic Online Mirror Descent. The result is faster convergence and quicker recovery from the regret accumulated on dominated actions. Theoretical results show convergence to a Nash equilibrium under appropriate weighting schemes, and experiments across diverse IIGs demonstrate substantial speedups over prior CFR variants, especially in non-poker settings. The approach is relevant to scalable equilibrium computation and opens avenues for combining weighted, prediction-based updates with function approximation and adaptive discounting.
Abstract
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets and minimizes them with local regret minimization algorithms such as Regret Matching (RM) or RM+. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimistic variant, PRM+, and its extension, PCFR+. However, PCFR+ assigns uniform weights to all iterations when determining regrets, leading to substantial regrets when facing dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant, PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions and consistently leveraging predictions to accelerate convergence. Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. The code is available at https://github.com/rpSebastian/PDCFRPlus.
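
To make the local regret minimizers mentioned in the abstract concrete, the following is a minimal sketch of RM+ run in self-play on a small two-player zero-sum matrix game. It is not PDCFR+ and is not taken from the authors' repository: the function names, the example payoff matrix, the simultaneous updates, and the uniform averaging of strategies are all assumptions chosen for illustration. With uniform weights, the averaged strategies of two regret minimizers approach a Nash equilibrium, which the sketch checks through exploitability.

```python
def rm_plus_strategy(q):
    """Play proportionally to the non-negative accumulated regrets q."""
    total = sum(q)
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / len(q)] * len(q)
    return [v / total for v in q]


def rm_plus_update(q, utilities, strategy):
    """RM+ step: add the instantaneous regrets, then clip at zero."""
    expected = sum(s * u for s, u in zip(strategy, utilities))
    return [max(0.0, qi + u - expected) for qi, u in zip(q, utilities)]


def solve_matrix_game(payoff, iterations=10000):
    """Self-play RM+; payoff[i][j] is the row player's gain (hypothetical helper)."""
    n, m = len(payoff), len(payoff[0])
    q_row, q_col = [0.0] * n, [0.0] * m
    sum_row, sum_col = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x, y = rm_plus_strategy(q_row), rm_plus_strategy(q_col)
        # Utility of each pure action against the opponent's current strategy.
        u_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        q_row = rm_plus_update(q_row, u_row, x)
        q_col = rm_plus_update(q_col, u_col, y)
        # Uniform averaging (an assumption; weighted averaging is the paper's topic).
        sum_row = [a + b for a, b in zip(sum_row, x)]
        sum_col = [a + b for a, b in zip(sum_col, y)]
    return [v / iterations for v in sum_row], [v / iterations for v in sum_col]


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    worst_for_row = min(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - worst_for_row


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # illustrative payoffs, not from the paper
    avg_x, avg_y = solve_matrix_game(game)
    print(avg_x, avg_y, exploitability(game, avg_x, avg_y))
```

For this example game the unique equilibrium has both players choosing their first action with probability 2/5, so the averaged strategies should drift toward that point as the iteration count grows. Discounting earlier iterations or adding a prediction term, as the abstract describes for DCFR and PCFR+, would change the two update functions rather than the self-play loop.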
