Deep (Predictive) Discounted Counterfactual Regret Minimization

Hang Xu, Kai Li, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

TL;DR

This work addresses two-player zero-sum imperfect-information games (IIGs) with model-free neural CFR. It introduces VR-DeepDCFR+ and VR-DeepPDCFR+, which bootstrap cumulative advantages, apply discounting and clipping, and integrate variance-reduced baselines to emulate advanced tabular CFR variants within neural CFR. Empirical results show faster convergence on eight IIGs and stronger performance in a large poker game than baselines such as OS-DeepCFR and DREAM, with ablations confirming the value of bootstrapping, advanced CFR approximation, and variance reduction. The approach offers a practical way to bring advanced CFR update rules into neural equilibrium computation for large IIGs.
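To make the "variance-reduced baselines" idea concrete, here is a minimal sketch of a baseline-corrected (control-variate) value estimate for a single sampled action. It follows the generic control-variate construction used with outcome sampling and is not taken from the paper; the function name `baseline_corrected_values` and its arguments are hypothetical, and in the paper's setting the baseline would come from the value network.

```python
from typing import List


def baseline_corrected_values(
    baseline: List[float],      # b(a): baseline guess for each action (assumed given)
    sampled_action: int,        # index of the one action that was actually sampled
    sampled_value: float,       # value observed after taking the sampled action
    sample_probs: List[float],  # q(a): probability with which each action is sampled
) -> List[float]:
    """Return per-action value estimates using a control variate.

    Unsampled actions keep their baseline value. The sampled action gets the
    baseline plus an importance-weighted correction, so the estimate stays
    unbiased for any baseline, and its variance shrinks as the baseline
    approaches the true action values.
    """
    estimates = list(baseline)
    a = sampled_action
    estimates[a] = baseline[a] + (sampled_value - baseline[a]) / sample_probs[a]
    return estimates
```

With a zero baseline this reduces to the plain importance-weighted outcome-sampling estimate, and with an exact baseline the correction term is always zero, so every sample returns the true values.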

Abstract

Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. To enhance CFR's applicability in large games, researchers use neural networks to approximate its behavior. However, existing methods are mainly based on vanilla CFR and struggle to effectively integrate more advanced CFR variants. In this work, we propose an efficient model-free neural CFR algorithm, overcoming the limitations of existing methods in approximating advanced CFR variants. At each iteration, it collects variance-reduced sampled advantages based on a value network, fits cumulative advantages by bootstrapping, and applies discounting and clipping operations to simulate the update mechanisms of advanced CFR variants. Experimental results show that, compared with model-free neural algorithms, it exhibits faster convergence in typical imperfect-information games and demonstrates stronger adversarial performance in a large poker game.
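The per-iteration update described above (bootstrap from the previous cumulative advantages, discount, then clip) can be sketched in tabular form as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the discount weight `t^alpha / (t^alpha + 1)` is the schedule used by tabular DCFR and is assumed here, the clipping at zero follows CFR+, and all function names are hypothetical. In the neural setting, `prev_cumulative` would be the output of the previous iteration's network and the returned list would serve as the regression target.

```python
from typing import List


def discount_factor(t: int, alpha: float) -> float:
    """DCFR-style weight t^alpha / (t^alpha + 1) (assumed schedule)."""
    t_alpha = float(t) ** alpha
    return t_alpha / (t_alpha + 1.0)


def bootstrapped_target(
    prev_cumulative: List[float],    # cumulative advantages from iteration t-1
    sampled_advantage: List[float],  # instantaneous advantages sampled at iteration t
    t: int,
    alpha: float = 1.5,              # hypothetical default, chosen for illustration
) -> List[float]:
    """Discount the previous cumulative advantages, add the new sample, clip at 0."""
    d = discount_factor(t, alpha)
    return [
        max(d * prev + adv, 0.0)
        for prev, adv in zip(prev_cumulative, sampled_advantage)
    ]


def regret_matching(cumulative: List[float]) -> List[float]:
    """Play actions in proportion to positive cumulative advantage, else uniformly."""
    positive = [max(r, 0.0) for r in cumulative]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cumulative)] * len(cumulative)
    return [p / total for p in positive]
```

Because the target depends only on the previous cumulative estimate and the current sample, this kind of update does not require storing and re-averaging samples from all earlier iterations, which is what makes discounting and clipping straightforward to apply.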

Paper Structure

This paper contains 31 sections, 4 theorems, 23 equations, 5 figures, and 4 tables.

Key Result

Theorem 1

By using outcome sampling to collect data $(I, \hat{r}_i^t(I))$ into a buffer $\mathcal{B}_i$ for player $i$ in iteration $t$, and training a neural network $r(I, a \mid \phi^t_i)$ on loss $\mathcal{L}(\phi_i^t)=\mathbb{E}_{(I, \hat{r}_i^t(I))\sim \mathcal{B}_i}\left[\sum_{a\in \mathcal{A}(I)}^{}\le

Figures (5)

  • Figure 1: Convergence results of seven model-free neural algorithms on eight testing games.
  • Figure 2: Head-to-head evaluation results of four neural CFR variants on FHP.
  • Figure 3: Ablation study of fitting cumulative advantages.
  • Figure 4: Ablation study of approximating advanced CFR variants.
  • Figure 5: Ablation study of variance reduction.

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • Theorem 4
  • proof