Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play

Ju Qi; Falin Hei; Ting Feng; Dengbing Yi; Zhemei Fang; Yunfeng Luo

TL;DR

MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play) is a new Monte Carlo (MC) algorithm for solving extensive-form imperfect-information games. It combines CFR's counterfactual value calculations with fictitious play's best-response strategy, which gives it a significant advantage in games with a high proportion of dominated strategies.

Abstract

Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect-information games. Many recent improvements have focused on the convergence speed of CFR. However, most of these variants are not applicable under Monte Carlo (MC) conditions, which makes them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect-information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best-response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP converges approximately 20% to 50% faster than the most advanced MCCFR variants in poker and other test games.
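The contrast between the two update rules named in the abstract can be illustrated with a minimal sketch. This is a simplified, single-decision-point view and not the authors' implementation: the function names and the NumPy array inputs (`cum_regret`, `cum_value`) are assumptions made for illustration. Regret matching spreads probability over every action with positive cumulative regret, whereas a fictitious-play style update puts all probability on the action with the highest cumulative value.

```python
import numpy as np

def regret_matching_strategy(cum_regret):
    """Mixed strategy proportional to positive cumulative regret.

    Falls back to the uniform strategy when no action has positive regret.
    """
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

def best_response_strategy(cum_value):
    """Pure strategy on the action with the highest cumulative value.

    Hypothetical stand-in for the best-response step of fictitious play;
    ties are broken toward the lowest action index (np.argmax behaviour).
    """
    strategy = np.zeros(len(cum_value))
    strategy[np.argmax(cum_value)] = 1.0
    return strategy
```

The best-response rule needs only an argmax over the accumulated values, so a dominated action never receives probability once its accumulated value falls behind, which is the intuition behind the advantage claimed for games with many dominated strategies.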

Paper Structure (37 sections, 2 theorems, 30 equations, 8 figures, 3 tables, 2 algorithms)

Introduction
Notation and Preliminaries
Game Theory
Normal-Form Game
Extensive-Form Games
Nash Equilibrium
Dominated Strategy and Clear Games
Regret Matching and Counterfactual Regret Minimization
Fictitious Play
Motivation of CFVFP
CFVFP Method
Counterfactual Value Fictitious Play Implementation
Theoretical Analysis of MCCFVFP Algorithm
Experiments
Description of the Game and Experimental Settings
...and 22 more sections

Key Result

Theorem 1

The goal in Eq. \ref{eq:A2} can be attained if and only if every halfspace $\mathcal{H}_t \supseteq S$ is forceable.

Figures (8)

Figure 1: Convergence rates of the RM and FP algorithms in a $100\times 100$ random payoff matrix game with entries drawn from an $N(0,1)$ distribution. The right panel shows convergence for a standard random payoff matrix. The left panel shows convergence for a $100 \times 100$ random payoff matrix in which the payoffs for actions 1 to 10 are uniformly increased by 5, so that actions 11 to 100 are dominated with high probability. In this latter setting, the convergence rate of FP is very close to that of RM. Since one FP iteration costs only $\mathscr{O}(|\mathcal{A}|)$, compared with $\mathscr{O}(|\mathcal{A}|^2)$ for RM, the overall convergence rate of FP in a clear game can actually surpass that of RM. Each scenario is averaged over 30 rounds. The shaded areas represent the 90% confidence intervals for these trials. The experiments in Appendix \ref{sec:a2} support this view from another perspective.
Figure 2: Convergence rates in the Kuhn-extension, Leduc-extension, and princess-and-monster games. In the first two rows, time is measured in milliseconds (ms). The last two rows show the same runs, but with the horizontal axis representing the number of nodes touched during iteration. All results are averaged over 30 rounds. The shaded areas indicate 90% confidence intervals for the trials.
Figure 3: The difference between RM and FP (CFVFP in a normal-form game) in a two-dimensional plane
Figure 4: Convergence rate of MCCFVFP variants in different games
Figure 5: Convergence rate of different weighted average schemes for CFVFP
...and 3 more figures
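
The matrix-game setting described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' experiment code: it assumes a two-player zero-sum game in which the row player maximizes, applies the +5 shift to the row player's first 10 actions only, and uses hypothetical helper names (`make_clear_game`, `fictitious_play`, `exploitability`).

```python
import numpy as np

def make_clear_game(n=100, n_boosted=10, boost=5.0, seed=0):
    """Random N(0,1) payoff matrix whose first rows are shifted upward.

    Assumption: the shift is applied to the row player's actions only.
    """
    rng = np.random.default_rng(seed)
    payoff = rng.standard_normal((n, n))
    payoff[:n_boosted, :] += boost
    return payoff

def exploitability(payoff, x, y):
    """Sum of both players' best-response gains in a zero-sum matrix game.

    The row player maximizes x^T A y and the column player minimizes it.
    """
    return (payoff @ y).max() - (x @ payoff).min()

def fictitious_play(payoff, iterations=2000):
    """Simultaneous fictitious play; returns the empirical average strategies."""
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    row_counts[0] = 1.0  # arbitrary initial pure actions
    col_counts[0] = 1.0
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical play so far.
        br_row = np.argmax(payoff @ col_counts)
        br_col = np.argmin(row_counts @ payoff)
        row_counts[br_row] += 1.0
        col_counts[br_col] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    game = make_clear_game()
    x, y = fictitious_play(game)
    print("exploitability:", exploitability(game, x, y))
    print("row mass on non-boosted actions:", x[10:].sum())
```

Running the script prints the exploitability of the averaged strategies and how much probability the row player places on the actions that were not boosted, which gives a quick way to see how the dominated actions are treated in this kind of game.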

Theorems & Definitions (4)

Definition 1
Definition 2
Theorem 1
Theorem 2
