Table of Contents
Fetching ...

Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play

Ju Qi, Falin Hei, Ting Feng, Dengbing Yi, Zhemei Fang, Yunfeng Luo

TL;DR

A new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play), which combines CFR's counterfactual value calculations with fictitious play's best response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies.

Abstract

Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect information games. Recently, many improvements have been focused on enhancing the convergence speed of the CFR algorithm. However, most of these variants are not applicable under Monte Carlo (MC) conditions, making them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP achieved convergence speeds approximately 20\%$\sim$50\% faster than the most advanced MCCFR variants in games like poker and other test games.

Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play

TL;DR

A new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play), which combines CFR's counterfactual value calculations with fictitious play's best response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies.

Abstract

Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect information games. Recently, many improvements have been focused on enhancing the convergence speed of the CFR algorithm. However, most of these variants are not applicable under Monte Carlo (MC) conditions, making them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP achieved convergence speeds approximately 20\%50\% faster than the most advanced MCCFR variants in games like poker and other test games.
Paper Structure (37 sections, 2 theorems, 30 equations, 8 figures, 3 tables, 2 algorithms)

This paper contains 37 sections, 2 theorems, 30 equations, 8 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Goal eq:A2 can be attained if and only if every halfspace $\mathcal{H}_t\supseteq S$ is forceable.

Figures (8)

  • Figure 1: The figure compares the convergence rates of the RM and FP algorithms in a $100\times 100$ random payoff matrix game generated from a $N(0,1)$ distribution. In right figure, the convergence for a standard random payoff matrix is shown, while left figure illustrates the convergence in $100 \times 100$ random payoff matrix where the payoffs for actions 1 to 10 are uniformly increased by 5 (causing actions 11 to 100 to have a high probability of being dominated strategies). It can be observed that in this setting, the convergence rate of the FP is very close to that of RM. Considering that the complexity of one FP iteration is only $\mathscr{O}(|\mathcal{A}|)$ compared to the complexity of RM, which is $\mathscr{O}(|\mathcal{A}|^2)$, in a clear game, the overall convergence rate of FP can actually surpass that of RM. Each scenario tested an average of 30 rounds. The shaded areas represent the 90% confidence intervals for these trials. The experiments in Appendix \ref{['sec:a2']} can also confirm our view from another perspective.
  • Figure 2: Convergence rates in Kuhn-extension, Leduc-extension, and princess-and-monster games are shown. In the first two rows, time is measured in milliseconds (ms). The last two rows reflect the same running time but with the horizontal axis representing the number of nodes touched during iteration. All experiments tested over an average of 30 rounds. The shaded areas indicate 90% confidence intervals for the trials.
  • Figure 3: The difference between RM and FP (CFVFP in normal-form game) in a two-dimensional plane
  • Figure 4: Convergence rate of MCCFVFP variants in different games
  • Figure 5: Convergence rate of different weighted average schemes for CFVFP
  • ...and 3 more figures

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2