Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies

Qi Ju, Thomas Tellier, Meng Sun, Zhemei Fang, Yunfeng Luo

TL;DR

This work addresses a limitation of Nash-equilibrium-centric CFR in imperfect-information games by introducing Preference-CFR (Pref-CFR), which adds a per-information-set preference degree $\delta$ and a vulnerability degree $\beta$ to bias strategy learning toward preferred play styles while staying within an $\epsilon$-NE bound. The approach links regret-based learning to Blackwell approachability, enabling convergence to diverse equilibria and stylized strategies without sacrificing competitive performance. Empirical results in Kuhn poker, Leduc poker, and Texas Hold'em show that Pref-CFR can produce distinct, human-understandable styles such as Aggressive and Loose Passive, and in some cases reveals hand-strategy insights not captured by conventional CFR or expert heuristics. The findings suggest a practical path for tailoring AI behavior to user preferences and entertainment goals, with potential extensions to other domains and to automated mapping from user metrics to information-set-specific preferences.

Abstract

Artificial intelligence (AI) has surpassed top human players in a variety of games. In imperfect information games, these achievements have primarily been driven by Counterfactual Regret Minimization (CFR) and its variants for computing Nash equilibrium. However, most existing research has focused on maximizing payoff, while largely neglecting the importance of strategic diversity and the need for varied play styles, thereby limiting AI's adaptability to different user preferences. To address this gap, we propose Preference-CFR (Pref-CFR), a novel method that incorporates two key parameters: preference degree and vulnerability degree. These parameters enable the AI to adjust its strategic distribution within an acceptable performance loss threshold, thereby enhancing its adaptability to a wider range of strategic demands. In our experiments with Texas Hold'em, Pref-CFR successfully trained Aggressive and Loose Passive styles that not only match original CFR-based strategies in performance but also display clearly distinct behavioral patterns. Notably, for certain hand scenarios, Pref-CFR produces strategies that diverge significantly from both conventional expert heuristics and original CFR outputs, potentially offering novel insights for professional players.
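As a rough illustration of how a preference degree could bias strategy selection, the sketch below scales the positive regrets used in regret matching by per-action weights. This is an assumption made for illustration only: the abstract does not state the update rule, the function name `preference_regret_matching` and its `delta` argument are hypothetical, the uniform fallback is borrowed from standard regret matching, and the vulnerability degree is not modeled.

```python
from typing import List, Sequence


def preference_regret_matching(
    regrets: Sequence[float], delta: Sequence[float]
) -> List[float]:
    """Regret matching with positive regrets scaled by preference weights.

    regrets: cumulative regret per action at one information set.
    delta:   hypothetical per-action preference weights (1.0 = no preference,
             larger values favor the action). Assumed, not taken from the paper.
    """
    if len(regrets) != len(delta):
        raise ValueError("regrets and delta must have the same length")

    # Keep only positive regret, then scale each action by its preference weight.
    weighted = [d * max(r, 0.0) for r, d in zip(regrets, delta)]
    total = sum(weighted)

    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy,
        # as in standard regret matching.
        n = len(regrets)
        return [1.0 / n] * n

    # Normalize so the probabilities sum to one.
    return [w / total for w in weighted]


# With equal weights this reduces to ordinary regret matching.
plain = preference_regret_matching([1.0, 1.0, -2.0], [1.0, 1.0, 1.0])
# Tripling the weight of the first action shifts probability toward it.
biased = preference_regret_matching([1.0, 1.0, -2.0], [3.0, 1.0, 1.0])
```

With all weights equal to 1 the function reduces to ordinary regret matching, so any change in behavior comes only from the weights.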

Paper Structure

This paper contains 27 sections, 2 theorems, 43 equations, 7 figures, 3 tables.

Key Result

Theorem 2.3

Goal eq:A2 can be attained if and only if every halfspace $\mathcal{H}_t\supseteq S$ is forceable.
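For readers unfamiliar with the terminology, the statement can be unpacked using the standard setup of Blackwell approachability. The notation below ($\boldsymbol{u}$, $\mathcal{X}$, $\mathcal{Y}$, $\boldsymbol{a}$, $b$) is assumed for illustration and may differ from the paper's Definitions 2.1 and 2.2. Reading the goal as approachability of $S$ is also an assumption, since the referenced equation is not reproduced here.

```latex
% Assumed setup: a repeated game with vector payoff u(x, y) in R^d,
% player strategies x in X, opponent strategies y in Y,
% and a closed convex target set S.

% A halfspace with normal vector a and offset b:
\[
  \mathcal{H} = \{\, \boldsymbol{z} \in \mathbb{R}^d : \boldsymbol{a}^{\top}\boldsymbol{z} \le b \,\}
\]

% H is forceable if a single strategy keeps the payoff in H against every reply:
\[
  \exists\, x^{*} \in \mathcal{X} \ \text{such that}\ 
  \boldsymbol{u}(x^{*}, y) \in \mathcal{H} \quad \forall\, y \in \mathcal{Y}
\]

% Approachability goal (assumed reading): the average payoff converges to S.
\[
  \operatorname{dist}\!\Big( \tfrac{1}{T} \sum_{t=1}^{T} \boldsymbol{u}(x_t, y_t),\; S \Big)
  \;\to\; 0 \quad \text{as } T \to \infty
\]
```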

Figures (7)

  • Figure 1: Convergence rate of CFR in Kuhn poker (left) and fluctuation of $\alpha$ over CFR iterations (right). Thirty experiments were performed for each setting, and the shaded area indicates the 90% confidence interval of these trials (the settings remain unchanged in subsequent experiments). Regardless of the initial strategy, all CFR runs converge to $\alpha = 0.2$.
  • Figure 2: Convergence rate of CFR/Pref-CFR in Kuhn poker (left) and the fluctuation of the $\alpha$ value during CFR/Pref-CFR iterations (right). Pref-CFR still converges to equilibrium, at a speed comparable to that of the original CFR. The right panel shows that, depending on the Pref-CFR parameter settings, the final strategy converges to different NEs.
  • Figure 3: Convergence rates of ES-MCCFR/Pref-ES-MCCFR in Leduc poker (left) and the fluctuations in the probability of choosing Call during ES-MCCFR/Pref-ES-MCCFR iterations (right). In Leduc poker, strategies converge to different equilibria only when $\beta>0$ is set.
  • Figure 4: Convergence rates of ES-MCCFR/Pref-ES-MCCFR in Leduc poker (left) and the fluctuations in the probability of choosing Call during ES-MCCFR/Pref-ES-MCCFR iterations (right). The larger $\beta$ is, the higher the probability of choosing Call and the more pronounced the strategy style.
  • Figure 5: Strategy display for Texas Hold'em. The top left corner of each image shows the current player's information and available actions. The central area shows the strategies for different hand combinations at this stage. In Texas Hold'em poker, there are 13 ranks across 4 suits, with no distinction in value between suits, resulting in 169 unique hand combinations (13 pairs, 78 suited, and 78 offsuit). These are represented in a 13$\times$13 matrix, where the lower left displays offsuit hands, the upper right shows suited hands, and the diagonal holds the pairs. Each matrix element’s color indicates the strategic choice for the corresponding hand: blue for folding, green for calling, red shades for raising (with deeper red shades indicating higher raises), and black-red for going all-in. The bottom row summarizes the average strategy across all hands, giving a visual overview of the overall strategy distribution.
  • ...and 2 more figures
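
The 13$\times$13 layout described in the Figure 5 caption can be reproduced with a short sketch that enumerates the 169 hand classes. The rank ordering (ace in the top left corner) and the helper names `hand_label`, `build_grid`, and `combo_count` are assumptions chosen for illustration; the caption only specifies that suited hands sit in the upper right and offsuit hands in the lower left.

```python
from typing import List

# Assumed ordering: ace first, so "AA" lands in the top left corner.
RANKS = "AKQJT98765432"


def hand_label(row: int, col: int) -> str:
    """Label for the cell at (row, col) of the 13x13 starting-hand matrix."""
    high = RANKS[min(row, col)]
    low = RANKS[max(row, col)]
    if row == col:
        return high + low  # diagonal: pocket pairs
    # Upper right (col > row) is suited, lower left (col < row) is offsuit.
    return high + low + ("s" if col > row else "o")


def build_grid() -> List[List[str]]:
    """Build the full 13x13 matrix of hand-class labels."""
    return [[hand_label(r, c) for c in range(13)] for r in range(13)]


def combo_count(label: str) -> int:
    """Number of concrete two-card combinations behind one hand class."""
    if len(label) == 2:
        return 6  # pairs: choose 2 of 4 suits
    return 4 if label.endswith("s") else 12  # suited: 4, offsuit: 4 * 3


grid = build_grid()
labels = [label for row in grid for label in row]
total_combos = sum(combo_count(label) for label in labels)
```

The 169 classes cover all 1326 two-card starting hands, which is why averaging a strategy over the matrix needs the per-class weights from `combo_count` rather than a plain mean over cells.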

Theorems & Definitions (5)

  • Conjecture 3.1
  • Definition 2.1
  • Definition 2.2
  • Theorem 2.3
  • Theorem 2.4