Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies
Qi Ju, Thomas Tellier, Meng Sun, Zhemei Fang, Yunfeng Luo
TL;DR
This work addresses a limitation of CFR's exclusive focus on Nash equilibrium in imperfect-information games by introducing Preference-CFR (Pref-CFR), which adds a per-information-set preference degree $\delta$ and a vulnerability degree $\beta$ to bias strategy learning toward preferred play styles while staying within an $\epsilon$-NE bound. The approach links regret-based learning to Blackwell approachability, enabling convergence to diverse equilibria and stylized strategies without sacrificing competitive performance. Empirical results in Kuhn poker, Leduc poker, and Texas Hold'em show that Pref-CFR can produce distinct, human-understandable styles such as Aggressive and Loose Passive, and in some cases reveals novel hand-strategy insights not captured by conventional CFR or expert heuristics. The findings suggest a practical path for tailoring AI behavior to user preferences and entertainment goals, with potential extensions to other domains and to automated mapping from user metrics to information-set-specific preferences.
Abstract
Artificial intelligence (AI) has surpassed top human players in a variety of games. In imperfect-information games, these achievements have primarily been driven by Counterfactual Regret Minimization (CFR) and its variants for computing Nash equilibria. However, most existing research has focused on maximizing payoff while largely neglecting strategic diversity and the need for varied play styles, which limits AI's adaptability to different user preferences. To address this gap, we propose Preference-CFR (Pref-CFR), a novel method that incorporates two key parameters: preference degree and vulnerability degree. These parameters enable the AI to adjust its strategy distribution within an acceptable performance-loss threshold, thereby enhancing its adaptability to a wider range of strategic demands. In our experiments with Texas Hold'em, Pref-CFR successfully trained Aggressive and Loose Passive styles that not only match original CFR-based strategies in performance but also display clearly distinct behavioral patterns. Notably, for certain hand scenarios, Pref-CFR produces strategies that diverge significantly from both conventional expert heuristics and original CFR outputs, potentially offering novel insights for professional players.
