Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning

Richard Dewey, Janos Botyanszki, Ciamac C. Moallemi, Andrew T. Zheng

TL;DR

This work presents Solly, the first AI agent to reach elite human performance in reduced-format multi-player Liar's Poker using self-play and a model-free actor-critic reinforcement learning framework based on regularized Nash dynamics. By training in a multi-agent, shared-policy setting and leveraging a simple MLP architecture within OpenSpiel, Solly achieves robust performance against elite humans and outperforms large language models on key bidding and equity metrics. The study characterizes the game's probabilistic structure via conditional probability reasoning, analyzes how the state space grows with more players and larger hands, and shows that Solly becomes less exploitable as training progresses. The findings highlight Solly's potential to scale to full game sizes, reveal LLMs' limitations in bluffing-based multi-agent settings, and suggest practical directions for scalable, data-efficient learning in imperfect-information environments with rich strategic dynamics.
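The conditional probability reasoning mentioned above can be illustrated with a small sketch. It is not taken from the paper: it assumes that a bid claims "at least `count` copies of some digit across all hands", that unseen digits are independent and uniformly distributed, and that the 3x3 three-player format means three digits per hand drawn from three possible values. The function name `prob_bid_true` and its parameters are hypothetical.

```python
from math import comb


def prob_bid_true(count, own_matches, unseen_digits, num_digits):
    """Probability that at least `count` copies of a digit exist in total,
    given that we hold `own_matches` copies ourselves.

    Assumption (not from the paper): each of the `unseen_digits` digits held
    by opponents is independent and uniform over `num_digits` values, so the
    number of unseen matches is Binomial(unseen_digits, 1 / num_digits).
    """
    needed = count - own_matches  # copies the opponents must supply
    if needed <= 0:
        return 1.0  # our own hand already satisfies the bid
    if needed > unseen_digits:
        return 0.0  # impossible even if every unseen digit matches
    p = 1.0 / num_digits
    # Upper tail of the binomial distribution: P(X >= needed).
    return sum(
        comb(unseen_digits, k) * p**k * (1 - p) ** (unseen_digits - k)
        for k in range(needed, unseen_digits + 1)
    )


# Assumed 3x3 three-player setting: 2 opponents x 3 digits = 6 unseen digits.
# Bid "four 2s" while holding two 2s: opponents must hold at least two more.
probability = prob_bid_true(count=4, own_matches=2, unseen_digits=6, num_digits=3)
print(round(probability, 4))  # 473/729, about 0.6488
```

A learned policy has to go beyond this static calculation, since opponents' earlier bids carry information about their hands, but the binomial tail gives a baseline for judging whether a bid is plausible.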

Abstract

AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.

Paper Structure

This paper contains 19 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Reduced-format 3x3 Liar's Poker is played by bidding on the cumulative digits across all players. Solly calculates the bidding policy using a neural network and selects a move from the distribution output by it. Solly was trained via self-play.
  • Figure 2: Best response scores for the 3x3 3-player configuration. A lower score indicates a less exploitable Solly agent. The first panel shows the average best response score, across all player positions, for agents trained to play against various Solly training checkpoints. The Solly agents improve (become less exploitable) as training progresses. The second panel shows the scores of the exploiting agents playing in each of the three player positions, zoomed in on checkpoints 5M and above.
  • Figure 3: Best response scores for 3x3 3-player demonstrating the scaling techniques introduced in the paper's scaling section. In the first panel, we rewrite the Liar's Poker environment to encode hands as digit counts, training on abstract ("canonical") hands rather than explicit digits. We compare this agent to the original 3x3 3-player agent used for play against elite humans. In the second panel, we compare against an agent trained with a deeper MLP (7 layers of 512 neurons each) and rewards scaled by a factor of 10.
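
The digit-count encoding described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `digit_counts`, the labeling of digits as 1 through `num_digits`, and the 3x3 setting (three digits per hand, three possible values) are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def digit_counts(hand, num_digits):
    """Map an explicit hand to a tuple of per-digit counts.

    Assumption: digits are labeled 1..num_digits. Hands that differ only in
    the order of their digits map to the same count tuple.
    """
    counts = Counter(hand)
    return tuple(counts.get(digit, 0) for digit in range(1, num_digits + 1))


# Two orderings of the same digits share one encoding.
print(digit_counts((2, 1, 2), num_digits=3))  # (1, 2, 0)
print(digit_counts((2, 2, 1), num_digits=3))  # (1, 2, 0)

# In the assumed 3x3 setting, enumerate every explicit hand and count the
# distinct encodings that remain after collapsing digit order.
explicit_hands = list(product(range(1, 4), repeat=3))
canonical_hands = {digit_counts(hand, 3) for hand in explicit_hands}
print(len(explicit_hands), len(canonical_hands))  # 27 10
```

Because a bid refers only to how many copies of a digit exist, the order of digits within a hand carries no strategic information, so collapsing hands this way shrinks the set of private observations the agent has to distinguish.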