Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents

Hsieh-Ting Lin, Tsung-Yu Hou

Abstract

Theory of Mind (ToM) -- the ability to model others' mental states -- is fundamental to human social cognition. Whether large language models (LLMs) can develop ToM has been tested exclusively through static vignettes, leaving open whether ToM-like reasoning can emerge through dynamic interaction. Here we report that autonomous LLM agents playing extended sessions of Texas Hold'em poker progressively develop sophisticated opponent models, but only when equipped with persistent memory. In a 2×2 factorial design crossing memory (present/absent) with domain knowledge (present/absent), each with five replications (N = 20 experiments, ~6,000 agent-hand observations), we find that memory is both necessary and sufficient for ToM-like behavior emergence (Cliff's delta = 1.0, p = 0.008). Agents with memory reach ToM Level 3-5 (predictive to recursive modeling), while agents without memory remain at Level 0 across all replications. Strategic deception grounded in opponent models occurs exclusively in memory-equipped conditions (Fisher's exact p < 0.001). Domain expertise does not gate ToM-like behavior emergence but enhances its application: agents without poker knowledge develop equivalent ToM levels but less precise deception (p = 0.004). Agents with ToM deviate from game-theoretically optimal play (67% vs. 79% adherence to a tight-aggressive (TAG) baseline, delta = -1.0, p = 0.008) to exploit specific opponents, mirroring expert human play. All mental models are expressed in natural language and directly readable, providing a transparent window into AI social cognition. Cross-model validation with GPT-4o yields weighted Cohen's kappa = 0.81 (almost perfect agreement). These findings demonstrate that functional ToM-like behavior can emerge from interaction dynamics alone, without explicit training or prompting, with implications for understanding artificial social intelligence and biological social cognition.
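
As a concreteness check on the headline statistic, the following minimal sketch (standard-library Python only) shows how Cliff's delta and the exact two-sided Mann-Whitney p-value are obtained for five replications per condition. The per-replication values are hypothetical; only the complete-separation pattern (every memory replication above every memory-free one) is taken from the abstract.

```python
from itertools import product
from math import comb

# Hypothetical per-replication outcomes (max ToM level reached).
# Only the separation pattern -- every memory replication above every
# memory-free one -- is taken from the paper; the exact values are not.
memory    = [3, 4, 5, 4, 3]   # memory-equipped conditions
no_memory = [0, 0, 0, 0, 0]   # memory-absent conditions

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    gt = sum(x > y for x, y in product(xs, ys))
    lt = sum(x < y for x, y in product(xs, ys))
    return (gt - lt) / (len(xs) * len(ys))

print(cliffs_delta(memory, no_memory))   # 1.0: complete separation

# Under complete separation with n = m = 5, exactly one of the C(10, 5)
# equally likely rank arrangements per tail puts one whole group above
# the other, so the exact two-sided Mann-Whitney p-value is:
print(round(2 / comb(10, 5), 3))          # 0.008
```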

Table of Contents

  1. Results
  2. Discussion

Figures (4)

  • Figure 1: Real-time spectator interface during a Full condition session (Hand 3, Turn). The interface displays each agent's cards, chips, and position around the poker table. The Theory of Mind panel (top left) shows real-time ToM level assessments for each agent. Agent memory notes appear as speech bubbles, illustrating the natural-language opponent models that constitute "readable minds." Statistics panel (left) tracks VPIP (voluntarily put money in pot), PFR (preflop raise frequency), aggression factor, and win rate across the session.
  • Figure 2: ToM level trajectories across hands. Mean maximum ToM level per hand bin, averaged across five replications. Memory-equipped conditions (Full, No-Skill) show progressive development from Level 0 to Level 3--5, while memory-absent conditions (No-Memory, Baseline) remain at Level 0 throughout. Shaded regions indicate $\pm$1 SD across replications.
  • Figure 3: TAG adherence and chip spread by condition. (A) Mean TAG adherence (preflop action alignment with tight-aggressive baseline strategy) across the four conditions; an illustrative adherence computation appears after this list. Memory-equipped conditions show significantly lower TAG adherence, reflecting deliberate exploitation of opponent tendencies. (B) Final chip spread (richest minus poorest player) at session end. Error bars indicate $\pm$1 SD across five replications.
  • Figure 4: Cross-model inter-rater reliability confusion matrix. Comparison of ToM level codes assigned by Claude Sonnet (rows) and GPT-4o (columns) on a stratified random sample of memory snapshots. The concentration of values on the main diagonal reflects almost perfect agreement ($\kappa = 0.81$); a sketch of the weighted-kappa computation follows this list. All disagreements are between adjacent levels, confirming systematic consistency in the coding rubric.
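
For Figure 3, TAG adherence is described as the alignment of preflop actions with a tight-aggressive baseline. A minimal sketch of such a metric is below; the hand-class chart and data structures are hypothetical, since the paper does not publish its baseline.

```python
def tag_adherence(preflop_actions, tag_baseline):
    """Share of preflop decisions that match a tight-aggressive baseline.

    preflop_actions: list of (hand_class, action) pairs from a session.
    tag_baseline:    dict mapping hand_class -> recommended TAG action.
    Both structures, and the toy chart below, are hypothetical.
    """
    hits = sum(tag_baseline.get(hand) == action
               for hand, action in preflop_actions)
    return hits / len(preflop_actions)

# Toy example: TAG play raises premium hands and folds weak offsuit ones.
baseline = {"premium_pair": "raise", "weak_offsuit": "fold"}
session = [("premium_pair", "raise"), ("weak_offsuit", "call"),
           ("premium_pair", "raise"), ("weak_offsuit", "fold")]
print(tag_adherence(session, baseline))   # 0.75
```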
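
For Figure 4, a weighted Cohen's kappa over two raters' ToM level codes can be computed as in the sketch below. The codes shown and the linear weighting are assumptions for illustration: the paper reports $\kappa = 0.81$ and adjacent-level-only disagreements but does not publish the raw codes or the weighting scheme.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ToM level codes (0-5) for the same memory snapshots.
# Only "all disagreements are between adjacent levels" is from the text;
# the values and the linear weighting are assumptions.
claude = [0, 0, 1, 2, 3, 3, 4, 4, 5, 5, 2, 3]
gpt4o  = [0, 0, 1, 2, 3, 4, 4, 4, 5, 4, 2, 3]

print(cohen_kappa_score(claude, gpt4o, weights="linear"))
```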