Quantal Response Equilibrium as a Measure of Strategic Sophistication: Theory and Validation for LLM Evaluation

Mateo Pechon-Elkins, Jon Chun

TL;DR

A game-theoretic evaluation framework grounded in quantal response equilibrium (QRE) is introduced, and closed-form equilibria are derived for four strategic games, each targeting a distinct cognitive capability.

Abstract

Theory of Mind benchmarks for large language models typically produce aggregate scores without theoretical grounding, making it unclear whether high performance reflects strategic reasoning or surface-level heuristics. We introduce a game-theoretic evaluation framework grounded in quantal response equilibrium (QRE). We derive closed-form equilibria for four strategic games, each targeting a distinct cognitive capability. We estimate QRE rationality parameters $\lambda$ that place model behavior on a continuous scale calibrated against human data ($\lambda_{\text{human}} \in [1.0, 2.5]$), and establish finite-sample convergence bounds via martingale concentration. Validation across 1,855 games with seven frontier models (plus four expansion models) confirms predictions: bluff rates converge to within 4% of equilibrium, $\lambda$ estimates range from 0.05 to 1.10 across games and models with substantial cross-model variation, and capability profiles differ across cognitive axes. Robustness analyses reveal high sensitivity to prompt framing and version instability in QRE rankings, highlighting the need for standardized protocols.
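To make the idea of a rationality parameter concrete, the following is a minimal sketch of estimating lambda from observed choice counts under a logit response rule. It is a simplification and not the paper's estimator: it treats expected payoffs as fixed rather than solving the full equilibrium fixed point, and the function names, payoffs, and counts are all hypothetical.

```python
import math

def logit_choice_probs(payoffs, lam):
    """Logit response: P(action) is proportional to exp(lam * payoff)."""
    # Subtract the largest exponent before exponentiating for numerical stability.
    m = max(lam * u for u in payoffs)
    weights = [math.exp(lam * u - m) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(counts, payoffs, lam):
    """Multinomial log-likelihood of observed action counts given lam."""
    probs = logit_choice_probs(payoffs, lam)
    return sum(n * math.log(p) for n, p in zip(counts, probs))

def estimate_lambda(counts, payoffs, lo=0.0, hi=10.0, steps=2001):
    """Grid-search maximum-likelihood estimate of lam on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda lam: log_likelihood(counts, payoffs, lam))

# Hypothetical data (not from the paper): two actions with fixed expected
# payoffs, and how often each action was chosen over 100 decisions.
payoffs = [1.0, 0.0]
counts = [62, 38]
lam_hat = estimate_lambda(counts, payoffs)
print(f"estimated lambda: {lam_hat:.3f}")
```

With lambda equal to 0 the rule picks uniformly at random, and larger values put more weight on the higher-payoff action, which is why a single scalar can serve as a continuous scale of strategic sophistication.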

Paper Structure (35 sections, 8 theorems, 12 equations, 9 figures, 13 tables)

Key Result

Theorem 1

The Strategic Claim stage game admits an approximate symmetric equilibrium profile $(\sigma^*, \sigma^*)$ with the following structure: This is a pedagogical approximation, not an exact MSNE: the deterministic receiver threshold cannot sustain exact sender indifference (which requires mixed challenge probabilities $q(c)$ per claim level). The conditional bluff rate $\beta^* = 0.340$ nonetheless p

Figures (9)

  • Figure 1: Schematic overview of the four games. Each targets a distinct ToM capability: Strategic Claim measures recursive reasoning through bluffing; STST measures conceptual grounding through word convergence; Repeated PD measures relational modeling through trust dynamics; Text-Dixit measures epistemic modeling through confidence calibration.
  • Figure 2: Round-by-round equilibrium convergence with 95% CI bands computed from 270 SC and 270 RPD games. (a) Strategic Claim: conditional bluff rate (given $v \leq 3$) converges toward $\beta^* = 0.340$ (exponential fit $\rho = 0.81$, $R^2 = 0.87$). (b) Repeated PD: cooperation rate starts high and stabilizes near 70%, a behavioral departure from the SPE prediction of mutual defection. Both panels show monotonic convergence toward equilibrium predictions.
  • Figure 3: Per-model convergence trajectories in Strategic Claim showing heterogeneous learning rates. Contraction factor $\rho$ estimated via exponential fit on conditional bluff rate (given $v \leq 3$). Dashed lines show exponential fit overlays.
  • Figure 4: Bayesian QRE rationality parameter ($\lambda$) posterior means with 95% HDI. Shaded region shows human baseline range $\lambda \in [1.5, 2.5]$ from the experimental literature [goeree2016]. Most models fall below human baselines, likely reflecting QRE identifiability limitations when agents play near equilibrium. Cross-model variation remains diagnostically informative.
  • Figure 5: Bayesian posterior densities for QRE $\lambda$ (Strategic Claim) under Gamma(2,1) prior. Models separate into two clusters: GPT-4o-mini, Gemini 2.0, and Gemini 2.5 show moderate rationality ($\lambda \in [0.3, 0.6]$), while GPT-5-mini, Claude Haiku, and Kimi K2 concentrate near zero, reflecting near-equilibrium play where $\lambda$ is poorly identified.
  • ...and 4 more figures
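
The captions for Figures 2 and 3 report a contraction factor $\rho$ obtained from an exponential fit to the conditional bluff rate. The page does not describe the fitting procedure, so the sketch below shows one common way to do it, assuming the model $|b_t - \beta^*| = A\rho^t$ and ordinary least squares on the log scale. The function name and the synthetic, noise-free series are hypothetical; only $\beta^* = 0.340$ and $\rho = 0.81$ are taken from the captions.

```python
import math

def fit_contraction_factor(rates, target):
    """Fit |rate_t - target| = A * rho**t by least squares on the log scale.

    Returns (rho, A). Rounds where rate_t equals the target are skipped
    because their log distance is undefined.
    """
    points = [(t, math.log(abs(r - target)))
              for t, r in enumerate(rates) if r != target]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx                      # equals log(rho)
    intercept = mean_y - slope * mean_t    # equals log(A)
    return math.exp(slope), math.exp(intercept)

beta_star = 0.340  # equilibrium conditional bluff rate from the captions

# Synthetic, noise-free series (an assumption for illustration, not paper data):
# the bluff rate starts 0.30 above beta_star and contracts by 0.81 per round.
rates = [beta_star + 0.30 * 0.81 ** t for t in range(10)]

rho_hat, amp_hat = fit_contraction_factor(rates, beta_star)
print(f"rho = {rho_hat:.3f}, A = {amp_hat:.3f}")
```

A value of $\rho$ below 1 means the gap to equilibrium shrinks each round, and smaller values mean faster convergence. On real, noisy trajectories the log-scale fit weights late rounds heavily, so a nonlinear fit on the original scale may be preferable.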

Theorems & Definitions (16)

  • Definition 1: Functional ToM
  • Theorem 1: Approximate Equilibrium Profile
  • Proposition 2: Cooperation as Behavioral Finding
  • Definition 2: Logit QRE
  • Theorem 3: ELO Convergence
  • Theorem 4: Finite-Sample Bound
  • Theorem 5: Within-Game Convergence in Expectation
  • Definition 3: Strategic Claim Stage Game
  • Definition 4: Repeated PD with Cheap Talk
  • Definition 5: STST Coordination Game
  • ...and 6 more
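
The listing names Definition 2 (Logit QRE) without reproducing it. For orientation, the standard textbook form of the logit quantal response equilibrium is given below. The notation ($\sigma_i$, $A_i$, $\bar{u}_i$) is an assumption and may differ from the paper's own.

```latex
\sigma_i(a_i) \;=\;
\frac{\exp\!\big(\lambda\, \bar{u}_i(a_i, \sigma_{-i})\big)}
     {\sum_{a_i' \in A_i} \exp\!\big(\lambda\, \bar{u}_i(a_i', \sigma_{-i})\big)},
\qquad
\bar{u}_i(a_i, \sigma_{-i}) \;=\;
\mathbb{E}_{a_{-i} \sim \sigma_{-i}}\big[\, u_i(a_i, a_{-i}) \,\big]
```

Here $\sigma_i(a_i)$ is the probability that player $i$ chooses action $a_i$, and a QRE is a profile $\sigma$ that satisfies this condition for every player simultaneously. At $\lambda = 0$ each player mixes uniformly, and as $\lambda \to \infty$ the logit response approaches a best response, so limit points of the QRE correspondence are Nash equilibria.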