Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames

Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis

TL;DR

This work tackles the evaluation of recursive strategic reasoning in LLM-driven multi-agent systems by introducing a centralized umpire framework and hypergame-based reasoning that enable deep, multi-level belief modeling in two-player beauty contests. The authors propose a κ-based measure to complement the traditional k-level measure of reasoning depth, and show that LLM-enhanced agents can outperform a cognitive hierarchy baseline and approximate human data in structured game settings. Across two experiments, including one with agent profiling, they show that professional-domain prompts and agent profiles modulate reasoning depth: some models approach human performance but do not consistently reach the optimal solution. The key contributions are a flexible, agency-rich MAS platform, the κ metric for reasoning depth, and empirical evidence that artificial reasoners can closely track human behavior and sometimes exceed baseline performance, informing the design of future LLM-based strategic agents. The work advances systematic evaluation of LLM reasoning in strategic contexts and motivates semantic methods for assessing the quality of inferred reasoning processes.
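To make the umpire's role in a one-shot, two-player beauty contest concrete, here is a minimal sketch of move validation and adjudication. It is not the authors' implementation: the class and method names (`Umpire`, `Move`, `validate`, `adjudicate`), the guess range [0, 100] and the target multiplier p = 2/3 are all assumptions for illustration, and matchmaking and environment management are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed parameters (hypothetical; the paper's exact settings are not stated here).
P = 2 / 3           # target = P * average of the two guesses
LOW, HIGH = 0, 100  # admissible guess range


@dataclass
class Move:
    player: str
    guess: float


class Umpire:
    """Hypothetical umpire: validates moves and scores a one-shot 2-player contest."""

    def __init__(self, p: float = P, low: float = LOW, high: float = HIGH):
        self.p, self.low, self.high = p, low, high

    def validate(self, move: Move) -> bool:
        # A move is legal only if the guess lies inside the admissible range.
        return self.low <= move.guess <= self.high

    def adjudicate(self, a: Move, b: Move) -> tuple[float, list[str]]:
        """Return the target number and the winner(s); both players win on a tie."""
        for m in (a, b):
            if not self.validate(m):
                raise ValueError(f"invalid move by {m.player}: {m.guess}")
        target = self.p * (a.guess + b.guess) / 2
        dist_a, dist_b = abs(a.guess - target), abs(b.guess - target)
        if dist_a < dist_b:
            return target, [a.player]
        if dist_b < dist_a:
            return target, [b.player]
        return target, [a.player, b.player]


umpire = Umpire()
target, winners = umpire.adjudicate(Move("alice", 33), Move("bob", 22))
```

Here the target is 2/3 × 27.5 ≈ 18.33, so `bob` wins. Under the assumed p < 1, the target always falls below the midpoint of the two guesses, so the lower guess wins whenever the guesses differ; this is why guessing 0 is weakly dominant in the two-player variant.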

Abstract

LLM-driven multi-agent-based simulations have been gaining traction, with applications in game-theoretic and social simulations. While most implementations seek to exploit or evaluate LLM-agentic reasoning, they often do so with a weak notion of agency and simplified architectures. We implement a role-based multi-agent strategic interaction framework tailored to sophisticated recursive reasoners, providing the means for systematic, in-depth development and evaluation of strategic reasoning. Our game environment is governed by an umpire responsible for facilitating games, from matchmaking through move validation to environment management. Players incorporate state-of-the-art LLMs in their decision mechanism, relying on a formal hypergame-based model of hierarchical beliefs. We use one-shot, 2-player beauty contests to evaluate the recursive reasoning capabilities of the latest LLMs, comparing them against an established baseline model from economics and against data from human experiments. Furthermore, we introduce the foundations of a semantic measure of reasoning as an alternative to k-level theory. Our experiments show that artificial reasoners can outperform the baseline model both in approximating human behaviour and in reaching the optimal solution.
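A cognitive hierarchy model is a common economics baseline for beauty contests. The sketch below assumes the Poisson variant (level frequencies follow a Poisson distribution with mean τ, here τ = 1.5), a level-0 player who guesses uniformly on [0, 100] (mean 50), and p = 2/3; all of these are assumptions, not the paper's confirmed settings. It also uses the textbook large-population approximation, in which each level best-responds to the expected average of lower levels and ignores its own influence on that average; in the strict two-player game the lower guess always wins, so this is an approximation for comparison with human data, not an exact best response.

```python
import math


def poisson_pmf(k: int, tau: float) -> float:
    """Probability that a player is of level k under Poisson(tau)."""
    return math.exp(-tau) * tau**k / math.factorial(k)


def ch_choices(tau: float = 1.5, p: float = 2 / 3, levels: int = 10,
               level0_mean: float = 50.0) -> list[float]:
    """Guess of each level 0..levels-1 (assumed parameters, large-population approximation)."""
    f = [poisson_pmf(k, tau) for k in range(levels)]
    choices = [level0_mean]  # level-0 assumed uniform on [0, 100], so its mean is 50
    for k in range(1, levels):
        weights = f[:k]
        # Level-k believes opponents are levels 0..k-1 in normalised Poisson proportions.
        believed_mean = sum(w * c for w, c in zip(weights, choices)) / sum(weights)
        choices.append(p * believed_mean)
    return choices


def ch_population_mean(tau: float = 1.5, p: float = 2 / 3, levels: int = 10) -> float:
    """Predicted mean guess, weighting each level by its truncated Poisson frequency."""
    f = [poisson_pmf(k, tau) for k in range(levels)]
    choices = ch_choices(tau, p, levels)
    return sum(w * c for w, c in zip(f, choices)) / sum(f)


choices = ch_choices()
mean_guess = ch_population_mean()
```

With these assumptions, level 1 guesses 2/3 × 50 ≈ 33.3 and level 2 guesses 2/3 × 40 ≈ 26.7; the predicted population mean is the frequency-weighted average of all level guesses.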

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of centralised MAS framework.
  • Figure 2: Per-model means, standard deviations and estimated $k$-levels. Standard deviations for the human data were unavailable.
  • Figure 3: Per-model means (LLMs with profiles), standard deviations and estimated $k$-levels. Standard deviations for the human data were unavailable.
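The figures report estimated $k$-levels alongside mean guesses. One common way to obtain such an estimate, assumed here since the paper's exact estimator is not given in this section, is to invert the level-k recursion: if level-0 anchors at 50 and each level multiplies the previous guess by p (assumed 2/3), level k guesses 50·p^k, so k = ln(x / 50) / ln(p). The function names below are hypothetical.

```python
import math


def level_k_guess(k: float, p: float = 2 / 3, anchor: float = 50.0) -> float:
    """Guess of a level-k reasoner under the assumed anchor and multiplier."""
    return anchor * p**k


def estimate_k(mean_guess: float, p: float = 2 / 3, anchor: float = 50.0) -> float:
    """Invert anchor * p**k = mean_guess for k (assumed estimator, not the paper's)."""
    if not 0 < mean_guess <= anchor:
        # A guess of 0 corresponds to k -> infinity (the equilibrium);
        # guesses above the anchor would give a negative k.
        raise ValueError("mean guess must lie in (0, anchor]")
    return math.log(mean_guess / anchor) / math.log(p)


k_hat = estimate_k(100 / 3)  # a mean guess of about 33.3 maps to roughly level 1
```

A mean guess of 50 maps to level 0, and each further factor of p corresponds to one additional level of reasoning, so non-integer estimates fall between adjacent levels.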