Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Logan Cross; Violet Xiang; Agam Bhatia; Daniel LK Yamins; Nick Haber

TL;DR

Hypothetical Minds introduces an embodied large language model agent with a novel Theory of Mind (ToM) module that generates, evaluates, and refines language-based hypotheses about other agents’ strategies to scaffold high-level planning in diverse, partially observable multi-agent environments. The approach combines perception, memory, hierarchical planning, and a Rescorla-Wagner-based intrinsic reward mechanism to adapt online to competitive, collaborative, and mixed-motive tasks in the Melting Pot benchmark, outperforming prior LLM-based agents and several RL baselines. Key contributions include the explicit ToM machinery, MAP-based hypothesis selection, online refinement, and systematic ablations that demonstrate the value of hypothesis evaluation for complex social dynamics. The work suggests a scalable path toward general-purpose autonomous agents that can infer and adapt to hidden agent intentions, with implications for real-world multi-agent coordination and social interaction.

Abstract

Multi-agent reinforcement learning (MARL) methods struggle with the non-stationarity of multi-agent systems and fail to adaptively learn online when tested with novel agents. Here, we leverage large language models (LLMs) to create an autonomous agent that can handle these challenges. Our agent, Hypothetical Minds, consists of a cognitively-inspired architecture, featuring modular components for perception, memory, and hierarchical planning over two levels of abstraction. We introduce the Theory of Mind module that scaffolds the high-level planning process by generating hypotheses about other agents' strategies in natural language. It then evaluates and iteratively refines these hypotheses by reinforcing hypotheses that make correct predictions about the other agents' behavior. Hypothetical Minds significantly improves performance over previous LLM-agent and RL baselines on a range of competitive, mixed-motive, and collaborative domains in the Melting Pot benchmark, including both dyadic and population-based environments. Additionally, comparisons against LLM-agent baselines and ablations reveal the importance of hypothesis evaluation and refinement for succeeding in complex scenarios.
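The hypothesis-evaluation loop described above can be illustrated with a minimal sketch. It assumes the standard Rescorla-Wagner rule, V <- V + alpha * (r - V), applied to a per-hypothesis value, with an intrinsic reward that is positive when the hypothesis predicted the other agent's behavior correctly and negative otherwise. The names `Hypothesis`, `evaluate`, and `select_validated`, and the values chosen for `alpha`, the rewards, and `threshold`, are illustrative assumptions and not the paper's implementation or settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """A natural-language guess about another agent's strategy (hypothetical type)."""
    text: str
    value: float = 0.0  # running estimate of how well the hypothesis predicts behavior


def rescorla_wagner_update(value: float, reward: float, alpha: float) -> float:
    """Standard Rescorla-Wagner rule: move the value toward the reward by a step of alpha."""
    return value + alpha * (reward - value)


def evaluate(hypotheses: List[Hypothesis], predictions: List[str], observed: str,
             alpha: float = 0.5, r_correct: float = 1.0, r_wrong: float = -1.0) -> None:
    """Reinforce each hypothesis according to whether its prediction matched the observation.

    alpha, r_correct and r_wrong are assumed values chosen for illustration.
    """
    for hyp, pred in zip(hypotheses, predictions):
        reward = r_correct if pred == observed else r_wrong
        hyp.value = rescorla_wagner_update(hyp.value, reward, alpha)


def select_validated(hypotheses: List[Hypothesis], threshold: float = 0.7) -> Optional[Hypothesis]:
    """Return the highest-valued hypothesis if it clears the (assumed) validation threshold."""
    best = max(hypotheses, key=lambda h: h.value)
    return best if best.value >= threshold else None


# Toy usage: two candidate hypotheses about an opponent, scored over two interactions.
hyps = [Hypothesis("opponent always plays rock"),
        Hypothesis("opponent copies my previous move")]
evaluate(hyps, predictions=["rock", "paper"], observed="rock")
after_one = select_validated(hyps)   # no hypothesis is validated yet
evaluate(hyps, predictions=["rock", "paper"], observed="rock")
after_two = select_validated(hyps)   # the first hypothesis now clears the threshold
```

In this toy setup a hypothesis needs two consecutive correct predictions before it is used to condition planning; a smaller `alpha` or a higher `threshold` would demand more evidence before committing.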

Paper Structure (51 sections, 11 equations, 16 figures, 5 tables)

Introduction
Related Work
LLM-based Agents
Reasoning and Hypothesis Search with LLMs
Multi-agent Decision-Making and Theory of Mind
Method
Partially-observable Markov games
Hypothetical Minds Model
Theory of Mind Module
Hypothesis Evaluation
Conditioning High-Level Plans
Experiments
How does Hypothetical Minds perform in competitive environments?
How does Hypothetical Minds perform in collaborative environments?
How does Hypothetical Minds perform in mixed-motive environments?
...and 36 more sections

Figures (16)

Figure 1: A. Hypothetical Minds architecture and model workflow. B. The ToM module generates hypotheses about other agents' strategies. Previously generated hypotheses and their values are shown for refinement. The top-k hypotheses each predict the agent's next behavior $\hat{\phi}(\tau)$, considering counterfactual scenarios. The highest-valued hypothesis informs high-level planning. Later, hypotheses are evaluated against the observed behavior $\phi(\tau)$, and their values are updated with an intrinsic reward. A hypothesis is validated once its value reaches a threshold.
Figure 2: Results for all models. Average reward per episode (with steps normalized for variable-length episodes) for each environment and scenario. Five seeds are generated for each model, with error bars reflecting the SEM across those five episodes.
Figure 3: HM's reward as a function of the number of interactions before or after a hypothesis meets the validation threshold and is used for high-level strategy selection in RWS. The vertical green line indicates the average reward at the point where a hypothesis is validated; negative and positive numbers on the x-axis indicate how many interactions before or after this point. The shaded region represents the range, as a 95% confidence interval, in which the good hypothesis is typically first generated.
Figure 4: Comparing different versions of HM. Error bars reflect SEM across 5 episodes.
Figure 5: Comparing base LLM models. 3 seeds are generated per LLM.
...and 11 more figures
