Modeling Others' Minds as Code

Kunal Jha; Aydan Yuenan Huang; Eric Ye; Natasha Jaques; Max Kleiman-Weiner

TL;DR

This work reframes modeling others' minds as a program synthesis problem, introducing ROTE, which uses LLMs to generate executable Python representations of observed behaviors and Bayesian inference via Sequential Monte Carlo to infer the most plausible scripts. By treating action understanding as a code-generation and inference task, ROTE achieves superior generalization and efficiency, outperforming behavior cloning and LLM-based baselines by up to 50% in gridworld and embodied household scenarios, and achieving human-level accuracy on human data. The approach offers a scalable, interpretable pathway for predicting human and AI behavior in real-world settings, with implications for safer and more adaptable human-AI collaboration.

Abstract

Accurate prediction of human behavior is essential for robust and safe human-AI collaboration. However, existing approaches for modeling people are often data-hungry and brittle because they either make unrealistic assumptions about rationality or are too computationally demanding to adapt rapidly. Our key insight is that many everyday social interactions may follow predictable patterns: efficient "scripts" that minimize cognitive load for actors and observers, e.g., "wait for the green light, then go." We propose modeling these routines as behavioral programs instantiated in computer code rather than policies conditioned on beliefs and desires. We introduce ROTE, a novel algorithm that leverages both large language models (LLMs) for synthesizing a hypothesis space of behavioral programs, and probabilistic inference for reasoning about uncertainty over that space. We test ROTE in a suite of gridworld tasks and a large-scale embodied household simulator. ROTE predicts human and AI behaviors from sparse observations, outperforming competitive baselines -- including behavior cloning and LLM-based methods -- by as much as 50% in terms of in-sample accuracy and out-of-sample generalization. By treating action understanding as a program synthesis problem, ROTE opens a path for AI systems to efficiently and effectively predict human behavior in the real world.
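The idea of scoring candidate behavioral programs against observed actions can be illustrated with a small sketch. This is not the ROTE implementation: the three candidate scripts are hand-written stand-ins for what an LLM might propose, the observation format, the noise parameter `eps`, and all function names are assumptions made for illustration, and the posterior is computed by exact enumeration over a tiny hypothesis set rather than by Sequential Monte Carlo.

```python
import math

ACTIONS = ["up", "down", "left", "right", "stay"]

# Hypothetical candidate scripts (stand-ins for LLM-generated programs).
def wait_for_green(obs):
    return "up" if obs["light"] == "green" else "stay"

def always_go(obs):
    return "up"

def go_on_red(obs):
    return "up" if obs["light"] == "red" else "stay"

PROGRAMS = {
    "wait_for_green": wait_for_green,
    "always_go": always_go,
    "go_on_red": go_on_red,
}

def likelihood(program, obs, action, eps=0.1):
    """P(action | program, obs): follow the script, else act uniformly at random."""
    base = eps / len(ACTIONS)
    return (1.0 - eps) + base if program(obs) == action else base

def posterior(programs, trajectory, eps=0.1):
    """Exact Bayesian posterior over programs under a uniform prior."""
    log_w = {name: 0.0 for name in programs}
    for obs, action in trajectory:
        for name, prog in programs.items():
            log_w[name] += math.log(likelihood(prog, obs, action, eps))
    top = max(log_w.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(lw - top) for name, lw in log_w.items()}
    total = sum(unnorm.values())
    return {name: w / total for name, w in unnorm.items()}

def predict(programs, post, obs):
    """Posterior-weighted vote over each program's next action."""
    scores = {a: 0.0 for a in ACTIONS}
    for name, prog in programs.items():
        scores[prog(obs)] += post[name]
    return max(scores, key=scores.get)

# Three sparse observations of an agent at a traffic light.
TRAJECTORY = [
    ({"light": "red"}, "stay"),
    ({"light": "green"}, "up"),
    ({"light": "red"}, "stay"),
]
POST = posterior(PROGRAMS, TRAJECTORY)
```

With these three observations, nearly all posterior mass falls on `wait_for_green`, so `predict(PROGRAMS, POST, {"light": "green"})` returns `"up"`. A particle-based method such as Sequential Monte Carlo becomes relevant when the hypothesis space is too large to enumerate and candidates must be resampled or regenerated as evidence arrives.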
