ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
Pedro Sequeira, Vidyasagar Sadhu, Melinda Gervasio
TL;DR
ToMCAT tackles cooperative multiagent planning under partial observability by integrating a meta-learned Theory-of-Mind (ToM) reasoning module with a multiagent diffusion policy (MADiff) and an online dynamic replanning loop. The ToMnet infers teammates' motivations and future behavior from limited observations, while MADiff generates ToM-conditioned joint trajectories; a new plan is sampled, using $k$ diffusion steps over a planning horizon $H$, whenever a discrepancy between the current plan and the world state is detected. In experiments in a two-agent Overcooked domain, ToMCAT adapts efficiently online to both known and unknown teammates, achieving task performance comparable to always-replan strategies with substantially fewer planning steps, and showing significant gains from ToM conditioning over baselines. These results highlight the approach’s potential for flexible ad hoc teamwork and human-robot collaboration, where rapid, data-efficient adaptation to diverse teammates is essential. Future work aims to unify ToMnet and MADiff into a joint probabilistic model and to improve robustness to unknown teammates and human data.
Abstract
In this paper, we present ToMCAT (Theory-of-Mind for Cooperative Agents in Teams), a new framework for generating ToM-conditioned trajectories. It combines a meta-learning mechanism that performs ToM reasoning over teammates' underlying goals and future behavior with a multiagent denoising-diffusion model that generates plans for an agent and its teammates, conditioned on both the agent's goals and its teammates' characteristics as computed via ToM. We implemented an online planning system that dynamically samples new trajectories (replans) from the diffusion model whenever it detects a divergence between a previously generated plan and the current state of the world. We conducted several experiments using ToMCAT in a simulated cooking domain. Our results highlight the importance of the dynamic replanning mechanism in reducing resource usage without sacrificing team performance. We also show that recent observations about the world and teammates' behavior, collected by an agent over the course of an episode and combined with ToM inferences, are crucial for generating team-aware plans that adapt dynamically to teammates, especially when no prior information about them is provided.
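
The dynamic replanning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `diverged`, `run_episode`, `sample_plan`, and `step_env`, the Euclidean divergence test, and the tolerance `tol` are all assumptions made for illustration. Here `sample_plan` stands in for the ToM-conditioned diffusion sampler and returns, for each step of the horizon, an action paired with the state the plan expects to reach.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
# A plan is a list of (action, predicted next state) pairs (hypothetical format).
Plan = List[Tuple[float, State]]


def diverged(predicted: State, observed: State, tol: float) -> bool:
    """Assumed divergence test: Euclidean distance above a tolerance."""
    dist = sum((p - o) ** 2 for p, o in zip(predicted, observed)) ** 0.5
    return dist > tol


def run_episode(
    sample_plan: Callable[[State, int], Plan],  # stand-in for the diffusion sampler
    step_env: Callable[[State, float], State],  # stand-in for the environment
    state: State,
    horizon: int,
    n_steps: int,
    tol: float,
) -> Tuple[State, int]:
    """Follow a plan and resample only when it is exhausted or has diverged.

    Returns the final state and the number of times a plan was sampled.
    """
    plan = sample_plan(state, horizon)
    n_plans = 1
    idx = 0
    for _ in range(n_steps):
        if idx >= len(plan):
            # The plan ran out without diverging: sample the next segment.
            plan = sample_plan(state, horizon)
            n_plans += 1
            idx = 0
        action, predicted = plan[idx]
        state = step_env(state, action)
        idx += 1
        if diverged(predicted, state, tol):
            # The world no longer matches the plan: replan from the new state.
            plan = sample_plan(state, horizon)
            n_plans += 1
            idx = 0
    return state, n_plans
```

In this sketch, an always-replan strategy corresponds to sampling a plan at every step, whereas the loop above samples only once per horizon as long as the observed states stay within `tol` of the predicted ones.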
