ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies

Pedro Sequeira, Vidyasagar Sadhu, Melinda Gervasio

TL;DR

ToMCAT tackles cooperative multiagent planning under partial observability by integrating a meta-learned Theory-of-Mind (ToM) reasoning module with a multiagent diffusion policy (MADiff) and an online dynamic replanning loop. The ToMnet infers teammates' motivations and future behavior from limited observations, while MADiff generates ToM-conditioned joint trajectories. The agent replans whenever it detects a discrepancy between its current plan and the world state, sampling each new plan with $k$ diffusion steps over a horizon $H$. In experiments in a two-agent Overcooked domain, ToMCAT adapts online to both known and unknown teammates, achieves task performance comparable to always-replan strategies with substantially fewer planning steps, and shows significant gains from ToM conditioning over baselines. These results highlight the approach’s potential for flexible ad hoc teamwork and human-robot collaboration, where rapid, data-efficient adaptation to diverse teammates is essential. Future work aims to unify ToMnet and MADiff into a joint probabilistic model and to improve robustness to unknown teammates and human data.

Abstract

In this paper, we present ToMCAT (Theory-of-Mind for Cooperative Agents in Teams), a new framework for generating ToM-conditioned trajectories. It combines a meta-learning mechanism that performs ToM reasoning over teammates' underlying goals and future behavior with a multiagent denoising-diffusion model that generates plans for an agent and its teammates, conditioned on both the agent's goals and its teammates' characteristics as computed via ToM. We implemented an online planning system that dynamically samples new trajectories (replans) from the diffusion model whenever it detects a divergence between a previously generated plan and the current state of the world. We conducted several experiments using ToMCAT in a simulated cooking domain. Our results highlight the importance of the dynamic replanning mechanism in reducing resource usage without sacrificing team performance. We also show that recent observations about the world and teammates' behavior, collected by an agent over the course of an episode and combined with ToM inferences, are crucial for generating team-aware plans that adapt dynamically to teammates, especially when no prior information about them is provided.
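
The divergence-triggered replanning described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `sample_plan` is a hypothetical stand-in for sampling a trajectory from the diffusion model, `env_step` is a hypothetical environment transition, and the Euclidean-distance test with tolerance `tol` is an assumed divergence criterion. ToM conditioning and the diffusion steps themselves are omitted.

```python
import math
from typing import Callable, List, Sequence

State = Sequence[float]


def diverged(predicted: State, observed: State, tol: float) -> bool:
    """Assumed divergence test: Euclidean distance above a tolerance."""
    return math.dist(predicted, observed) > tol


def run_episode(
    initial_state: State,
    sample_plan: Callable[[State, int], List[State]],  # hypothetical planner
    env_step: Callable[[State, State], State],         # hypothetical environment
    horizon: int,
    max_steps: int,
    tol: float = 1e-6,
) -> int:
    """Run one episode and return how many times the planner was called."""
    state = initial_state
    plan = sample_plan(state, horizon)  # plan[i] is the predicted state i+1 steps ahead
    idx = 0
    planner_calls = 1
    for _ in range(max_steps):
        if idx >= len(plan):
            # The current plan is used up, so a new one is needed.
            plan, idx = sample_plan(state, horizon), 0
            planner_calls += 1
        target = plan[idx]
        state = env_step(state, target)
        if diverged(target, state, tol):
            # The world no longer matches the plan: replan from the observed state.
            plan, idx = sample_plan(state, horizon), 0
            planner_calls += 1
        else:
            idx += 1
    return planner_calls


# Toy one-dimensional stand-ins, for illustration only.
def toy_planner(state: State, horizon: int) -> List[State]:
    return [(state[0] + i + 1.0,) for i in range(horizon)]


def follow_exactly(state: State, target: State) -> State:
    return tuple(target)


def always_perturbed(state: State, target: State) -> State:
    return (target[0] + 1.0,)


calls_when_on_plan = run_episode((0.0,), toy_planner, follow_exactly, horizon=4, max_steps=8)
calls_when_perturbed = run_episode((0.0,), toy_planner, always_perturbed, horizon=4, max_steps=8)
```

In this toy setting, the planner is called far less often when the world follows the plan than when every step is perturbed, which is the resource saving that dynamic replanning aims for relative to replanning at every step.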

Paper Structure

This paper contains 28 sections, 1 equation, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: The ToMCAT architecture. Left: a Theory-of-Mind network (ToMnet) makes predictions about the behavior and underlying motivations of the various teammates. Right: a Multiagent Diffusion model (MADiff) generates trajectories conditioned on the ToM reasoning. $\oplus$ indicates a concatenation operation.
  • Figure 2: The cooking domain used in the experiments.
  • Figure 3: MARL and ToMnet training results.
  • Figure 4: (a)--(b): impact of various replanning schemes on the agents' task and individual cumulative rewards. Covariance ellipses represent the $95\%$ CI of the mean. (c): mean probability over the course of trials of the ground-truth profile of the teammate under the ToMnet prediction in the presence (Prior) vs. absence (No Prior) of prior information about the teammate.
  • Figure : Online Dynamic Conditional Replanning