On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow
Tonghan Wang, Heng Dong, Yanchen Jiang, David C. Parkes, Milind Tambe
TL;DR
This work studies how diffusion models can reconstruct global states from local action-observation histories in decentralized, partially observable multi-agent systems (Dec-POMDPs). It shows that diffusion dynamics conditioned on local history represent possible states as stable fixed points: in collectively observable (CO) settings, the agents' models share a unique fixed point that coincides with the true state, while in non-CO settings the shared fixed points yield the distribution of possible states given the joint history. The authors further show that deep learning (DL) approximation errors can shift fixed points away from true states, with a deviation that is negatively correlated with the Jacobian rank, and they bound these deviations using a surrogate linear regression model. To correct fixed-point deviations, they introduce composite diffusion, which iteratively denoises across agents and provably converges to the true state in CO settings (or to a convex hull of possible states in non-CO settings) with a quantifiable error bound; the approach is validated on SMACv2 and sensor-network benchmarks. These results provide a principled path toward integrating diffusion-based state reconstruction into centralized training with decentralized execution, improving multi-agent coordination and policy learning.
Abstract
Multiagent systems grapple with partial observability (PO), and the decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this challenge. Although recent approaches to addressing PO have appealed to deep learning models, providing a rigorous understanding of how these models and their approximation errors affect agents' handling of PO and their interactions remains a challenge. In addressing this challenge, we investigate reconstructing global states from local action-observation histories in Dec-POMDPs using diffusion models. We first find that diffusion models conditioned on local history represent possible states as stable fixed points. In collectively observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents' local histories share a unique fixed point corresponding to the global state, while in non-CO settings, shared fixed points yield a distribution of possible states given the joint history. We further find that, with deep learning approximation errors, fixed points can deviate from true states, and the deviation is negatively correlated with the Jacobian rank. Inspired by this low-rank property, we bound the deviation by constructing a surrogate linear regression model that approximates the local behavior of a diffusion model. With this bound, we propose a composite diffusion process that iterates over agents and comes with theoretical guarantees of convergence to the true state.
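The fixed-point picture can be illustrated with a deliberately simplified toy, which is not the paper's model. Assume, hypothetically, that each agent's history-conditioned denoiser is idealized as a map that projects a state estimate onto the set of global states consistent with that agent's linear observation. Every consistent state is then a fixed point of that agent's map, the states fixed by all agents are exactly those consistent with the joint observation, and cycling the maps over agents is the classical cyclic-projection (Kaczmarz-style) scheme, used here only as a stand-in for composite diffusion. The 3-dimensional state, the observation vectors, and the names `make_agent_denoiser` and `composite_iteration` are illustrative assumptions; no training, noise schedule, or approximation error is modeled.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Denoiser = Callable[[Sequence[float]], Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_agent_denoiser(a: Sequence[float], obs: float) -> Denoiser:
    """Hypothetical idealized denoiser for one agent (toy assumption).

    Projects a state onto the hyperplane {s : a . s = obs}, i.e. the global
    states consistent with this agent's (assumed linear) observation. Every
    consistent state is a fixed point of the returned map.
    """
    norm_sq = dot(a, a)

    def denoise(s: Sequence[float]) -> Vector:
        step = (dot(a, s) - obs) / norm_sq
        return [si - step * ai for si, ai in zip(s, a)]

    return denoise


def composite_iteration(denoisers: Sequence[Denoiser], s0: Sequence[float],
                        sweeps: int = 200) -> Vector:
    """Apply the agents' maps one after another, cycling over agents."""
    s = list(s0)
    for _ in range(sweeps):
        for denoise in denoisers:
            s = denoise(s)
    return s


# Hypothetical 3-D global state and one linear observation per agent.
true_state = [1.0, -2.0, 0.5]
obs_vectors = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
denoisers = [make_agent_denoiser(a, dot(a, true_state)) for a in obs_vectors]

# A single agent has many fixed points: this state differs from true_state
# but matches agent 1's observation, so agent 1's map leaves it unchanged.
other = [3.0, -4.0, 0.0]
print(denoisers[0](other) == other)

# The observation vectors are linearly independent (a CO-like toy), so the
# only state fixed by every agent is true_state; cycling over agents finds it.
estimate = composite_iteration(denoisers, [0.0, 0.0, 0.0])
print([round(x, 6) for x in estimate])

# With only two agents (non-CO-like), the iteration stops at some state that
# is consistent with both observations but need not equal true_state.
partial = composite_iteration(denoisers[:2], [0.0, 0.0, 0.0])
print([round(dot(a, partial) - dot(a, true_state), 6) for a in obs_vectors[:2]])
```

With all three agents, the observation vectors are linearly independent, so the iteration should return `true_state` up to floating-point error. With only two agents, it settles on one of infinitely many consistent states. This mirrors the non-CO case, in which local histories alone cannot single out the true state.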
