On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow

Tonghan Wang, Heng Dong, Yanchen Jiang, David C. Parkes, Milind Tambe

TL;DR

This work addresses how diffusion models can reconstruct global states from local histories in decentralized partially observable multi-agent systems. It shows that diffusion dynamics produce stable fixed points that, in collectively observable (CO) settings, coincide with the true state, while in non-CO settings they yield the full posterior over states consistent with joint histories. The authors show that deep learning approximation errors shift fixed points away from true states, with a deviation that is negatively correlated with the Jacobian rank, and they bound these deviations using a surrogate linear regression model. To overcome fixed-point deviations, they introduce composite diffusion, which iteratively denoises across agents and provably converges to the true state (or its convex hull in CO) with a quantifiable error bound, validated on SMACv2 and sensor-network benchmarks. These results provide a principled path toward integrating diffusion-based state reconstruction into centralized training and decentralized execution for improved multi-agent coordination and policy learning.

Abstract

Multiagent systems grapple with partial observability (PO), and the decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this challenge. Whereas recent approaches to addressing PO have appealed to deep learning models, providing a rigorous understanding of how these models and their approximation errors affect agents' handling of PO and their interactions remains a challenge. In addressing this challenge, we investigate reconstructing global states from local action-observation histories in Dec-POMDPs using diffusion models. We first find that diffusion models conditioned on local history represent possible states as stable fixed points. In collectively observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents' local histories share a unique fixed point corresponding to the global state, while in non-CO settings, shared fixed points yield a distribution of possible states given joint history. We further find that, with deep learning approximation errors, fixed points can deviate from true states, and the deviation is negatively correlated with the Jacobian rank. Inspired by this low-rank property, we bound the deviation by constructing a surrogate linear regression model that approximates the local behavior of a diffusion model. With this bound, we propose a composite diffusion process iterating over agents with theoretical convergence guarantees to the true state.
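The shared-fixed-point idea can be pictured with a small set-intersection sketch. This is a toy illustration, not the authors' code: the candidate coordinates, the per-agent offsets standing in for approximation error, and the helper name `shared_points` are all assumptions made for this example.

```python
import numpy as np

def shared_points(per_agent_points, tol=1e-9):
    """Return the points of the first agent's set that every other agent
    also holds, up to a Euclidean tolerance `tol`."""
    sets = [np.asarray(p, dtype=float) for p in per_agent_points]
    first, rest = sets[0], sets[1:]
    shared = [
        point for point in first
        if all(np.linalg.norm(other - point, axis=1).min() <= tol for other in rest)
    ]
    return np.array(shared, dtype=float).reshape(-1, first.shape[1])

# Hypothetical fixed-point sets for three agents (assumed values).
# Every agent holds the state [0, 0]; the other entries are agent-specific.
exact_sets = [
    [[0.0, 0.0], [4.0, 0.0]],
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [-4.0, -4.0]],
]

# Hypothetical per-agent deviations, standing in for approximation error.
offsets = [[0.05, -0.05], [0.10, -0.10], [0.15, -0.15]]
deviated_sets = [np.asarray(s) + np.asarray(o) for s, o in zip(exact_sets, offsets)]

exact_shared = shared_points(exact_sets)        # the single state all agents agree on
deviated_shared = shared_points(deviated_sets)  # no point survives the intersection
```

In this toy, the exact sets intersect in exactly one point, mirroring the CO case, while the perturbed sets have an empty intersection, which is the failure mode that motivates bounding the deviation.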

Paper Structure

This paper contains 25 sections, 22 theorems, 28 equations, 5 figures, 1 table.

Key Result

Theorem 1

[Converged Diffusion] In the absence of approximation errors, repeatedly applying the denoiser network $f_\theta(\tau_i, y)$ converges to a state $s$ that is consistent with $\tau_i$ and has a dominant posterior probability given $y$: $\phi$ ...
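The kind of convergence described in Theorem 1 can be sketched with an idealized stand-in for the denoiser. The code below is a minimal toy, assuming a posterior-mean denoiser over a small hand-picked set of candidate states under Gaussian noise; it is not the trained network $f_\theta$, and the names `make_denoiser` and `iterate_to_fixed_point`, the candidate coordinates, and `sigma` are all hypothetical.

```python
import numpy as np

def make_denoiser(candidates, sigma=0.5):
    """Build a toy denoiser: the posterior mean of the clean state given a
    noisy input y, for a uniform prior over `candidates` and Gaussian noise."""
    candidates = np.asarray(candidates, dtype=float)

    def denoise(y):
        sq_dist = np.sum((candidates - y) ** 2, axis=1)
        logits = -sq_dist / (2.0 * sigma ** 2)
        weights = np.exp(logits - logits.max())  # numerically stable softmax
        weights /= weights.sum()
        return weights @ candidates

    return denoise

def iterate_to_fixed_point(denoise, y0, max_iters=100, tol=1e-10):
    """Apply the denoiser repeatedly until successive outputs stop moving."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iters):
        y_next = denoise(y)
        if np.linalg.norm(y_next - y) < tol:
            return y_next
        y = y_next
    return y

# Assumed set of states consistent with one agent's local history.
states_consistent_with_tau_i = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
denoise_i = make_denoiser(states_consistent_with_tau_i)

# Start from a noisy state; the iteration settles near the closest candidate.
fixed_point = iterate_to_fixed_point(denoise_i, y0=[3.1, 0.7])
```

With well-separated candidates and a small `sigma`, the iteration lands numerically on the candidate closest to the starting point, which in this toy plays the role of the state with dominant posterior probability given $y$.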

Figures (5)

  • Figure 1: With minimal deep learning approximation errors, a diffusion process represents states consistent with local history $\tau_i$ (length=1 in this figure) as their attractors, provably equivalent to stable fixed points of the denoiser network $f_\theta(\tau_i, \cdot)$. Arrows point to denoiser network outputs $y' = f_\theta(\tau_i, y)$ from input noisy states $y$. The first two dimensions of $y'$ and $y$ are shown. Top row: In collectively observable (CO) Dec-POMDPs, a unique fixed point is shared by all agents, which is also the true state. Bottom row: In non-CO Dec-POMDPs, shared fixed points are all states consistent with joint history $\bm\tau$, and diffusion models reproduce the posterior state distribution $p(s|\tau_i)$ under appropriate distributions of input noisy states.
  • Figure 2: Deep learning approximation errors cause fixed points to deviate from true states. Deviation norms are related to the Jacobian rank and can be upper bounded by a surrogate linear model. (a,b,d) In the 5$\times$5 sensor network (with a zoomed-in view in (c)), we show changes in state dimensions corresponding to $\mathtt{Area 2}$ and $\mathtt{4}$ during diffusion. (e,f) The impact of these deviations becomes evident when fixed points of all agents are displayed together in a single panel: the true state can no longer be determined by intersecting fixed point sets of all agents, as it is possible that $\cap_i \mathcal{F}_{\phi}(\tau_i) = \varnothing$. (g) Empirical evidence from SMACv2 and the 5$\times$5 sensor network shows that deviation norms correlate negatively with Jacobian ranks and are tightly upper bounded by optimal residual errors of the surrogate linear model.
  • Figure 3: Practical factors contributing to low Jacobian ranks (which correlate negatively with deviations of fixed points from true states) include narrow network architectures, large state space sizes, and small numbers of training samples. In each panel, a blue point represents the deviation of a fixed point, and distributions of these deviations are displayed on the left.
  • Figure 4: Evolution of denoised state distributions (first two dimensions) during composite diffusion processes, initialized with various noisy states, in the 5$\times$5 sensor network. Each panel shows the changes (from open circles to closed circles) over 6 denoising iterations, with each iteration conditioned on the history of a single agent, e.g., iterations 0-5 involve six agents in the corresponding order. (a) Composite diffusion converges to the true state regardless of the agent ordering. (b) Partial composite diffusion may converge to incorrect states depending on the participating agents.
  • Figure 5: Policy learning performance of MAPPO with agent-specific states (as in MAPPO's vanilla implementation), joint histories, true states, and states predicted by diffusion models.
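The behavior described in the Figure 4 caption, where iterating over all agents reaches the true state while a partial set of agents may not, can be mimicked in a small toy. The sketch below is an assumption-laden illustration and not the authors' composite diffusion process: it reuses an idealized posterior-mean denoiser per agent, and the candidate sets, `sigma`, and the names `make_denoiser` and `composite_denoise` are hypothetical. Convergence here is a property of this hand-picked configuration only.

```python
import itertools
import numpy as np

def make_denoiser(candidates, sigma=0.5):
    """Toy per-agent denoiser: posterior mean over the agent's candidate states."""
    candidates = np.asarray(candidates, dtype=float)

    def denoise(y):
        logits = -np.sum((candidates - y) ** 2, axis=1) / (2.0 * sigma ** 2)
        weights = np.exp(logits - logits.max())
        return (weights / weights.sum()) @ candidates

    return denoise

def composite_denoise(denoisers, order, y0, n_rounds=2):
    """Apply one denoising step per agent, cycling through `order` n_rounds times."""
    y = np.asarray(y0, dtype=float)
    for _ in range(n_rounds):
        for agent in order:
            y = denoisers[agent](y)
    return y

# Assumed candidate states per agent; only [0, 0] is consistent with all agents.
agent_candidates = {
    0: [[0.0, 0.0], [4.0, 0.0]],
    1: [[0.0, 0.0], [0.0, 4.0]],
    2: [[0.0, 0.0], [-4.0, -4.0]],
}
denoisers = {i: make_denoiser(c) for i, c in agent_candidates.items()}
y0 = [3.1, 0.7]  # noisy initial state, closest to agent 0's spurious candidate

# All three agents participate, in every possible ordering.
full_results = [
    composite_denoise(denoisers, order, y0)
    for order in itertools.permutations(denoisers)
]

# Only agent 0 participates: the process stays at that agent's nearest candidate.
partial_result = composite_denoise(denoisers, (0,), y0)
```

In this configuration every ordering of the three agents ends at the shared state `[0, 0]`, whereas using agent 0 alone ends at `[4, 0]`, a state consistent with that agent's candidates but not with the others.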

Theorems & Definitions (28)

  • Definition 1: History-State Mapping
  • Definition 2: Discrete-time flow
  • Definition 3: Push-forward equation
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Definition 4: Deviation of fixed points from states
  • Theorem 5
  • Corollary 5.1
  • ...and 18 more