GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent Systems

Yiqin Yang; Xu Yang; Yuhua Jiang; Ni Mu; Hao Hu; Runpeng Xie; Ziyou Zhang; Siyuan Li; Yuan-Hua Ni; Qianchuan Zhao; Bo Xu

TL;DR

GlobeDiff overcomes ambiguities in state estimation while inferring the global state with high fidelity, and its estimation error is provably bounded under both unimodal and multi-modal distributions.

Abstract

In multi-agent systems, partial observability is a critical barrier to effective coordination and decision-making. Existing approaches, such as belief state estimation and inter-agent communication, often fall short. Belief-based methods rely on an agent's past experience and do not fully exploit global information, while communication-based methods often lack a robust model for making effective use of the auxiliary information that communication provides. To address this issue, we propose the Global State Diffusion Algorithm (GlobeDiff), which infers the global state from local observations. By formulating state inference as a multi-modal diffusion process, GlobeDiff overcomes ambiguities in state estimation while inferring the global state with high fidelity. We prove that the estimation error of GlobeDiff is bounded under both unimodal and multi-modal distributions. Extensive experiments show that GlobeDiff achieves superior performance and accurately infers the global state.
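The inference step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a standard DDPM-style reverse process with a linear noise schedule, a diagonal-Gaussian prior p_phi(z | x) with hypothetical linear heads, and a placeholder noise predictor `eps_model(s, x, z, k)`. The function names `make_schedule`, `sample_prior`, and `infer_global_state` are hypothetical and chosen for illustration only.

```python
import numpy as np


def make_schedule(K=50, beta_min=1e-4, beta_max=0.02):
    # Linear DDPM noise schedule; an assumed choice, the paper's schedule may differ.
    betas = np.linspace(beta_min, beta_max, K)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def sample_prior(x, w_mu, w_logvar, rng):
    # Hypothetical diagonal-Gaussian prior p_phi(z | x) with linear mean / log-variance heads.
    mu, logvar = x @ w_mu, x @ w_logvar
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def infer_global_state(x, eps_model, prior_params, state_dim, K=50, seed=0):
    # Draw s_hat ~ p_{theta,phi}(s | x): sample z from the prior, then run ancestral DDPM sampling.
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(K)
    z = sample_prior(x, *prior_params, rng)            # latent that selects one mode
    s = rng.standard_normal((x.shape[0], state_dim))   # s^K ~ N(0, I)
    for k in reversed(range(K)):
        eps_hat = eps_model(s, x, z, k)
        # Mean of p(s^{k-1} | s^k): (s^k - beta_k / sqrt(1 - alpha_bar_k) * eps_hat) / sqrt(alpha_k)
        mean = (s - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps_hat) / np.sqrt(alphas[k])
        noise = rng.standard_normal(s.shape) if k > 0 else 0.0
        s = mean + np.sqrt(betas[k]) * noise           # sigma_k^2 = beta_k (a common choice)
    return s


# Usage with an untrained placeholder denoiser that predicts zero noise.
obs_dim, z_dim, state_dim = 8, 4, 16
rng = np.random.default_rng(1)
prior_params = (0.1 * rng.normal(size=(obs_dim, z_dim)), 0.1 * rng.normal(size=(obs_dim, z_dim)))
x = rng.normal(size=(3, obs_dim))                      # a batch of 3 observation vectors
zero_eps = lambda s, x, z, k: np.zeros_like(s)
s_hat = infer_global_state(x, zero_eps, prior_params, state_dim)
print(s_hat.shape)  # (3, 16)
```

In this sketch the latent z is drawn once before denoising, so different draws of z can steer the reverse process toward different modes of p(s | x); a trained `eps_model` would be needed for the output to resemble real global states.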

Paper Structure (35 sections, 5 theorems, 45 equations, 12 figures, 4 tables, 1 algorithm)

This paper contains 35 sections, 5 theorems, 45 equations, 12 figures, 4 tables, 1 algorithm.

Introduction
Related Work
Partial Observability
Diffusion Model for RL
Preliminaries
Dec-POMDPs
Generative Model for Global State Inference
Method
Global State Diffusion Process
Training
Inference
Theoretical Analysis
Practical Implementation
Architecture
Training Mechanism
...and 20 more sections

Key Result

Theorem 1

Assume the trained model satisfies two assumptions: (1) diffusion noise-prediction MSE: $\mathbb{E}_{s^k,x,z,k}\big[\| \epsilon_\theta(s^k,x,z,k) - \epsilon \|^2\big] \leq \delta^2$; (2) prior alignment: $D_{\text{KL}}\big(p_\phi(z\mid x) \,\|\, p(z\mid x)\big) \leq \varepsilon_{\text{KL}}$. Then, for any … where $W_2$ is the 2-Wasserstein distance between $p_{\theta,\phi}(s\mid x)$ and $p(s\mid x)$, …
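Both hypotheses of Theorem 1 are quantities that could, in principle, be estimated for a trained model. The NumPy sketch below is illustrative only: `noise_mse` is a Monte Carlo estimate of the left-hand side of assumption (1) under an assumed DDPM forward process $s^k = \sqrt{\bar\alpha_k}\,s^0 + \sqrt{1-\bar\alpha_k}\,\epsilon$, and `gaussian_kl` is the closed-form KL divergence for assumption (2) under the additional assumption that both $p_\phi(z\mid x)$ and the reference $p(z\mid x)$ are diagonal Gaussians. Both function names are hypothetical, and in practice the true $p(z\mid x)$ is not directly available.

```python
import numpy as np


def gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
    # Closed-form KL( N(mu_p, diag(exp(logvar_p))) || N(mu_q, diag(exp(logvar_q))) ),
    # summed over the last axis. Assumes both distributions are diagonal Gaussians.
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    return 0.5 * np.sum(
        logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0, axis=-1
    )


def noise_mse(eps_model, s0, x, z, alpha_bars, rng):
    # Monte Carlo estimate of E[ ||eps_theta(s^k, x, z, k) - eps||^2 ],
    # the quantity bounded by delta^2 in assumption (1).
    k = rng.integers(0, len(alpha_bars), size=s0.shape[0])   # one diffusion step per sample
    eps = rng.standard_normal(s0.shape)
    ab = alpha_bars[k][:, None]
    s_k = np.sqrt(ab) * s0 + np.sqrt(1.0 - ab) * eps         # forward process q(s^k | s^0)
    eps_hat = eps_model(s_k, x, z, k)
    return float(np.mean(np.sum((eps_hat - eps) ** 2, axis=-1)))


# Usage with toy inputs.
rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 50))   # assumed linear schedule
s0 = rng.normal(size=(20000, 16))                             # stand-in global states
zero_model = lambda s, x, z, k: np.zeros_like(s)             # trivial predictor
print(noise_mse(zero_model, s0, None, None, alpha_bars, rng))  # close to 16 = dim(s)
print(gaussian_kl(np.ones(4), np.zeros(4), np.zeros(4), np.zeros(4)))  # 2.0
```

A predictor that always outputs zero has an MSE of about $\mathbb{E}\|\epsilon\|^2 = \dim(s)$, which gives a reference scale for $\delta^2$: a useful denoiser should sit well below it.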

Figures (12)

Figure 1: The overall framework of GlobeDiff. During the execution phase, we first construct auxiliary local observations $x$ and then infer the global state $\hat{s}$ using GlobeDiff. Agents make decisions based on the inferred global state $\hat{s}$.
Figure 2: The training process of GlobeDiff has two parts: minimizing the difference between the prior network $p_{\phi}$ and the posterior network $q_{\psi}$, and then training the diffusion model using the forward and backward processes.
Figure 3: Comparison with global state inference baselines on SMAC-v1 (PO) tasks, measured by win rate over three random seeds.
Figure 4: Comparison with global state inference baselines on SMAC-v2 (PO) tasks, measured by win rate over three random seeds.
Figure 5: Visualization of global states generated by GlobeDiff, VAE and MLP. The first plot displays the true states and the subsequent plots show the inferred states for each agent. White points denote individual states, with polygons highlighting local neighborhoods. Gradient shading (light green to purple) indicates training progression. The closer the polygon structures of the inferred states are to those of the true states, the higher the prediction quality.
...and 7 more figures
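The two training parts described in the Figure 2 caption can be written as the objectives below. This is a hedged reconstruction in standard conditional-VAE and DDPM notation, not the paper's exact equations. It assumes that the posterior network conditions on the true state and the observations, $q_\psi(z\mid s,x)$; that $\bar\alpha_k$ is the usual cumulative product of a DDPM noise schedule; and that $z$ is drawn from $q_\psi$ while training the denoiser and from $p_\phi(z\mid x)$ at execution time.

```latex
\begin{aligned}
\mathcal{L}_{\text{prior}}(\phi,\psi) &= \mathbb{E}_{(s,x)}\Big[ D_{\text{KL}}\big(q_\psi(z\mid s,x)\,\|\,p_\phi(z\mid x)\big) \Big], \\
s^k &= \sqrt{\bar\alpha_k}\, s + \sqrt{1-\bar\alpha_k}\,\epsilon, \qquad \epsilon\sim\mathcal{N}(0,I), \\
\mathcal{L}_{\text{diff}}(\theta) &= \mathbb{E}_{(s,x),\,k,\,\epsilon,\,z\sim q_\psi(z\mid s,x)}\Big[ \big\| \epsilon_\theta(s^k,x,z,k) - \epsilon \big\|^2 \Big].
\end{aligned}
```

Under this reading, $\mathcal{L}_{\text{diff}}$ has the same form as the noise-prediction error that Theorem 1 bounds by $\delta^2$, while $\mathcal{L}_{\text{prior}}$ trains the prior that is used alone at execution time.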

Theorems & Definitions (10)

Theorem 1: Single-Sample Expectation Error Bound with Latent Variable
proof
Theorem 2: Multi-Modal Error Bound with Latent Variable
proof
Theorem
proof
Lemma 1
proof
Theorem
proof