Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Yang Zhang, Xinran Li, Jianing Ye, Shuang Qiu, Delin Qu, Xiu Li, Chongjie Zhang, Chenjia Bai

TL;DR

This work introduces DIMA, a diffusion-inspired, sequentially conditioned multi-agent world model that centralizes dynamics modeling while achieving linear complexity in the number of agents. By structuring prediction as a reverse-diffusion-like process conditioned on one agent at a time and enforcing permutation invariance, DIMA delivers accurate long-horizon dynamics and robust imagined rollouts for policy learning under CTDE. Empirical results on MAMuJoCo and Bi-DexHands show state-of-the-art sample efficiency and final returns, with ablations confirming the value of sequential conditioning in low-data regimes. The approach advances MARL by unifying a diffusion-based generative paradigm with centralized modeling to better capture global state transitions and inter-agent dependencies.

Abstract

World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging due to the exponentially large joint action space and the highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents' actions in a multi-agent system aligns with the reverse process in diffusion models, a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, including MAMuJoCo and Bi-DexHands, significantly outperforming prior world models in terms of final return and sample efficiency. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research. The code is open-sourced at https://github.com/breez3young/DIMA.
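The sequential agent modeling described in the abstract can be illustrated with a toy sketch. It is not the DIMA implementation: `toy_denoise_step`, `predict_next_state`, the `weight` parameter, and the convention that each action has the same length as the state are all assumptions made for illustration. The sketch only shows the control flow, in which the next state starts as noise and each agent's action triggers exactly one refinement step, so the number of denoiser calls grows linearly with the number of agents.

```python
import random


def toy_denoise_step(noisy, prev_state, action, weight=0.5):
    """Hypothetical stand-in for a learned denoiser (not DIMA's network).

    Moves the noisy estimate part of the way toward a toy target,
    prev_state + action, to mimic one agent's action reducing uncertainty.
    """
    target = [s + a for s, a in zip(prev_state, action)]
    return [x + weight * (t - x) for x, t in zip(noisy, target)]


def predict_next_state(prev_state, actions, seed=0, denoise_step=toy_denoise_step):
    """Refine a noisy next-state estimate with one step per agent."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, as in the reverse process of a diffusion model.
    x = [rng.gauss(0.0, 1.0) for _ in prev_state]
    steps = 0
    for action in actions:  # reveal one agent's action at a time
        x = denoise_step(x, prev_state, action)
        steps += 1
    return x, steps


if __name__ == "__main__":
    prev_state = [0.0, 0.0, 0.0, 0.0]
    actions = [[1.0, 1.0, 1.0, 1.0] for _ in range(3)]  # three agents
    next_state, steps = predict_next_state(prev_state, actions)
    print(steps, next_state)
```

In this toy setting every agent pushes toward the same target, so the remaining error halves with each revealed action. A learned denoiser would instead condition on each agent's action through a neural network.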

Paper Structure

This paper contains 36 sections, 1 theorem, 20 equations, 8 figures, 10 tables, 1 algorithm.

Key Result

Theorem 2

Under the paper's diffusion-inspired assumption, the log-likelihood of the multi-agent global state transition (i.e., the evidence of the transition) admits a lower bound (an ELBO).

Figures (8)

  • Figure 1: Illustration of the DIMA world model. From the temporal perspective, each environmental timestep is modeled as a complete denoising process, analogous to diffusion models. Within each timestep, we further consider an agent-wise perspective, where the introduction of each individual agent's action information represents a single denoising step, progressively reducing uncertainty about the next state.
  • Figure 2: Comparison between conventional flattened multi-agent modeling and DIMA's sequential agent modeling. Light gray indicates clean states; dark gray indicates noisy states.
  • Figure 3: Overview of the reward and termination model. DIMA addresses reward and termination prediction from a global perspective using a transformer architecture to capture temporal correlations. Both functions share the same backbone with separate prediction heads.
  • Figure 4: Curves of averaged episode returns for all methods in MAMuJoCo.
  • Figure 5: Curves of averaged episode returns for all methods in Bi-DexHands.
  • ...and 3 more figures
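
The shared-backbone design in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: a single tanh layer stands in for the transformer backbone, temporal context is omitted, and the names `backbone`, `reward_head`, `termination_head`, and the `params` layout are hypothetical.

```python
import math


def backbone(features, weights):
    """Shared trunk: one tanh layer (a stand-in for the transformer backbone)."""
    return [math.tanh(sum(w * f for w, f in zip(row, features))) for row in weights]


def reward_head(hidden, w):
    """Linear head producing a scalar reward estimate."""
    return sum(wi * hi for wi, hi in zip(w, hidden))


def termination_head(hidden, w):
    """Sigmoid head producing a termination probability in (0, 1)."""
    logit = sum(wi * hi for wi, hi in zip(w, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def predict_reward_and_termination(features, params):
    """Run the shared backbone once, then apply both heads to the same hidden vector."""
    hidden = backbone(features, params["backbone"])
    return reward_head(hidden, params["reward"]), termination_head(hidden, params["termination"])


if __name__ == "__main__":
    params = {
        "backbone": [[1.0, 0.0], [0.0, 1.0]],
        "reward": [2.0, 0.0],
        "termination": [0.0, 0.0],
    }
    print(predict_reward_and_termination([1.0, 0.0], params))
```

Computing the hidden representation once and reusing it for both heads is what "share the same backbone" amounts to here; only the two small heads differ between the reward and termination predictions.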

Theorems & Definitions (2)

  • Theorem 2: ELBO under the Diffusion-Inspired Formulation
  • proof