MADiff: Offline Multi-agent Learning with Diffusion Models

Zhengbang Zhu; Minghuan Liu; Liyuan Mao; Bingyi Kang; Minkai Xu; Yong Yu; Stefano Ermon; Weinan Zhang

TL;DR

MADiff is an attention-based diffusion model for offline multi-agent learning that can serve as both a decentralized policy and a centralized controller under the centralized training with decentralized execution (CTDE) paradigm. It models joint trajectories with an inter-agent attention mechanism and is trained with a combination of diffusion and inverse dynamics losses, using classifier-free guidance for action execution and teammate modeling. Empirically, MADiff achieves strong performance on offline MARL benchmarks and on multi-agent trajectory prediction, outperforming several baselines and illustrating the value of explicitly modeling coordination. The work shows that diffusion modeling is practical within CTDE for complex multi-agent interactions, and it identifies scalability and the handling of stochasticity as areas for future work.

Abstract

Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task. Q-learning algorithms struggle with extrapolation errors in offline settings, while supervised learning methods are constrained by model expressiveness. Recently, diffusion models (DMs) have shown promise in overcoming these limitations in single-agent learning, but their application in multi-agent scenarios remains unclear. Generating trajectories for each agent with independent DMs may impede coordination, while concatenating all agents' information can lead to low sample efficiency. Accordingly, we propose MADiff, which is realized with an attention-based diffusion model to model the complex coordination among behaviors of multiple agents. To our knowledge, MADiff is the first diffusion-based multi-agent learning framework, functioning as both a decentralized policy and a centralized controller. During decentralized executions, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied in multi-agent trajectory predictions. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions. Our code is available at https://github.com/zbzhu99/madiff.
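The TL;DR and the outline both mention classifier-free guided generation. As a rough illustration of the generic guidance rule only (not MADiff's actual implementation), the sketch below blends a conditional and an unconditional noise estimate. The function name `guided_noise`, the array shape `(agents, horizon, state_dim)`, and the guidance weight value are all assumptions made for this example; in a real model the two estimates would come from the denoising network rather than from random draws.

```python
import numpy as np


def guided_noise(eps_cond, eps_uncond, guidance_weight):
    """Generic classifier-free guidance rule (illustrative, hypothetical helper).

    Moves from the unconditional noise estimate toward the conditional one.
    A weight of 0 returns the unconditional estimate, a weight of 1 returns
    the conditional estimate, and larger weights extrapolate past it.
    """
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)


# Toy stand-ins for network outputs: 3 agents, horizon 4, state dimension 2.
# These shapes are assumptions for the example, not values from the paper.
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(3, 4, 2))
eps_cond = rng.normal(size=(3, 4, 2))

eps_hat = guided_noise(eps_cond, eps_uncond, guidance_weight=1.2)
```

The guided estimate `eps_hat` would then replace the raw network output in each reverse-diffusion step; how conditioning on returns or other signals is done in MADiff itself is described in the paper, not here.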

Paper Structure

This paper contains 39 sections, 11 equations, 12 figures, 9 tables, and 2 algorithms.

Introduction
Preliminaries
Multi-agent Offline Reinforcement Learning
Diffusion Probabilistic Models
Diffusing Decision Making
Diffusing over state trajectories and acting with inverse dynamics model.
Classifier-free guided generation.
Methodology
Multi-Agent Diffusion with Attention
Centralized Training Objectives
Centralized Control or Decentralized Execution
Related Work
Experiments
Task Descriptions
Compared Baselines and Metrics
...and 24 more sections

Figures (12)

Figure 1: The architecture of MADiff, which is an attention-based diffusion network framework that performs attention across all agents at every decoder layer of each agent.
Figure 2: Visualization of an episode in the Spread task. Solid lines are real rollouts, and dashed lines are DM-planned trajectories.
Figure 3: The average normalized score of MADiff ablation variants in MPE tasks. The mean and standard error are computed over 5 different seeds.
Figure 4: Illustration of how agents' observations are modelled by MADiff in a three-agent environment. Note that figure (b) shows the situation when Agent 1 is taking action during decentralized execution.
Figure 5: Violin plots of returns in MA Mujoco datasets.
...and 7 more figures
