CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control

Riccardo Barbano; Alexander Denker; Zeljko Kereta; Runchang Li; Francisco Vargas

TL;DR

The paper addresses compositional generation with multiple pretrained diffusion models when the target distribution is not explicitly known. It reframes composition as a cooperative stochastic optimal control problem where each diffusion model is an agent with state $X_t^{u,i}$ and the joint objective depends on the aggregated output $Y_t = \varphi(\{X_t^{u,i}\}_i,t)$. An Iterative Diffusion Optimisation (IDO) procedure performs coordinate-descent updates of the agent controls to minimize the SOC objective $\mathcal{J}(u)$, using Monte Carlo estimates and backpropagation through time; a Tweedie look-ahead provides a stable running cost. Experimental results on MNIST show CMAD achieving lower terminal loss and higher realism than a per-step gradient baseline (CDPS), across configurations with 2–3 agents. These results indicate that SOC-based cooperative control enables flexible, task-driven composition of diffusion models without requiring explicit algebraic density combinations, with potential for scaling to higher-dimensional problems.
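The control-wise descent described above can be illustrated with a toy sketch. It is not the paper's implementation: each agent here is a scalar Euler-Maruyama process with a single constant control, the aggregation $\varphi$ is plain stacking of the agent states, and gradients come from central finite differences with shared noise rather than backpropagation through time. All function names and parameter values are hypothetical.

```python
import random


def simulate(controls, noises, dt=0.05, sigma=0.3):
    """Roll out one scalar state per agent and return the aggregated terminal output.

    Assumption: aggregation is plain stacking of the agent states, a stand-in
    for a non-overlapping mask.
    """
    states = [0.0 for _ in controls]
    n_steps = len(noises[0])
    for k in range(n_steps):
        for i, u in enumerate(controls):
            # Euler-Maruyama step: controlled drift plus scaled Gaussian noise.
            states[i] += u * dt + sigma * (dt ** 0.5) * noises[i][k]
    return states


def objective(controls, noise_bank, target, lam=0.01):
    """Monte Carlo estimate of a toy objective: terminal loss on Y_T plus control energy."""
    total = 0.0
    for noises in noise_bank:
        y = simulate(controls, noises)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    energy = lam * sum(u * u for u in controls)
    return total / len(noise_bank) + energy


def control_wise_descent(target, n_iters=60, lr=0.2, eps=1e-3,
                         n_paths=32, n_steps=20, seed=0):
    """Update one agent's control per iteration while the others stay fixed."""
    rng = random.Random(seed)
    n_agents = len(target)
    # Fixed noise bank (common random numbers), so the estimate is deterministic.
    noise_bank = [
        [[rng.gauss(0.0, 1.0) for _ in range(n_steps)] for _ in range(n_agents)]
        for _ in range(n_paths)
    ]
    controls = [0.0] * n_agents
    history = [objective(controls, noise_bank, target)]
    for it in range(n_iters):
        i = it % n_agents  # cycle through the agents (coordinate descent)
        plus, minus = list(controls), list(controls)
        plus[i] += eps
        minus[i] -= eps
        # Central finite difference in agent i's control only.
        grad = (objective(plus, noise_bank, target)
                - objective(minus, noise_bank, target)) / (2 * eps)
        controls[i] -= lr * grad
        history.append(objective(controls, noise_bank, target))
    return controls, history


if __name__ == "__main__":
    controls, history = control_wise_descent([1.0, -0.5])
    print("controls:", controls)
    print("objective: %.4f -> %.4f" % (history[0], history[-1]))
```

Because the toy objective is quadratic in each control, every coordinate update with this step size lowers the estimated objective, which mirrors the alternating structure of the updates without any of the neural parameterisation a real setup would need.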

Abstract

Continuous-time generative models have achieved remarkable success in image restoration and synthesis. However, controlling the composition of multiple pre-trained models remains an open challenge. Current approaches largely treat composition as an algebraic composition of probability densities, such as via products or mixtures of experts. This perspective assumes the target distribution is known explicitly, which is almost never the case. In this work, we propose a different paradigm that formulates compositional generation as a cooperative Stochastic Optimal Control problem. Rather than combining probability densities, we treat pre-trained diffusion models as interacting agents whose diffusion trajectories are jointly steered, via optimal control, toward a shared objective defined on their aggregated output. We validate our framework on conditional MNIST generation and compare it against a naive inference-time DPS-style baseline replacing learned cooperative control with per-step gradient guidance.
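DPS-style guidance is built on a denoised estimate of the clean sample obtained from Tweedie's formula, which is presumably also what the Tweedie look-ahead in the TL;DR refers to. The sketch below shows that estimate for a scalar state under the assumed forward model $x_t = \alpha_t x_0 + \sigma_t \varepsilon$, with an exact Gaussian score standing in for a pretrained score network. Evaluating a quadratic cost on the stacked look-aheads is one plausible reading, not the paper's stated definition; all names are hypothetical.

```python
import math


def tweedie_lookahead(x_t, score, alpha_t, sigma_t):
    """Tweedie's formula: E[x_0 | x_t] = (x_t + sigma_t^2 * score) / alpha_t."""
    return (x_t + sigma_t ** 2 * score) / alpha_t


def gaussian_score(x_t, alpha_t, sigma_t, mean0, std0):
    """Exact score of p_t when x_0 ~ N(mean0, std0^2).

    Assumption: this closed form replaces a learned score model.
    """
    var_t = alpha_t ** 2 * std0 ** 2 + sigma_t ** 2
    return -(x_t - alpha_t * mean0) / var_t


def lookahead_cost(states, alpha_t, sigma_t, priors, target):
    """Quadratic cost on the stacked per-agent look-aheads (illustrative aggregation)."""
    x0_hat = [
        tweedie_lookahead(x, gaussian_score(x, alpha_t, sigma_t, m, s), alpha_t, sigma_t)
        for x, (m, s) in zip(states, priors)
    ]
    cost = sum((a - b) ** 2 for a, b in zip(x0_hat, target))
    return cost, x0_hat


if __name__ == "__main__":
    cost, x0_hat = lookahead_cost(
        states=[0.3, -0.2], alpha_t=0.8, sigma_t=0.6,
        priors=[(1.0, 0.5), (-1.0, 0.5)], target=[1.0, -1.0],
    )
    print("look-ahead:", x0_hat, "cost:", cost)
```

The cost is computed on an estimate of the clean sample rather than on the noisy intermediate state, so it can be compared with a clean-image target at every time step.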

Paper Structure

This paper contains 24 sections, 50 equations, 9 figures, 1 table, and 3 algorithms.

Introduction
Cooperative Multi-Agent Diffusion
Connection to classical SOC.
Control-wise Descent via Iterative Diffusion Optimisation
Experimental Evaluation and Discussion
Conclusion and Future Work
Related Work
Background
Generative Models as Continuous-Time Stochastic Processes
Stochastic Optimal Control
IDO viewpoint and connection to our algorithms.
Complementary Material for \ref{sec:method}
Characterisation of the Aggregated Process $Y_t$ in \ref{eqn:jnt_process}
Case $d =1$:
Case $d > 1$:
...and 9 more sections

Figures (9)

Figure 1: A single sample generated with $3$ agents for the target $3$. Every agent controls one horizontal stripe (colour-coded) of the aggregated state $Y_t$. We show the state $X_0^{u,i}$ for every agent.
Figure 2: The schematic above illustrates linear stacking induced by a non-overlapping selection mask. For clarity, we do not use vectorised state representations.
Figure 3: Two Agents (joint): Aggregated state in a two-agent compositional diffusion setup with non-overlapping masking. Agent 1 and Agent 2 control the upper and lower halves of the image, respectively (see Figure 2). The initial aggregated state, shown prior to optimisation, reveals the explicit split between the two components, highlighting how cooperative control progressively aligns independently generated trajectories into a unified global structure.
Figure 4: Two Agents (control-wise): Aggregated state in a two-agent compositional diffusion setup with non-overlapping masking. After 300 iterations of control-wise optimisation, the terminal reverse-time diffusion sample exhibits a semantically coherent digit emerging from coordinated agent dynamics.
Figure 5: Three Agents (joint): Aggregated state in a three-agent compositional diffusion setup with non-overlapping masking. Agent 1, Agent 2, and Agent 3 control the upper, middle, and lower stripes of the image, respectively.
...and 4 more figures
