Multi-agent Coordination via Flow Matching
Dongsu Lee, Daehee Lee, Amy Zhang
TL;DR
This paper introduces MAC-Flow, an offline multi-agent reinforcement learning (MARL) framework that targets both coordination quality and computational efficiency. It first learns a flow-based joint policy via flow matching on offline data, then distills that policy into decentralized one-step policies, guided by the Individual-Global-Max (IGM) principle and Q-learning, which enables fast, real-time execution. The approach achieves strong coordination across discrete and continuous benchmarks, with approximately $\times14.5$ faster inference than diffusion-based MARL methods at competitive performance, and it is compatible with online fine-tuning. Theoretical guarantees connect distillation quality to the policy and value gaps, and empirical results demonstrate substantial speedups with minimal loss in effectiveness, highlighting practical benefits for scalable multi-agent systems.
Abstract
This work presents MAC-Flow, a simple yet expressive framework for multi-agent coordination. We argue that the requirements for effective coordination are twofold: (i) a rich representation of the diverse joint behaviors present in offline data and (ii) the ability to act efficiently in real time. However, prior approaches often sacrifice one for the other: denoising diffusion-based solutions capture complex coordination but are computationally slow, while Gaussian policy-based solutions are fast but brittle in handling multi-agent interaction. MAC-Flow addresses this trade-off by first learning a flow-based representation of joint behaviors, and then distilling it into decentralized one-step policies that preserve coordination while enabling fast execution. Across four different benchmarks, spanning $12$ environments and $34$ datasets, MAC-Flow alleviates the trade-off between performance and computational cost, specifically achieving about $\boldsymbol{\times14.5}$ faster inference compared to diffusion-based multi-agent reinforcement learning (MARL) methods, while maintaining good performance. At the same time, its inference speed is similar to that of prior Gaussian policy-based offline MARL methods.
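
The two-stage idea described above (learn a flow over joint actions, then distill it into one-step decentralized policies) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the linear interpolation path, the Euler sampler, the single-sample toy dataset, and every name below (`fm_pair`, `euler_sample`, `distill_loss`, `toy_velocity`, `JOINT_ACTION`) are assumptions made for illustration, and the IGM and Q-learning guidance are omitted entirely.

```python
# Toy sketch only: pure-Python stand-ins for learned networks (all names hypothetical).

def fm_pair(x0, x1, t):
    """Assumed linear path: return the point x_t and the velocity target x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]
    return x_t, u

def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using `steps` Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def distill_loss(one_step_policy, velocity, noises, steps):
    """Mean squared gap between one-step outputs and multi-step flow samples."""
    total = 0.0
    for z in noises:
        target = euler_sample(velocity, z, steps)  # slow: `steps` velocity evaluations
        pred = one_step_policy(z)                  # fast: a single evaluation
        total += sum((p - q) ** 2 for p, q in zip(pred, target))
    return total / len(noises)

# Toy dataset: one joint action for two agents (one scalar action each).
JOINT_ACTION = [0.5, -1.0]

def toy_velocity(x, t):
    """Closed-form velocity for a single-point target under the linear path."""
    return [(a - xi) / (1.0 - t) for a, xi in zip(JOINT_ACTION, x)]

# Decentralized one-step policies: agent i sees only its own noise component.
agent_policies = [lambda z_i: 0.5, lambda z_i: -1.0]

def joint_one_step(z):
    return [pi(z_i) for pi, z_i in zip(agent_policies, z)]

noises = [[2.0, 3.0], [-1.0, 0.0]]
loss = distill_loss(joint_one_step, toy_velocity, noises, steps=10)
```

In this toy setting the multi-step sampler calls the velocity function ten times per action, while the distilled policies need one call per agent, which is the kind of inference-cost gap the abstract refers to. In practice both the velocity field and the per-agent policies would be trained networks conditioned on observations.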
