Multi-Agent Continuous Control with Generative Flow Networks
Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang, Yunfeng Shao, Chao Wu
TL;DR
This work addresses exploration in cooperative multi-agent continuous control with sparse terminal rewards by extending Generative Flow Networks (GFlowNets) to a multi-agent setting. The proposed MACFN framework uses centralized training with decentralized execution and introduces a continuous flow decomposition network to map a global flow into agent-specific flows, enabling decentralized action selection while preserving a joint flow proportional to the reward. The authors establish theoretical consistency for the flow decomposition, employ a sampling-based flow matching objective, and demonstrate through experiments on MPE and MAMuJoCo that MACFN achieves superior performance and richer exploration than state-of-the-art baselines. The results highlight MACFN's potential as a principled, exploration-promoting alternative or complement to traditional reinforcement learning in multi-agent continuous control tasks, with code available online.
Abstract
Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the probability of reaching a final state is proportional to its reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their application to multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method to enable multiple agents to perform cooperative exploration for various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contribution of each agent in the presence of only global rewards. Agents can then select actions based solely on their assigned local flows in a decentralized way, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results demonstrate that the proposed method outperforms state-of-the-art counterparts and exhibits better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.
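To make the link between local flows and the joint policy concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the joint flow factorizes as a product of per-agent flows; the abstract does not state the form of the consistency condition. It also replaces continuous actions with a small discrete grid of candidate actions. The flow values and the helper names `local_policy`, `joint_flow`, and `sample_joint_action` are hypothetical. Under these assumptions, agents that each sample in proportion to their own local flow induce the same joint distribution as sampling in proportion to the joint flow.

```python
import numpy as np


def local_policy(local_flows):
    """Normalize one agent's non-negative flows over its candidate actions."""
    local_flows = np.asarray(local_flows, dtype=float)
    return local_flows / local_flows.sum()


def joint_flow(per_agent_flows):
    """Build the joint flow table under the ASSUMED product decomposition."""
    joint = np.asarray(per_agent_flows[0], dtype=float)
    for flows in per_agent_flows[1:]:
        joint = np.multiply.outer(joint, np.asarray(flows, dtype=float))
    return joint


def sample_joint_action(per_agent_flows, rng):
    """Decentralized execution: each agent samples from its own local policy."""
    return tuple(
        int(rng.choice(len(flows), p=local_policy(flows)))
        for flows in per_agent_flows
    )


# Hypothetical local flows for two agents over discretized candidate actions.
flows_a = [1.0, 3.0]
flows_b = [2.0, 2.0, 4.0]

# Centralized view: normalize the joint flow over all joint actions.
F = joint_flow([flows_a, flows_b])
centralized = F / F.sum()

# Decentralized view: product of independently normalized local policies.
decentralized = np.multiply.outer(local_policy(flows_a), local_policy(flows_b))

rng = np.random.default_rng(0)
action = sample_joint_action([flows_a, flows_b], rng)
```

Here `centralized` and `decentralized` agree because the sum of a product over joint actions equals the product of the per-agent sums. In the continuous setting the sums become integrals, which is where a sampling-based approximation would come in.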
