Multi-Agent Continuous Control with Generative Flow Networks

Shuang Luo; Yinchuan Li; Shunyu Liu; Xu Zhang; Yunfeng Shao; Chao Wu

TL;DR

This work addresses exploration in cooperative multi-agent continuous control with sparse terminal rewards by extending Generative Flow Networks (GFlowNets) to a multi-agent setting. The proposed MACFN framework uses centralized training with decentralized execution and introduces a continuous flow decomposition network to map a global flow into agent-specific flows, enabling decentralized action selection while preserving a joint flow proportional to the reward. The authors establish theoretical consistency for the flow decomposition, employ a sampling-based flow matching objective, and demonstrate through experiments on MPE and MAMuJoCo that MACFN achieves superior performance and richer exploration than state-of-the-art baselines. The results highlight MACFN's potential as a principled, exploration-promoting alternative or complement to traditional reinforcement learning in multi-agent continuous control tasks, with code available online.

Abstract

Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the final states of the trajectories are sampled with probability proportional to the reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their application to multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method to enable multiple agents to perform cooperative exploration for various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contribution of each agent in the presence of only global rewards. Agents can then select actions based solely on their assigned local flows in a decentralized way, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results demonstrate that the proposed method outperforms state-of-the-art counterparts and exhibits better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.
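The reward-proportional sampling property mentioned in the abstract can be illustrated on a toy discrete example. The sketch below is not taken from the paper: the graph, the state names, and the hand-picked edge flows are all hypothetical, and it uses the standard single-agent GFlowNet flow-matching condition (inflow equals outflow at interior states, inflow equals reward at terminal states) rather than MACFN's continuous multi-agent formulation.

```python
from collections import defaultdict

# Hypothetical toy DAG (not from the paper). Edge flows are chosen by hand so
# that inflow == outflow at interior states and inflow == reward at terminals.
reward = {"x": 1.0, "y": 2.0, "z": 1.0}
edge_flow = {
    ("s0", "a"): 2.0,
    ("s0", "b"): 2.0,
    ("a", "x"): 1.0,
    ("a", "y"): 1.0,
    ("b", "y"): 1.0,
    ("b", "z"): 1.0,
}


def inflow(state):
    """Total flow entering a state."""
    return sum(f for (_, child), f in edge_flow.items() if child == state)


def outflow(state):
    """Total flow leaving a state."""
    return sum(f for (parent, _), f in edge_flow.items() if parent == state)


def forward_policy(state):
    """Policy induced by the flows: edge flow divided by the state's outflow."""
    total = outflow(state)
    return {
        child: f / total
        for (parent, child), f in edge_flow.items()
        if parent == state
    }


def terminal_distribution(start="s0"):
    """Exact probability of ending in each terminal state under the policy."""
    dist = defaultdict(float)
    stack = [(start, 1.0)]
    while stack:
        state, prob = stack.pop()
        if state in reward:  # terminal state: accumulate its probability
            dist[state] += prob
            continue
        for child, p in forward_policy(state).items():
            stack.append((child, prob * p))
    return dict(dist)


flow_matched = all(abs(inflow(s) - outflow(s)) < 1e-12 for s in ("a", "b"))
terminal_matched = all(abs(inflow(s) - reward[s]) < 1e-12 for s in reward)
dist = terminal_distribution()
total_reward = sum(reward.values())
for state in sorted(dist):
    print(state, dist[state], reward[state] / total_reward)
```

When both matching conditions hold, the printed terminal probabilities coincide with the normalized rewards, which is the sense in which a trained flow network samples final states in proportion to their reward.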

Paper Structure (21 sections, 6 theorems, 57 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Generative Flow Networks
Cooperative Multi-Agent Reinforcement Learning
Preliminaries
Dec-POMDP
GFlowNets
Methodology
MACFN: Theoretical Formulation
MACFN: Training Framework
Experiments
Environments
Settings
Results
Conclusion
...and 6 more sections

Key Result

Lemma 1

Let $\pi(\boldsymbol{a}_t \mid s_t)=\frac{F\left(s_t, \boldsymbol{a}_t\right)}{F\left(s_t\right)}$ denote the joint policy, and let $\pi_i\left(a_i \mid o_i\right)$ denote the individual policy of agent $i$. Under Definition 1 (Global Flow Decomposition), we have

Figures (4)

Figure 1: The overall framework of MACFN. Left: We uniformly sample continuous actions for each agent and conduct flow estimation to obtain the individual flow distribution. Each agent can select actions according to its own flow distribution. Right: To realize flow decomposition, we first sum the sampled flows of each agent to approximate the integral of the individual flow distribution. Then we multiply the flows of all agents to obtain the global inflows (outflows). Middle: We update the overall flow networks with the continuous flow matching loss, i.e., inflows equal outflows.
Figure 2: Visualization of different MPE scenarios, including (a) Robot-Navigation-Sparse, (b) Food-Collection-Sparse, and (c) Predator-Prey-Sparse.
Figure 3: Comparison results of IDDPG, MADDPG, COVDN, COMIX, FACMAC and MACFN on Robot-Navigation-Sparse ($N=2$), Food-Collection-Sparse ($N=3$), and Predator-Prey-Sparse ($N=3$) scenarios. Top: Average test return of different methods. Bottom: Number of distinctive trajectories during the training process.
Figure 4: Comparison results of IDDPG, MADDPG, COVDN, COMIX, FACMAC and MACFN on 2-Agent-Reacher-Sparse ($N=2$), 2-Agent-Swimmer-Sparse ($N=2$), and 3-Agent-Hopper-Sparse ($N=3$) scenarios. Top: Average test return of different methods. Bottom: Number of distinctive trajectories during the training process.
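The pipeline described in the Figure 1 caption (uniform action sampling, per-agent flow estimation, summing samples to approximate an integral, multiplying across agents, and matching inflows to outflows) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: `individual_flow` is a hypothetical closed-form stand-in for a learned flow network, the action range `[-1, 1]`, the observations, and the weights are made up, and the squared log-ratio loss is a common GFlowNet choice assumed here rather than confirmed by the text above.

```python
import numpy as np

rng = np.random.default_rng(0)


def individual_flow(obs, actions, weight):
    """Hypothetical positive flow F_i(o_i, a_i); stands in for a learned network."""
    return np.exp(-weight * (actions - obs) ** 2)


def estimate_agent_outflow(obs, weight, num_samples, low=-1.0, high=1.0):
    """Monte Carlo estimate of the integral of F_i(o_i, a) over [low, high]."""
    actions = rng.uniform(low, high, size=num_samples)
    flows = individual_flow(obs, actions, weight)
    # Mean flow times the interval length approximates the integral.
    return (high - low) * flows.mean(), actions, flows


def sample_action(actions, flows):
    """Pick one sampled action with probability proportional to its flow."""
    probs = flows / flows.sum()
    return rng.choice(actions, p=probs)


def flow_matching_loss(inflow, outflow, eps=1e-8):
    """Assumed loss form: squared difference of log inflow and log outflow."""
    return (np.log(inflow + eps) - np.log(outflow + eps)) ** 2


observations = [0.2, -0.5]  # hypothetical local observations for N=2 agents
weights = [1.0, 3.0]        # hypothetical per-agent flow parameters

per_agent = [
    estimate_agent_outflow(o, w, num_samples=20000)
    for o, w in zip(observations, weights)
]

# Global outflow as the product of the per-agent integral estimates.
global_outflow = float(np.prod([estimate for estimate, _, _ in per_agent]))

# Deterministic midpoint-rule reference for the same product of integrals.
grid = np.linspace(-1.0, 1.0, 200001)
midpoints = 0.5 * (grid[:-1] + grid[1:])
reference = float(np.prod([
    2.0 * individual_flow(o, midpoints, w).mean()
    for o, w in zip(observations, weights)
]))

# Decentralized action selection: each agent uses only its own sampled flows.
joint_action = [float(sample_action(a, f)) for _, a, f in per_agent]

print(global_outflow, reference, joint_action)
```

Because the assumed global flow is a product of per-agent terms, its integral over the joint action space factorizes into per-agent integrals, so each agent only needs samples from its own action space. In training, `flow_matching_loss` would compare an estimated global inflow against the global outflow plus any reward term; here it is shown only as a standalone function.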

Theorems & Definitions (17)

Definition 1: Global Flow Decomposition
Lemma 1
Remark 1
Definition 2: Joint Continuous Outflows
Definition 3: Joint Continuous Inflows
Lemma 2
Lemma 3: Joint Continuous Flow Matching Condition
proof
Remark 2
Lemma 4
...and 7 more
