Learning Efficient Flocking Control based on Gibbs Random Fields
Dengyu Zhang, Chenghao, Feng Xue, Qingrui Zhang
TL;DR
The paper addresses scalable, safe, and efficient distributed flocking for multi-robot systems in congested environments by formulating flocking as a MARL problem built on Gibbs Random Fields (GRFs). It introduces a decentralized training and decentralized execution (DTDE) scheme enabled by GRF-based credit assignment, and an action attention module that implicitly anticipates the motion intentions of neighboring robots via mean-field-inspired attention. A structured energy-based reward $r=\exp[-H(X)]$, whose energy $H(X)$ combines unary and pairwise terms, guides learning. A decoupled pairwise energy $\hat{H}_p$ yields local rewards that preserve the global optima, and policies are optimized with PPO. The method reaches a success rate of $\approx 99\%$ in simulations and real-world experiments, and ablation studies confirm the value of credit assignment and action attention for performance and safety.
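To make the reward $r=\exp[-H(X)]$ concrete, the following minimal sketch computes an energy from unary and pairwise terms and then splits it into per-robot rewards. The specific potentials (`unary_energy`, `pairwise_energy`), the weights, the safety distance `d_safe`, and the half-and-half split of each pairwise term are all illustrative assumptions, not the paper's definitions of $H(X)$ or $\hat{H}_p$.

```python
import math

def unary_energy(pos, goal, weight=0.1):
    # Hypothetical unary term: scaled squared distance from a robot to the goal.
    return weight * math.dist(pos, goal) ** 2

def pairwise_energy(pos_i, pos_j, d_safe=1.0):
    # Hypothetical pairwise term: penalty once two robots are closer than d_safe.
    return max(0.0, d_safe - math.dist(pos_i, pos_j)) ** 2

def global_reward(positions, goal):
    # r = exp(-H), with H the sum of all unary and pairwise energies.
    n = len(positions)
    h = sum(unary_energy(p, goal) for p in positions)
    h += sum(pairwise_energy(positions[i], positions[j])
             for i in range(n) for j in range(i + 1, n))
    return math.exp(-h)

def local_reward(i, positions, goal):
    # Assumed decoupling: robot i keeps its unary energy plus half of every
    # pairwise energy it takes part in, so the local energies sum to H.
    h_i = unary_energy(positions[i], goal)
    h_i += 0.5 * sum(pairwise_energy(positions[i], positions[j])
                     for j in range(len(positions)) if j != i)
    return math.exp(-h_i)

positions = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
goal = (0.0, 0.0)
r_global = global_reward(positions, goal)
r_local = [local_reward(i, positions, goal) for i in range(len(positions))]
```

Under this assumed split, the product of the local rewards equals the global reward, which illustrates how per-robot rewards can remain consistent with a single global energy without a centralized critic.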
Abstract
Flocking control is essential for multi-robot systems in diverse applications, yet achieving efficient flocking in congested environments poses challenges regarding computational burden, performance optimality, and motion safety. This paper addresses these challenges through a multi-agent reinforcement learning (MARL) framework built on Gibbs Random Fields (GRFs). With GRFs, a multi-robot system is represented by a set of random variables conforming to a joint probability distribution, which offers a fresh perspective on flocking reward design. A decentralized training and execution mechanism, which improves the scalability of MARL with respect to the number of robots, is realized using a GRF-based credit assignment method. An action attention module is introduced to implicitly anticipate the motion intentions of neighboring robots, thereby mitigating potential non-stationarity issues in MARL. The proposed framework enables learning an efficient distributed control policy for multi-robot systems in challenging environments with a success rate of around $99\%$, as demonstrated through thorough comparisons with state-of-the-art solutions in simulations and experiments. Ablation studies are also performed to validate the contribution of the different framework modules.
