Learning Efficient Flocking Control based on Gibbs Random Fields

Dengyu Zhang, Chenghao, Feng Xue, Qingrui Zhang

TL;DR

The paper addresses scalable, safe, and efficient distributed flocking for multi-robot systems in congested environments by formulating flocking as a GRF-based MARL problem. It introduces a decentralized training/execution (DTDE) scheme through GRF-based credit assignment, and an action attention module that enables implicit motion-intention anticipation via mean-field-inspired attention. A structured energy-based reward $r=\exp[-H(X)]$ combining unary and pairwise terms guides learning, while local rewards preserve global optima through a decoupled pairwise energy $\hat{H}_p$ and a PPO-based policy optimization. Results show $\approx 99\%$ success in simulations and real-world experiments, with ablation studies confirming the value of credit assignment and action attention for performance and safety.
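
As a rough illustration of the reward $r=\exp[-H(X)]$ described above, the sketch below sums unary and pairwise energies over a small robot configuration and maps the total to a reward in $(0, 1]$. The specific energy terms (`unary`, `pairwise`), the goal position `GOAL`, the reference spacing `D_REF`, and the edge list are hypothetical placeholders chosen for illustration; they are not the paper's definitions.

```python
import numpy as np

def gibbs_reward(positions, unary, pairwise, edges):
    """Return r = exp(-H(X)), where H sums unary and pairwise energies."""
    h = sum(unary(p) for p in positions)
    h += sum(pairwise(positions[i], positions[j]) for i, j in edges)
    return float(np.exp(-h))

# Hypothetical energy terms, assumed only for this example.
GOAL = np.array([10.0, 0.0])   # assumed common goal position
D_REF = 1.0                    # assumed desired inter-robot spacing

def unary(p):
    # Energy grows with distance to the goal.
    return 0.01 * np.linalg.norm(p - GOAL)

def pairwise(p, q):
    # Energy is zero when two neighbors sit exactly D_REF apart.
    return (np.linalg.norm(p - q) - D_REF) ** 2

positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])
edges = [(0, 1), (1, 2), (0, 2)]   # assumed neighbor graph
r = gibbs_reward(positions, unary, pairwise, edges)
```

Because every energy term here is non-negative, the reward is largest (equal to 1) only when all terms vanish, which is the sense in which lower energy corresponds to a more desirable configuration.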

Abstract

Flocking control is essential for multi-robot systems in diverse applications, yet achieving efficient flocking in congested environments poses challenges regarding computation burdens, performance optimality, and motion safety. This paper addresses these challenges through a multi-agent reinforcement learning (MARL) framework built on Gibbs Random Fields (GRFs). With GRFs, a multi-robot system is represented by a set of random variables conforming to a joint probability distribution, thus offering a fresh perspective on flocking reward design. A decentralized training and execution mechanism, which enhances the scalability of MARL concerning robot quantity, is realized using a GRF-based credit assignment method. An action attention module is introduced to implicitly anticipate the motion intentions of neighboring robots, consequently mitigating potential non-stationarity issues in MARL. The proposed framework enables learning an efficient distributed control policy for multi-robot systems in challenging environments with a success rate of around $99\%$, as demonstrated through thorough comparisons with state-of-the-art solutions in simulations and experiments. Ablation studies are also performed to validate the efficiency of different framework modules.

Paper Structure

This paper contains 21 sections, 2 theorems, 18 equations, 10 figures.

Key Result

Proposition 1

The decoupled pairwise energy $\hat{H}_p$ (eq:decoupedPairwise) has the same minimum condition and the same minimum value as the pairwise energy (eq:normalized_energy).

Figures (10)

  • Figure 1: Two flocks, each with 7 drones, move in opposite directions and avoid collisions in shared space using the proposed RL-based flocking controller.
  • Figure 2: (a): Discrete action set $\mathcal{A}_{i,d}$ indicating acceleration vectors that robots can choose. (b): Obstacle observation $\boldsymbol{o}_{i,o}$ consists of the distances to obstacles in $l$ evenly divided sectors. The distances are defined as the minimum radius of the sector that doesn't cover any obstacles.
  • Figure 3: Configurations with (a) 14 edges, (b) 18 edges. The red robot in (b) tends to collide with obstacles due to attractions from extra neighbors.
  • Figure 4: An action attention structure is designed for distributed policies. The attention weight $\alpha_j$ evaluating the importance of each neighbor is computed by both neighbor observation $\boldsymbol{o}_{ij}$ and neighbor action distribution $\boldsymbol{A}_j$.
  • Figure 5: A validation environment with $50$ obstacles, in which robots move from left to right under six different algorithms: (a) Olfati-Saber, (b) Vásárhelyi, (c) CFDC, (d) DMPC, (e) PPO, (f) PPO-AA.
  • ...and 5 more figures
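
The caption of Figure 4 says that the attention weight $\alpha_j$ depends on both the neighbor observation $\boldsymbol{o}_{ij}$ and the neighbor action distribution $\boldsymbol{A}_j$. A minimal sketch of one way such a weighting could be computed is given below, using generic scaled dot-product attention. The projection matrices `w_q`, `w_k`, `w_v`, the dimensions, and the use of the robot's own observation as the query are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def action_attention(o_self, o_nbrs, a_nbrs, w_q, w_k, w_v):
    """Weight neighbors using their observations and action distributions.

    o_self: (d_o,)      own observation (assumed to form the query)
    o_nbrs: (n, d_o)    neighbor observations o_ij
    a_nbrs: (n, n_act)  neighbor action distributions A_j
    """
    feats = np.concatenate([o_nbrs, a_nbrs], axis=1)   # (n, d_o + n_act)
    query = o_self @ w_q                               # (d_h,)
    keys = feats @ w_k                                 # (n, d_h)
    values = feats @ w_v                               # (n, d_h)
    scores = keys @ query / np.sqrt(query.shape[0])    # (n,)
    alpha = softmax(scores)                            # attention weights alpha_j
    return alpha, alpha @ values                       # (n,), (d_h,)

# Random placeholder inputs and weights, assumed only for this example.
rng = np.random.default_rng(0)
n, d_o, n_act, d_h = 4, 6, 9, 8
o_self = rng.normal(size=d_o)
o_nbrs = rng.normal(size=(n, d_o))
a_nbrs = rng.dirichlet(np.ones(n_act), size=n)   # each row sums to 1
w_q = rng.normal(size=(d_o, d_h))
w_k = rng.normal(size=(d_o + n_act, d_h))
w_v = rng.normal(size=(d_o + n_act, d_h))
alpha, context = action_attention(o_self, o_nbrs, a_nbrs, w_q, w_k, w_v)
```

Feeding $\boldsymbol{A}_j$ into the keys lets a neighbor's likely next action change how much weight it receives, which is one plausible reading of implicit motion-intention anticipation.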

Theorems & Definitions (2)

  • Proposition 1
  • Lemma 1: Mean-field approximation [koller_probabilistic_2009]
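
The statement of Lemma 1 is not reproduced in this summary. For orientation only, the standard mean-field approximation for a Gibbs distribution with unary and pairwise energies can be written as follows. The symbols $\psi_i$, $\psi_{ij}$, $\mathcal{E}$, $\mathcal{N}_i$, and $Q_i$ are generic notation assumed here and may differ from the paper's.

```latex
$$
P(X) = \frac{1}{Z}\exp[-H(X)], \qquad
H(X) = \sum_{i} \psi_i(x_i) + \sum_{(i,j)\in\mathcal{E}} \psi_{ij}(x_i, x_j)
$$

$$
Q(X) = \prod_{i} Q_i(x_i), \qquad
Q_i(x_i) \propto \exp\Big[-\psi_i(x_i) - \sum_{j\in\mathcal{N}_i} \mathbb{E}_{x_j\sim Q_j}\,\psi_{ij}(x_i, x_j)\Big]
$$
```

In this form each factor $Q_i$ depends on its neighbors only through the expected pairwise energy, so every variable interacts with an average of its neighbors rather than with their exact states.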