Multi-UAV Formation Control with Static and Dynamic Obstacle Avoidance via Reinforcement Learning
Yuqing Xie, Chao Yu, Hongzhi Zang, Feng Gao, Wenhao Tang, Jingyi Huang, Jiayu Chen, Botian Xu, Yi Wu, Yu Wang
TL;DR
The paper addresses multi-UAV formation control with static and dynamic obstacle avoidance in 3D space, where the main difficulties are a large exploration space and the sim-to-real gap. It proposes a two-stage RL framework: the first stage scalarizes a four-objective reward by randomly exploring the weights, and the second stage trains on more complex tasks with curriculum learning, aided by an attention-based observation encoder. The contributions are an MO-Dec-POMDP-based formulation, a four-component reward structure, the two-stage training strategy, and the attention-based encoder. Together they yield higher collision-free rates and better formation maintenance than the baselines and enable zero-shot sim-to-real deployment. The work reports robust real-world performance on Crazyflie 2.0 and points to scalable, high-dimensional UAV coordination in cluttered environments without extensive fine-tuning.
Abstract
This paper tackles the challenging task of maintaining formation among multiple unmanned aerial vehicles (UAVs) while avoiding both static and dynamic obstacles during directed flight. The complexity of the task arises from its multi-objective nature, the large exploration space, and the sim-to-real gap. To address these challenges, we propose a two-stage reinforcement learning (RL) pipeline. In the first stage, we randomly search for a reward function that balances key objectives: directed flight, obstacle avoidance, formation maintenance, and zero-shot policy deployment. The second stage applies this reward function to more complex scenarios and utilizes curriculum learning to accelerate policy training. Additionally, we incorporate an attention-based observation encoder to improve formation maintenance and adaptability to varying obstacle densities. Experimental results in both simulation and real-world environments demonstrate that our method outperforms both planning-based and RL-based baselines in terms of collision-free rates and formation maintenance across static, dynamic, and mixed obstacle scenarios. Ablation studies further confirm the effectiveness of our curriculum learning strategy and attention-based encoder. Animated demonstrations are available at: https://sites.google.com/view/uav-formation-with-avoidance/.
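The first-stage reward search described above can be illustrated with a minimal sketch, assuming the four objectives are combined by a weighted sum and that candidate weightings are drawn uniformly at random and then normalized. The component names, the sampling scheme, and the `evaluate` callback (a stand-in for training a policy under a given weighting and scoring it) are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical component names mirroring the four objectives named in the abstract.
OBJECTIVES = ("directed_flight", "obstacle_avoidance", "formation", "deployability")


def sample_weights(rng):
    """Draw one random weight vector that is non-negative and sums to 1."""
    raw = [rng.random() + 1e-9 for _ in OBJECTIVES]
    total = sum(raw)
    return {name: value / total for name, value in zip(OBJECTIVES, raw)}


def scalarize(components, weights):
    """Collapse per-objective rewards for one step into a single scalar (weighted sum)."""
    return sum(weights[name] * components[name] for name in OBJECTIVES)


def random_weight_search(evaluate, num_trials, seed=0):
    """Try random weightings and keep the one with the highest score.

    `evaluate` is an assumed caller-supplied function: weights -> float.
    """
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(num_trials):
        weights = sample_weights(rng)
        score = evaluate(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

In this sketch the selected weighting would then be held fixed for the second stage, where task difficulty is raised by a curriculum; how the paper actually scores candidate weightings is not specified in the abstract.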
