Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning
Xian Wang, Jin Zhou, Yuanli Feng, Jiahao Mei, Jiming Chen, Shuo Li
TL;DR
This work addresses time-optimal motion planning for multi-drone swarms subject to collision avoidance by learning decentralized policies with multi-agent reinforcement learning. It introduces a CTDE framework using Independent PPO (IPPO) with a shared policy and centralized critic, augmented by invalid-experience masking and value normalization, and employs a soft collision-free mechanism with a safety tolerance. The method combines a Dec-POMDP formulation, a four-term reward, and a simplified quadrotor model to enable online, onboard inference. Extensive simulations and real-world flights demonstrate near-time-optimal performance and low collision rates in two- and five-quadrotor scenarios, with speeds of up to 27.1 m/s in simulation and 13.65 m/s on real hardware. The results indicate strong potential for scalable, high-speed multi-drone operations in dynamic environments, and suggest future enhancements such as temporal prediction, LiDAR-based sensing, and team-based coordination strategies.
Abstract
Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations, and enhanced maneuverability in multi-drone systems by applying optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network using multi-agent reinforcement learning for time-optimal multi-drone flight. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision-free mechanism inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training while ensuring lightweight implementation. Extensive simulations show that, despite slight performance trade-offs compared to single-drone systems, our multi-drone approach maintains near-time-optimal performance with a low collision rate. Real-world experiments validate our method, with two quadrotors using the same network as in simulation achieving a maximum speed of 13.65 m/s and a maximum body rate of 13.4 rad/s in a 5.5 m × 5.5 m × 2.0 m space across various tracks, relying entirely on onboard computation.
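The abstract does not spell out the form of the soft collision-free mechanism, so the following is only an illustrative sketch of one plausible reading: rather than ending an episode when two drones come close, each drone in the pair receives a reward penalty that grows smoothly once their separation falls below a safety distance plus a tolerance. The function name `soft_collision_penalty`, the parameters `safe_dist`, `tolerance`, and `weight`, and the quadratic ramp are all assumptions made for illustration and are not taken from the paper.

```python
import math
from itertools import combinations


def soft_collision_penalty(positions, safe_dist=0.5, tolerance=0.2, weight=1.0):
    """Return one (non-positive) reward term per drone.

    positions: list of (x, y, z) tuples, one per drone.
    safe_dist, tolerance, weight: hypothetical values chosen for illustration.
    """
    penalties = [0.0] * len(positions)
    margin = safe_dist + tolerance  # separation below which the penalty starts
    for i, j in combinations(range(len(positions)), 2):
        d = math.dist(positions[i], positions[j])
        if d < margin:
            # Normalized violation: 0 at the margin, 1 when the drones coincide.
            violation = (margin - d) / margin
            p = weight * violation ** 2  # assumed quadratic ramp
            penalties[i] -= p
            penalties[j] -= p
    return penalties


if __name__ == "__main__":
    # Two drones 0.35 m apart, with a third drone far away.
    print(soft_collision_penalty([(0, 0, 0), (0.35, 0, 0), (5, 5, 1)]))
```

Under these assumptions the penalty is zero for well-separated drones and increases continuously as they approach, so the policy can trade a small safety cost against lap time instead of facing a hard constraint.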
