Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning

Zhehui Huang; Zhaojing Yang; Rahul Krupani; Baskın Şenbaşlar; Sumeet Batra; Gaurav S. Sukhatme

TL;DR

This work tackles collision avoidance for quadrotor swarms in obstacle-dense environments by introducing an end-to-end, decentralized DRL policy. It combines signed distance function (SDF) based obstacle observations, a multi-head attention mechanism, and a novel replay strategy to train policies that generalize to unseen layouts and transfer zero-shot to real quadrotors. The approach scales to 32 robots in simulation at 80% obstacle density and performs competitively against state-of-the-art baselines while offering substantially faster inference on compute-limited hardware, including 0.35 ms inference onboard a Crazyflie 2.1. The results suggest practical impact for rapid, scalable swarm navigation in cluttered spaces; future work would extend the approach to dynamic obstacles, onboard sensing, and formal safety guarantees.

Abstract

End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits: easy deployment, task generalization, and real-time execution capability. Prior end-to-end DRL-based methods have shown that learned controllers can be deployed onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, adding obstacles increases the number of possible interactions exponentially, which makes RL policies harder to train. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents with a curriculum and a replay buffer of clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism to attend to neighbor-robot and obstacle interactions; this is the first successful demonstration of this mechanism on policies for swarm behavior deployed on severely compute-constrained hardware. Our work is the first to demonstrate that neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL can transfer zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Video demonstrations are available on the project website at: https://sites.google.com/view/obst-avoid-swarm-rl.
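The abstract mentions a replay buffer of clipped collision episodes without giving details at this point. The following is a minimal sketch of one way such a buffer could work, not the authors' implementation. The class name `CollisionReplayBuffer` and the parameters `capacity`, `clip_steps`, and `replay_prob` are hypothetical, and the interpretation of "clipped" as "keep only a short window of states leading up to the collision" is an assumption.

```python
import random
from collections import deque


class CollisionReplayBuffer:
    """Illustrative buffer of clipped collision episodes (hypothetical design)."""

    def __init__(self, capacity=100, clip_steps=150, seed=None):
        # Oldest clips are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        # Assumed meaning of "clipped": keep this many steps before the collision.
        self.clip_steps = clip_steps
        self.rng = random.Random(seed)

    def add_episode(self, states, collision_step):
        """Store the states from shortly before the collision up to the collision."""
        start = max(0, collision_step - self.clip_steps)
        self.buffer.append(states[start:collision_step + 1])

    def sample_start_state(self, replay_prob=0.5):
        """With probability replay_prob, return the first state of a stored clip.

        Returns None when the caller should start a fresh episode instead.
        """
        if self.buffer and self.rng.random() < replay_prob:
            return self.rng.choice(self.buffer)[0]
        return None
```

In a training loop, an environment reset could first call `sample_start_state` and fall back to a freshly generated scenario when it returns `None`, so that the policy revisits situations that previously ended in a collision.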

Paper Structure (20 sections, 12 figures, 2 tables)

Introduction
Related Work
Method
Problem Formulation
Training Setup
Model architecture
Replay Buffer
Experiments and results
Ablation study
Analyzing reward functions
Scaling
Number of robots
Number of sensed neighbor robots
Obstacle density
Obstacle size
...and 5 more sections

Figures (12)

Figure 1: System overview. There are $N$ robots in the environment, and the green cylinders represent obstacles. At every tick, each robot collects its own local observations from the environment and computes its actions independently; e.g., the red shading in the stacked local observations denotes the local observations of the red robot. Our learned policy is effective in simulated trials, scales, and can be transferred to physical, severely compute-constrained quadrotors.
Figure 2: Model architecture
Figure 3: Ablation study: We remove components one-by-one in order to show the necessity of various parts.
Figure 4: Comparison of our replay strategy with prioritized level replay.
Figure 5: Comparison of K-nearest neighbor observations with range-based neighbor observations. $K=2$ and range $= 4\,\mathrm{m}$.
...and 7 more figures
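
The two neighbor-observation schemes compared in Figure 5 can be illustrated with a short sketch. This is only an illustration of the general idea, assuming Euclidean distance between robot positions; the function names `k_nearest_neighbors` and `neighbors_in_range` are hypothetical and are not taken from the paper. The defaults mirror the caption ($K=2$, range of 4 m).

```python
import math


def k_nearest_neighbors(positions, self_idx, k=2):
    """Indices of the k robots closest to robot self_idx (fixed-size observation)."""
    me = positions[self_idx]
    others = [(math.dist(me, p), i) for i, p in enumerate(positions) if i != self_idx]
    others.sort()  # nearest first; ties are broken by the lower index
    return [i for _, i in others[:k]]


def neighbors_in_range(positions, self_idx, radius=4.0):
    """Indices of all robots within radius metres of robot self_idx (variable size)."""
    me = positions[self_idx]
    return [
        i
        for i, p in enumerate(positions)
        if i != self_idx and math.dist(me, p) <= radius
    ]


# Example: five robots placed along the x-axis (positions in metres).
positions = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (5, 0, 0), (10, 0, 0)]
nearest = k_nearest_neighbors(positions, self_idx=4, k=2)
in_range = neighbors_in_range(positions, self_idx=4, radius=4.0)
```

The K-nearest variant always returns the same number of neighbors, even when they are far away, while the range-based variant can return an empty list for an isolated robot such as robot 4 in the example.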
