LEARN: Learning End-to-End Aerial Resource-Constrained Multi-Robot Navigation
Darren Chiu, Zhehui Huang, Ruohai Ge, Gaurav S. Sukhatme
TL;DR
This work tackles onboard, resource-constrained multi-UAV navigation in cluttered environments by introducing LEARN, a lightweight, two-stage safety-guided reinforcement learning framework. LEARN combines a minimal planning cue with an attention-based policy and a two-stage safety reward that draws on control barrier concepts, enabling fully onboard perception, planning, and control without external infrastructure. In simulation, LEARN outperforms two state-of-the-art planners by about $10\%$ while using substantially fewer resources. On hardware, it transfers zero-shot from simulation to six Crazyflie quadrotors, which fly at speeds up to $2.0\ \mathrm{m/s}$ and traverse $0.2\ \mathrm{m}$ gaps in diverse indoor and outdoor settings. The approach is bandwidth- and compute-efficient: it requires only local neighbor data and low-dimensional Time-of-Flight (ToF) obstacle sensing, and it remains robust under communication delays, which makes it practical for scalable nano-UAV swarms in real-world deployment.
Abstract
Nano-UAV teams offer great agility yet face severe navigation challenges due to constrained onboard sensing, communication, and computation. Existing approaches rely on high-resolution vision or compute-intensive planners, rendering them infeasible for these platforms. We introduce LEARN, a lightweight, two-stage safety-guided reinforcement learning (RL) framework for multi-UAV navigation in cluttered spaces. Our system combines low-resolution Time-of-Flight (ToF) sensors and a simple motion planner with a compact, attention-based RL policy. In simulation, LEARN outperforms two state-of-the-art planners by $10\%$ while using substantially fewer resources. We demonstrate LEARN's viability on six Crazyflie quadrotors, achieving fully onboard flight in diverse indoor and outdoor environments at speeds up to $2.0\ \mathrm{m/s}$ and traversing $0.2\ \mathrm{m}$ gaps.
