High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks
Luca Crupi, Alessandro Giusti, Daniele Palossi
TL;DR
The paper tackles onboard relative drone-to-drone localization for 10 cm nano-drones under strict resource limits. It introduces a vision-based FCNN that runs entirely on a GAP8-equipped Crazyflie and predicts three 20×20 maps (u, v, d) from a 160×160 grayscale frame; a post-processing step then recovers the image-space position and depth of the target drone. The model runs at 39 Hz within ~101 mW. Compared with the best of three SoA methods, it raises R-squared from 32% to 47% on the horizontal image coordinate and from 18% to 55% on the vertical one. In-field tests show tracking sustained for the full 4-minute battery lifetime and robust generalization to unseen environments. Overall, the work delivers a lightweight, high-throughput onboard solution for drone-to-drone pose estimation that needs no external infrastructure, which makes it a practical building block for swarm operations.
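The summary does not detail the post-processing, so the following is only a minimal sketch of the grid geometry it implies: a 20×20 map over a 160×160 frame gives 8 pixels per cell, assuming the cells tile the frame uniformly. The names `STRIDE` and `cell_to_pixel_center` are hypothetical and are not taken from the authors' code.

```python
from typing import Tuple

FRAME_SIZE = 160  # input frame side in pixels (from the text)
MAP_SIZE = 20     # output map side in cells (from the text)
# Assumption: the output cells tile the input frame uniformly.
STRIDE = FRAME_SIZE // MAP_SIZE  # 8 pixels per cell


def cell_to_pixel_center(row: int, col: int) -> Tuple[float, float]:
    """Return the (u, v) pixel center of a map cell.

    u is the horizontal and v the vertical image coordinate,
    with pixel indices starting at 0.
    """
    if not (0 <= row < MAP_SIZE and 0 <= col < MAP_SIZE):
        raise ValueError("cell index out of range")
    u = col * STRIDE + (STRIDE - 1) / 2
    v = row * STRIDE + (STRIDE - 1) / 2
    return u, v
```

Under this assumption, a single cell spans an 8×8 pixel block, so any position read directly off a cell index is quantized to 8 pixels unless the post-processing refines it further.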
Abstract
Relative drone-to-drone localization is a fundamental building block for any swarm operation. We address this task in the context of miniaturized nano-drones, i.e., 10 cm in diameter, which attract ever-growing interest due to novel use cases enabled by their reduced form factor. The price for their versatility is limited onboard resources, i.e., sensors, processing units, and memory, which limit the complexity of the onboard algorithms. A traditional solution to overcome these limitations is represented by lightweight deep learning models directly deployed aboard nano-drones. This work tackles the challenging relative pose estimation between nano-drones using only a gray-scale low-resolution camera and an ultra-low-power System-on-Chip (SoC) hosted onboard. We present a vertically integrated system based on a novel vision-based fully convolutional neural network (FCNN), which runs at 39 Hz within 101 mW onboard a Crazyflie nano-drone extended with the GWT GAP8 SoC. We compare our FCNN against three State-of-the-Art (SoA) systems. Compared with the best-performing SoA approach, our model improves the R-squared score from 32% to 47% on the horizontal image coordinate and from 18% to 55% on the vertical image coordinate, on a real-world dataset of 30k images. Finally, our in-field tests show a 37% reduction of the average tracking error compared to a previous SoA work and an endurance of up to the entire battery lifetime of 4 minutes.
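To make two of the figures above concrete, the sketch below computes the standard coefficient of determination (R-squared, reported above as a percentage) and the energy spent per inference implied by 39 Hz and 101 mW. The function names are hypothetical, and the energy figure assumes that 101 mW is the average power drawn while the network runs continuously at 39 Hz.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant ground truth")
    return 1.0 - ss_res / ss_tot


def energy_per_inference_mj(power_mw: float, rate_hz: float) -> float:
    """Energy per frame in millijoules: mW divided by Hz gives mJ."""
    return power_mw / rate_hz


# Figures from the abstract; the average-power reading is an assumption.
ENERGY_MJ = energy_per_inference_mj(101.0, 39.0)  # roughly 2.59 mJ per frame
```

An R-squared of 0 corresponds to always predicting the mean of the ground truth, and 1 to a perfect fit, so the reported move from 18% to 55% on the vertical coordinate means the model explains about three times as much of the variance as the best baseline.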
