A Map-free Deep Learning-based Framework for Gate-to-Gate Monocular Visual Navigation aboard Miniaturized Aerial Vehicles

Lorenzo Scarciglia, Antonio Paolillo, Daniele Palossi

TL;DR

This work tackles gate-to-gate monocular navigation for sub-50 g nano-drones under severe onboard memory and compute constraints. It introduces a map-free pipeline that couples a lightweight deep learning gate-detection front-end (two variants, a CNN and an FCNN) with an image-based visual servoing (IBVS) back-end, deployed entirely onboard a Crazyflie 2.1 platform with GAP8 and STM32 processors. Through mixed simulator and real-world training, quantization, and careful hardware-software partitioning, the best-performing pipeline reaches a closed-loop control rate of 30 Hz and a gate-detection RMSE of 1.4 px on a ~20k-image real-world dataset. In the field, the nano-drone navigates through 15 gates in 4 min without crashing, covering ~100 m with a peak speed of 1.9 m/s, and it also generalizes to a never-seen-before environment. The results demonstrate practical monocular, map-free gate navigation on nano-drones, with clear avenues for obstacle avoidance integration and joint perception-control architectures.

Abstract

Palm-sized autonomous nano-drones, i.e., sub-50 g in weight, have recently entered the drone racing scenario, where they are tasked to avoid obstacles and navigate as fast as possible through gates. However, in contrast with their bigger counterparts, i.e., kg-scale drones, nano-drones offer three orders of magnitude less onboard memory and compute power, demanding more efficient and lightweight vision-based pipelines to win the race. This work presents a map-free vision-based (using only a monocular camera) autonomous nano-drone that combines a real-time deep learning gate detection front-end with a classic yet elegant and effective visual servoing control back-end, relying only on onboard resources. Starting from two state-of-the-art tiny deep learning models, we adapt them for our specific task, and after a mixed simulator-real-world training, we integrate and deploy them aboard our nano-drone. Our best-performing pipeline requires only 24M multiply-accumulate operations per frame, resulting in a closed-loop control rate of 30 Hz, while achieving a gate detection root mean square error of 1.4 pixels on our ~20k real-world image dataset. In-field experiments highlight the capability of our nano-drone to successfully navigate through 15 gates in 4 min, never crashing and covering a total travel distance of ~100 m, with a peak flight speed of 1.9 m/s. Finally, to stress the generalization capability of our system, we also test it in a never-seen-before environment, where it navigates through gates for more than 4 min.
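
To make the reported pixel error concrete, the following is a minimal sketch of how a root mean square error over predicted gate corners could be computed. The abstract does not state the exact definition, so this sketch assumes the error is averaged over every corner coordinate (u and v treated separately) of every frame. The function name `corner_rmse` and the data layout are hypothetical and only serve the illustration.

```python
import math


def corner_rmse(predicted, ground_truth):
    """RMSE in pixels over all corner coordinates (assumed definition).

    Both arguments are lists of frames; each frame is a list of four
    (u, v) pixel tuples, one per gate corner, in matching order.
    """
    squared_errors = []
    for pred_frame, true_frame in zip(predicted, ground_truth):
        for (pu, pv), (tu, tv) in zip(pred_frame, true_frame):
            # Treat the horizontal and vertical errors as separate samples.
            squared_errors.append((pu - tu) ** 2)
            squared_errors.append((pv - tv) ** 2)
    if not squared_errors:
        raise ValueError("no corners to evaluate")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: one frame, every corner off by one pixel on each axis.
truth = [[(10.0, 10.0), (50.0, 10.0), (50.0, 40.0), (10.0, 40.0)]]
pred = [[(11.0, 11.0), (51.0, 11.0), (51.0, 41.0), (11.0, 41.0)]]
error_px = corner_rmse(pred, truth)
```

Under this assumed definition, an RMSE of 1.4 px means the typical per-coordinate deviation of a predicted corner from its label is about one and a half pixels.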

Paper Structure

This paper contains 12 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A) Our autonomous nano-drone. B) Vision-based gate-to-gate navigation using the prediction of the four corners (stars).
  • Figure 2: Examples of images collected in the simulator (A), with photometric augmentations (B), and in-field (C).
  • Figure 3: Our closed-loop vision-based pipeline. On the GAP8, we execute either the CNN or the FCNN; then, the 4 corner predictions are forwarded to the IBVS high-level controller, running on the STM32, which computes the desired velocities for the low-level PID cascade.
  • Figure 4: Sample trajectories of the first in-field experiment, either employing the CNN (A) or the FCNN model (B). In (C), we show an example of the second in-field experiment where the nano-drone faces the gate from three different initial positions.
  • Figure 5: Footage of the in-field experiments, either in our laboratory (A), i.e., the fine-tuning environment, or in a never-seen-before room (B).
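
The caption of Figure 3 describes an IBVS controller that turns the four predicted corners into desired velocities. As a hedged illustration, the textbook IBVS law for point features is sketched below; it is not necessarily the exact formulation used in the paper, which may control only a subset of the degrees of freedom. All symbols are assumptions introduced here: $(x_i, y_i)$ are the normalized image coordinates of corner $i$, $Z_i$ its estimated depth, $\mathbf{s}^{*}$ the desired corner positions, $\lambda > 0$ a gain, and $\mathbf{v}_c$ the commanded camera velocity (three linear and three angular components).

```latex
\begin{aligned}
\mathbf{e} &= \mathbf{s} - \mathbf{s}^{*}, \qquad
\mathbf{s} = (x_1, y_1, \dots, x_4, y_4)^{\top} \in \mathbb{R}^{8}, \\
\mathbf{L}_i &=
\begin{bmatrix}
-\dfrac{1}{Z_i} & 0 & \dfrac{x_i}{Z_i} & x_i y_i & -(1 + x_i^{2}) & y_i \\
0 & -\dfrac{1}{Z_i} & \dfrac{y_i}{Z_i} & 1 + y_i^{2} & -x_i y_i & -x_i
\end{bmatrix}, \\
\mathbf{v}_c &= -\lambda\, \widehat{\mathbf{L}}^{+} \mathbf{e}, \qquad
\widehat{\mathbf{L}} =
\begin{bmatrix} \mathbf{L}_1^{\top} & \cdots & \mathbf{L}_4^{\top} \end{bmatrix}^{\top}
\in \mathbb{R}^{8 \times 6}.
\end{aligned}
```

Here $\widehat{\mathbf{L}}^{+}$ is the Moore-Penrose pseudoinverse of the stacked interaction matrix. The law drives the image-space error $\mathbf{e}$ toward zero with an approximately exponential decay, which is why no map or global pose estimate is needed: the controller only has to see the corners of the next gate.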