On-device Self-supervised Learning of Visual Perception Tasks aboard Hardware-limited Nano-quadrotors

Elia Cereda, Manuele Rusci, Alessandro Giusti, Daniele Palossi

TL;DR

This work proposes, for the first time, on-device learning aboard nano-drones, where the first part of the in-field mission is dedicated to self-supervised fine-tuning of a pre-trained convolutional neural network (CNN).

Abstract

Sub-50 g nano-drones are gaining momentum in both academia and industry. Their most compelling applications rely on onboard deep learning models for perception despite severe hardware constraints (i.e., a sub-100 mW processor). When deployed in unknown environments not represented in the training data, these models often underperform due to domain shift. To cope with this fundamental problem, we propose, for the first time, on-device learning aboard nano-drones, where the first part of the in-field mission is dedicated to self-supervised fine-tuning of a pre-trained convolutional neural network (CNN). Leveraging a real-world vision-based regression task, we thoroughly explore performance-cost trade-offs of the fine-tuning phase along three axes: i) dataset size (more data improves regression performance but requires more memory and longer computation); ii) methodologies (e.g., fine-tuning all model parameters vs. only a subset); and iii) self-supervision strategy. Our approach reduces the mean absolute error by up to 30% compared to the pre-trained baseline, requiring only 22 s of fine-tuning on an ultra-low-power GWT GAP9 System-on-Chip. Addressing the domain shift problem via on-device learning aboard nano-drones not only marks a novel result for hardware-limited robots but also lays the groundwork for more general advancements across the robotics community.
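The second axis, fine-tuning all model parameters vs. only a subset, can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical: `TinyRegressor` is a scalar "backbone" feeding a scalar "head" that stands in for the CNN, the data are synthetic, and per-sample SGD on squared error is an assumed optimizer, not necessarily the one used in the paper. The toy pre-trained model fits y = 2x + 1, while the simulated in-field domain follows y = 2.5x - 0.5; `head_only=True` freezes the backbone and updates only the final layer.

```python
def mae(preds, targets):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


class TinyRegressor:
    """Hypothetical stand-in for a CNN: a 'backbone' feeding a regression 'head'.

    prediction = head_w * (bb_w * x + bb_b) + head_b
    """

    def __init__(self, bb_w, bb_b, head_w, head_b):
        self.bb_w, self.bb_b = bb_w, bb_b          # backbone parameters
        self.head_w, self.head_b = head_w, head_b  # head parameters

    def features(self, x):
        return self.bb_w * x + self.bb_b

    def predict(self, x):
        return self.head_w * self.features(x) + self.head_b


def fine_tune(model, xs, ys, lr=0.02, epochs=1000, head_only=False):
    """Per-sample SGD on 0.5 * squared error; head_only=True freezes the backbone."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = model.features(x)
            err = model.predict(x) - y  # derivative of 0.5*err^2 w.r.t. the prediction
            if not head_only:
                # Backpropagate through the head (still using its current weight) into the backbone.
                model.bb_w -= lr * err * model.head_w * x
                model.bb_b -= lr * err * model.head_w
            model.head_w -= lr * err * h
            model.head_b -= lr * err
    return model


def pretrained():
    """Toy 'pre-trained' model that fits y = 2x + 1 (the source domain)."""
    return TinyRegressor(bb_w=1.0, bb_b=0.0, head_w=2.0, head_b=1.0)


# Synthetic 'in-field' data from a shifted domain: y = 2.5x - 0.5.
xs = [i / 10 for i in range(11)]
ys = [2.5 * x - 0.5 for x in xs]

baseline = mae([pretrained().predict(x) for x in xs], ys)
full = fine_tune(pretrained(), xs, ys)
head = fine_tune(pretrained(), xs, ys, head_only=True)
print(f"baseline MAE:  {baseline:.3f}")
print(f"full FT MAE:   {mae([full.predict(x) for x in xs], ys):.3f}")
print(f"head-only MAE: {mae([head.predict(x) for x in xs], ys):.3f}")
```

Freezing the backbone means no gradients have to be computed or stored for its parameters, which is the kind of memory and compute saving that matters on a sub-100 mW processor. In a real CNN, training only the head may also adapt less than full fine-tuning; this linear toy is too simple to show that difference.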

Paper Structure (14 sections, 3 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: A) Use case: human pose estimation with a tiny perception CNN aboard a nano-UAV. B) Self-supervised on-device learning introduces a fine-tuning phase during the mission to improve navigation performance.
  • Figure 2: Reference frames of drone $\mathrm{D}$ and subject $\mathrm{H}$ at two timesteps $i$ and $j$ and their estimates from model predictions and drone odometry.
  • Figure 3: $R^2$ scores [%] for all combinations of fine-tuning and test subjects.
  • Figure 4: Fine-tuning set acquisition. Longer flight duration impacts performance more than a higher acquisition frame rate.
  • Figure 5: Comparison of fine-tuning methods. All methods improve w.r.t. the baseline, with consistent behavior across fine-tuning set sizes.
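Figure 2 relates the subject estimates obtained at two timesteps i and j through the drone's odometry. One plausible way to turn this into a self-supervised training signal is sketched below in plain Python, under assumptions that are ours rather than the paper's: poses are planar (x, y, yaw), the subject stays roughly still between i and j, and the loss is the distance between the two odometry-frame subject estimates. The names `Pose2D`, `wrap_angle`, and `consistency_loss` are hypothetical; in practice the loss would be computed inside an autodiff framework so that gradients flow back into the CNN's predictions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Planar pose: position (x, y) in metres, heading yaw in radians."""
    x: float
    y: float
    yaw: float

    def compose(self, child: "Pose2D") -> "Pose2D":
        """Express `child`, given in this pose's frame, in this pose's parent frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return Pose2D(
            self.x + c * child.x - s * child.y,
            self.y + s * child.x + c * child.y,
            self.yaw + child.yaw,
        )


def wrap_angle(a: float) -> float:
    """Map an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def consistency_loss(odom_d_i, pred_h_i, odom_d_j, pred_h_j, yaw_weight=1.0):
    """Disagreement between the subject poses estimated at timesteps i and j.

    odom_d_*: drone pose D in the odometry frame (from the state estimator).
    pred_h_*: subject pose H relative to the drone (the CNN's prediction).
    Assumes the subject is (nearly) static between i and j.
    """
    h_i = odom_d_i.compose(pred_h_i)  # subject estimate at i, odometry frame
    h_j = odom_d_j.compose(pred_h_j)  # subject estimate at j, odometry frame
    pos_err = math.hypot(h_i.x - h_j.x, h_i.y - h_j.y)
    yaw_err = abs(wrap_angle(h_i.yaw - h_j.yaw))
    return pos_err + yaw_weight * yaw_err


# The drone advances 1 m toward a static subject; consistent predictions give zero loss.
print(consistency_loss(Pose2D(0, 0, 0), Pose2D(2, 0, math.pi),
                       Pose2D(1, 0, 0), Pose2D(1, 0, math.pi)))
# A prediction that is 0.5 m too far at timestep j is penalised.
print(consistency_loss(Pose2D(0, 0, 0), Pose2D(2, 0, math.pi),
                       Pose2D(1, 0, 0), Pose2D(1.5, 0, math.pi)))
```

Because odometry is available on board without manual labelling, a loss of this kind could supply training signal during the first part of the mission; odometry drift would likely limit how far apart i and j can be chosen.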