Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW

Elia Cereda, Alessandro Giusti, Daniele Palossi

TL;DR

The paper tackles domain shift in TinyML perception for nano-UAVs by enabling on-device fine-tuning with a self-supervised state-consistency loss. It introduces four parameter-update strategies and a hardware-aware embedded pipeline that performs full back-propagation on ultra-low-power GAP8/GAP9 platforms, using a 512-sample fine-tuning set and 5 epochs with 32-bit floating-point arithmetic. Offline results show substantial MAE gains, while in-field experiments demonstrate improved path tracking and a reduction in horizontal position error of up to 26% compared to a non-fine-tuned state-of-the-art baseline. The procedure fits power envelopes as low as 19 mW and a memory footprint of around 1 MB, marking the first real-world demonstration of on-device learning aboard a nano-UAV and enabling robust perception in challenging, unseen environments.

Abstract

Miniaturized cyber-physical systems (CPSes) powered by tiny machine learning (TinyML), such as nano-drones, are becoming an increasingly attractive technology. Their small form factor (i.e., ~10 cm diameter) ensures vast applicability, ranging from the exploration of narrow disaster scenarios to safe human-robot interaction. Simple electronics make these CPSes inexpensive, but strongly limit the computational, memory, and sensing resources available on board. In real-world applications, these limitations are further exacerbated by domain shift. This fundamental machine learning problem implies that model perception performance drops when moving from the training domain to a different deployment one. To mitigate this general problem, we present a novel on-device fine-tuning approach that relies only on the limited ultra-low power resources available aboard nano-drones. Then, to overcome the lack of ground-truth training labels aboard our CPS, we also employ a self-supervised method based on ego-motion consistency. Although our work builds on top of a specific real-world vision-based human pose estimation task, it is widely applicable to many embedded TinyML use cases. Our 512-image on-device training procedure is fully deployed aboard an ultra-low power GWT GAP9 System-on-Chip and requires only 1 MB of memory while consuming as little as 19 mW or running in just 510 ms (at 38 mW). Finally, we demonstrate the benefits of our on-device learning approach by field-testing our closed-loop CPS, showing a reduction in horizontal position error of up to 26% vs. a non-fine-tuned state-of-the-art baseline. In the most challenging never-seen-before environment, our on-device learning procedure makes the difference between succeeding or failing the mission.
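As a rough worked example of the figures quoted in the abstract, the energy of one fine-tuning run can be estimated as power times latency. The sketch below assumes that 38 mW is the average power over the whole 510 ms run; the helper name `energy_mj` is hypothetical. The abstract gives no latency for the 19 mW configuration, so that case is left out.

```python
def energy_mj(power_mw: float, latency_ms: float) -> float:
    """Energy in millijoules, assuming constant average power over the run."""
    # mW * ms = microjoules; divide by 1000 to get millijoules.
    return power_mw * latency_ms / 1000.0


# Figures quoted in the abstract: 510 ms at 38 mW on the GAP9.
fast_run_mj = energy_mj(power_mw=38.0, latency_ms=510.0)
print(f"{fast_run_mj:.2f} mJ")  # 19.38 mJ
```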

Paper Structure (15 sections, 6 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Human pose estimation task and reference frames [cereda2022pitchaug].
  • Figure 2: Robot platform: Crazyflie 2.1 with AI-deck and Flow-deck boards.
  • Figure 3: The parallel ultra-low power System-on-Chip architecture (PULP). Cluster cores perform parallel computationally intensive workloads, while a fabric controller orchestrates data transfers through two direct memory access (DMA) units. Optional floating-point units (FPU) are shared in the cluster.
  • Figure 4: PULP-Frontnet [pulp-frontnet], our target CNN architecture with 9 layers and 304k parameters. Inference requires 14.3M operations per frame.
  • Figure 5: Loss functions: A) task loss, B) original state-consistency loss [nava2021uncertainty], and C-D) our state-consistency loss with uncertain drone odometry and moving subject. Subject movements are either C) known or D) unknown (subject assumed still). The depicted reference frames represent the ground-truth poses of drone $\mathrm{D}$ and subject $\mathrm{H}$ at times $i$ and $j$, together with the corresponding relative poses estimated by drone odometry $\hat{\mathrm{D}}$ and by model predictions $\mathrm{H^*}$, respectively.
  • ...and 6 more figures
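
The Figure 5 caption describes a state-consistency loss that compares model predictions at two times using the drone's odometry. The following is a minimal sketch of that general idea for case D (subject assumed still), not the paper's actual formulation: it assumes planar poses `(x, y, yaw)`, an L1 penalty, and ignores odometry uncertainty. All function names are hypothetical.

```python
import math


def compose(a, b):
    """Chain two planar poses (x, y, yaw): apply b in the frame defined by a."""
    ax, ay, ayaw = a
    bx, by, byaw = b
    c, s = math.cos(ayaw), math.sin(ayaw)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ayaw + byaw)


def inverse(a):
    """Inverse of a planar pose, so that compose(a, inverse(a)) is the identity."""
    ax, ay, ayaw = a
    c, s = math.cos(ayaw), math.sin(ayaw)
    return (-(c * ax + s * ay), s * ax - c * ay, -ayaw)


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def state_consistency_loss(pred_i, pred_j, odom_ij):
    """L1 gap between the prediction at time j and the prediction at time i
    carried over to time j through the drone motion odom_ij (pose of the
    drone at j expressed in the drone frame at i). Assumes a still subject."""
    expected_j = compose(inverse(odom_ij), pred_i)
    dx = pred_j[0] - expected_j[0]
    dy = pred_j[1] - expected_j[1]
    dyaw = wrap(pred_j[2] - expected_j[2])
    return abs(dx) + abs(dy) + abs(dyaw)
```

With perfect predictions and perfect odometry the loss is zero for a still subject, so any residual signals an error in the predictions (or in the odometry), which is what makes the quantity usable without ground-truth labels.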