Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones

Beatrice Alessandra Motetti; Luca Crupi; Mustafa Omer Mohammed Elamin Elshaigi; Matteo Risso; Daniele Jahier Pagliari; Daniele Palossi; Alessio Burrello

TL;DR

The paper tackles onboard visual pose estimation for nano-drones under stringent power and memory limits by introducing adaptive inference. It builds two ensembles from three state-of-the-art CNNs and uses three runtime policies, Output-based Partitioning (OP) and two auxiliary-task policies (Aux-SM, Aux-HLC), to dynamically adjust computation. Experimental results on known and unseen datasets show substantial latency reductions (up to 28% at iso-MAE) and modest MAE improvements. The systems are deployed on a Crazyflie 2.1 with a GAP8 processor and 8-bit quantization, confirming that the approach is practical for energy-efficient onboard perception. The result is a runtime-tunable trade-off between accuracy and efficiency for nano-drone perception.

Abstract

Sub-10 cm diameter nano-drones are gaining momentum thanks to their applicability in scenarios inaccessible to bigger drones, such as narrow environments and operation close to humans. However, their tiny form factor also brings their major drawback: ultra-constrained memory and processors for the onboard execution of their perception pipelines. Therefore, lightweight deep learning-based approaches are becoming increasingly popular, stressing how computational efficiency and energy saving are paramount, as they can make the difference between a fully working closed-loop system and a failing one. In this work, to maximize the exploitation of the ultra-limited resources aboard nano-drones, we present a novel adaptive deep learning-based mechanism for the efficient execution of a vision-based human pose estimation task. We leverage two State-of-the-Art (SoA) convolutional neural networks (CNNs) with different regression performance vs. computational cost trade-offs. By combining these CNNs with three novel adaptation strategies, based on the output's temporal consistency and on auxiliary tasks, to proactively swap the CNN being executed, we present six different systems. On a real-world dataset and the actual nano-drone hardware, our best-performing system, compared to executing only the bigger and most accurate SoA model, shows a 28% latency reduction while keeping the same mean absolute error (MAE), a 3% MAE reduction at iso-latency, and the absolute peak performance, i.e., 6% better than the SoA model.

Paper Structure (17 sections, 4 equations, 6 figures, 2 tables)

Introduction
Background & Related Work
Human Pose Estimation aboard Nano-Drones
Adaptive Machine Learning
Materials & Methods
Static Neural Networks
Adaptive Inference for Visual Pose Estimation
Output-based Partitioning
Auxiliary Task-based Partitioning
Target Platform
Experimental Results
Setup
Auxiliary Task Exploration
Output vs. Auxiliary Task Policies
Crazyflie Deployment
...and 2 more sections

Figures (6)

Figure 1: OP policy: the small model is always executed to compute a first set of outputs, which are then corrected by the big model if they differ significantly from the prediction at the previous timestamp.
Figure 2: Auxiliary task-based partitioning. (a) A first auxiliary CNN is executed to localize the head, and one of the two policies is applied. (b) Based on the policy outcome, one of the two models of the ensemble is executed.
Figure 3: $8\times6$ grid division of the input images. In each quadrant, the difference between the MAE of F$^1$ and M$^{1.0}$ is reported. A green square marks the quadrant to which the head of the person belongs.
Figure 4: Auxiliary task-based policies comparison on the Known dataset.
Figure 5: OP and Aux policies comparison on the Known dataset.
...and 1 more figures
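
The OP policy described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `op_step`, the use of the largest per-component absolute difference as the measure of "differ significantly", the scalar `threshold`, and the choice to keep the small model's output on the first frame are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A pose is treated here as a flat sequence of floats (assumption).
Pose = Sequence[float]
Model = Callable[[object], Pose]


def op_step(
    frame: object,
    small_model: Model,
    big_model: Model,
    prev_pose: Optional[Pose],
    threshold: float,
) -> Tuple[List[float], bool]:
    """Run one OP step and return (pose, big_model_was_used).

    The small model always runs first. If its output deviates from the
    previous prediction by more than `threshold` (hypothetical metric:
    largest absolute per-component difference), the big model is run
    on the same frame and its output replaces the small model's.
    """
    pose = list(small_model(frame))
    if prev_pose is None:
        # No previous prediction to compare with: keep the small
        # model's output (an assumption made for this sketch).
        return pose, False
    deviation = max(abs(p - q) for p, q in zip(pose, prev_pose))
    if deviation > threshold:
        return list(big_model(frame)), True
    return pose, False
```

In a frame loop, the pose returned by one call would be passed as `prev_pose` to the next. Under these assumptions, raising `threshold` invokes the big model less often, which lowers average latency at the possible cost of accuracy.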
