
Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones

Matteo Risso, Francesco Daghero, Beatrice Alessandra Motetti, Daniele Jahier Pagliari, Enrico Macii, Massimo Poncino, Alessio Burrello

TL;DR

This work tackles visual pose estimation on nano-drones under severe hardware constraints. It proposes a two-stage NAS cascade (layer selection via a SuperNet/DARTS search, followed by PIT pruning) and a fused Pointwise-Depthwise (PW+DW) kernel, with the resulting models deployed on a GAP8-based nano-UAV platform. The approach searches a broad architectural space, applies fine-grained pruning, and uses the fused kernels to minimize memory transfers and latency. Results show up to 13.78% lower mean absolute error (MAE) than the state-of-the-art and up to 3.22× lower end-to-end latency at iso-MAE, with the smallest model also achieving substantial parameter and memory reductions. The work demonstrates that deployment-aware optimization pipelines are practical and important for TinyML on ultra-compact drones, since faster perception benefits the control loops of indoor and human-facing tasks.
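To make the first stage of the cascade concrete, here is a minimal, framework-free sketch of a DARTS-style searchable layer with a complexity penalty. It is an illustration under stated assumptions, not the authors' implementation: the class `SearchableLayer`, the function `regularized_loss`, the candidate names (`conv3x3`, `dw_separable`, `skip`), the toy scalar ops, and the use of parameter count as the cost are all hypothetical. Each position mixes its candidates with softmax weights over architecture logits `alpha`, the expected cost under those weights is a differentiable complexity term, and at the end of the search only the largest-logit candidate is kept.

```python
import math
from typing import Callable, Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SearchableLayer:
    """One SuperNet position with several candidate layers (hypothetical sketch)."""

    def __init__(self, ops: Dict[str, Callable[[float], float]], costs: Dict[str, float]):
        self.names = list(ops)
        self.ops = ops
        self.costs = [float(costs[n]) for n in self.names]
        self.alpha = [0.0] * len(self.names)  # architecture logits, trained in a real search

    def forward(self, x: float) -> float:
        # Continuous relaxation: softmax-weighted sum of all candidate outputs.
        weights = softmax(self.alpha)
        return sum(w * self.ops[n](x) for w, n in zip(weights, self.names))

    def expected_cost(self) -> float:
        # Differentiable complexity estimate: E[cost] under softmax(alpha).
        weights = softmax(self.alpha)
        return sum(w * c for w, c in zip(weights, self.costs))

    def expected_cost_grad(self) -> List[float]:
        # dE[cost]/dalpha_j = p_j * (cost_j - E[cost])
        weights = softmax(self.alpha)
        mean = sum(w * c for w, c in zip(weights, self.costs))
        return [w * (c - mean) for w, c in zip(weights, self.costs)]

    def choose(self) -> str:
        # Discretization after the search: keep the largest-logit candidate.
        best = max(range(len(self.alpha)), key=lambda i: self.alpha[i])
        return self.names[best]


def regularized_loss(task_loss: float, layers: List[SearchableLayer], strength: float) -> float:
    """Task loss plus a complexity penalty weighted by `strength`."""
    return task_loss + strength * sum(layer.expected_cost() for layer in layers)


# Hypothetical candidates for a 32-channel layer; costs are weight counts.
layer = SearchableLayer(
    ops={"conv3x3": lambda x: 2.0 * x, "dw_separable": lambda x: 1.5 * x, "skip": lambda x: x},
    costs={"conv3x3": 3 * 3 * 32 * 32, "dw_separable": 3 * 3 * 32 + 32 * 32, "skip": 0},
)
initial_cost = layer.expected_cost()
initial_output = layer.forward(1.0)

# Gradient steps on the cost term only, to show which way it pushes the logits.
lr = 1e-3
for _ in range(5):
    grad = layer.expected_cost_grad()
    layer.alpha = [a - lr * g for a, g in zip(layer.alpha, grad)]
print(layer.choose())  # -> skip
```

The cost-only loop deliberately exaggerates the effect: with no task-loss gradient, the cheapest candidate always wins. In a real search the `strength` factor in `regularized_loss` trades accuracy against complexity, and sweeping it is one way to obtain a family of models at different cost points.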

Abstract

Miniaturized autonomous unmanned aerial vehicles (UAVs) are gaining popularity because their small size enables new tasks such as indoor navigation and people monitoring. Nonetheless, their size and simple electronics make it very challenging to implement advanced onboard intelligence. This work proposes a new automatic optimization pipeline for visual pose estimation tasks using Deep Neural Networks (DNNs). The pipeline leverages two different Neural Architecture Search (NAS) algorithms to perform a broad, complexity-driven exploration of the DNN architectural space. The resulting networks are then deployed on an off-the-shelf nano-drone equipped with a parallel ultra-low-power System-on-Chip, using a set of novel software kernels for the efficient fused execution of critical DNN layer sequences. Our results improve on the state-of-the-art, reducing inference latency by up to 3.22x at iso-error.
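The second NAS stage refines channel counts with fine-grained pruning. Below is a minimal sketch of generic mask-based channel pruning in the spirit of that stage; it is not PIT's actual code. The threshold of 0.5, the use of `abs(theta)`, the parameter-count cost model, and all function names are assumptions. Mask-based methods of this kind commonly train the mask parameters through the hard threshold with a straight-through estimator; that training step is omitted, and only the forward masking, the resulting cost, and the export of surviving channels are shown.

```python
from typing import List

# Assumed binarization threshold for the trainable mask parameters.
THRESHOLD = 0.5


def binarize(theta: List[float]) -> List[int]:
    """Hard channel mask: keep channel j if |theta_j| >= THRESHOLD (assumption)."""
    return [1 if abs(t) >= THRESHOLD else 0 for t in theta]


def apply_mask(feature_maps: List[List[float]], theta: List[float]) -> List[List[float]]:
    """Zero out pruned output channels of one layer, stored as [channel][values]."""
    mask = binarize(theta)
    return [[v * m for v in channel] for channel, m in zip(feature_maps, mask)]


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def pruned_params(thetas: List[List[float]], c_in0: int, k: int) -> int:
    """Parameter count of a chain of k x k convolutions after masking.

    The surviving outputs of layer i are the inputs of layer i + 1, so pruning
    one layer also shrinks the next one.
    """
    total, c_in = 0, c_in0
    for theta in thetas:
        c_out = sum(binarize(theta))
        total += conv_params(c_in, c_out, k)
        c_in = c_out
    return total


def kept_channels(theta: List[float]) -> List[int]:
    """Indices of channels that survive when exporting the final, smaller network."""
    return [j for j, m in enumerate(binarize(theta)) if m]


# Hypothetical two-layer chain: 1 input channel, 4 then 2 output channels, 3x3 kernels.
thetas = [[0.9, 0.1, 0.7, 0.6], [0.2, 0.8]]
dense = pruned_params([[1.0] * 4, [1.0] * 2], c_in0=1, k=3)
pruned = pruned_params(thetas, c_in0=1, k=3)
print(dense, pruned)  # -> 108 54
print(kept_channels(thetas[0]))  # -> [0, 2, 3]
```

The chained cost is the point of the example: removing one output channel of the first layer also removes one input channel of the second, which is why channel pruning can cut parameters and memory faster than a per-layer count suggests.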

Paper Structure

This paper contains 7 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: NAS-based optimization flow (left); optimized pointwise+depthwise (PW+DW) kernel (right).
  • Figure 2: Optimized architectures vs. the SotA from [date2024].
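The fused PW+DW kernel in Figure 1 (right) rests on a data-dependency fact: in a pointwise convolution followed by a depthwise convolution, depthwise output channel c only needs pointwise output channel c. The sketch below, in plain Python and not the authors' GAP8 code, contrasts an unfused reference that materializes the whole intermediate tensor with a fused version that produces and consumes one intermediate plane at a time, and reports how many intermediate elements each must hold. The channel-by-channel fusion order, the 3x3/stride-1/zero-padding-1 depthwise configuration, and all names are assumptions; a real MCU kernel would additionally deal with quantized arithmetic, tiling, and data movement across the memory hierarchy, none of which is modeled here.

```python
from typing import List, Tuple

Plane = List[List[float]]  # [height][width]
Tensor = List[Plane]       # [channels][height][width]


def pointwise_channel(x: Tensor, weights: List[float]) -> Plane:
    """One output channel of a 1x1 (pointwise) conv: weighted sum over input channels."""
    h, w = len(x[0]), len(x[0][0])
    return [[sum(weights[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
            for i in range(h)]


def depthwise3x3(plane: Plane, kernel: Plane) -> Plane:
    """3x3 depthwise conv of a single channel, stride 1, zero padding 1."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    yy, xx = i + di - 1, j + dj - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[di][dj] * plane[yy][xx]
            out[i][j] = acc
    return out


def pw_dw_unfused(x: Tensor, w_pw: List[List[float]], w_dw: List[Plane]) -> Tuple[Tensor, int]:
    """Reference: store the full pointwise output, then run the depthwise layer."""
    mid = [pointwise_channel(x, row) for row in w_pw]
    buffer_elems = len(mid) * len(mid[0]) * len(mid[0][0])
    return [depthwise3x3(mid[c], w_dw[c]) for c in range(len(mid))], buffer_elems


def pw_dw_fused(x: Tensor, w_pw: List[List[float]], w_dw: List[Plane]) -> Tuple[Tensor, int]:
    """Fused: compute one pointwise plane and consume it immediately."""
    out: Tensor = []
    buffer_elems = 0
    for c, row in enumerate(w_pw):
        plane = pointwise_channel(x, row)  # the only live intermediate data
        buffer_elems = max(buffer_elems, len(plane) * len(plane[0]))
        out.append(depthwise3x3(plane, w_dw[c]))
    return out, buffer_elems


# Small deterministic example: 2 input channels, 4 intermediate channels, 3x4 maps.
C_IN, C_MID, H, W = 2, 4, 3, 4
x = [[[float(c + i * W + j) for j in range(W)] for i in range(H)] for c in range(C_IN)]
w_pw = [[0.5 * (m + 1), -0.25 * m] for m in range(C_MID)]
w_dw = [[[0.1 * (m + di + dj) for dj in range(3)] for di in range(3)] for m in range(C_MID)]

ref, ref_buf = pw_dw_unfused(x, w_pw, w_dw)
fused, fused_buf = pw_dw_fused(x, w_pw, w_dw)
print(ref == fused, ref_buf, fused_buf)  # -> True 48 12
```

Both versions return identical outputs, while the fused one holds a single H x W plane instead of C_MID of them, which is the kind of saving that reduces memory transfers on a small SoC. Fusing along channels is only one option; tiling along rows with a one-row halo for the 3x3 window is another, and the sketch does not claim which strategy the paper's kernel uses.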