TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

Daniel Rossi, Guido Borghi, Roberto Vezzani

TL;DR

TakuNet addresses the challenge of real-time, energy-efficient aerial image classification on embedded UAV hardware for emergency response. It presents a compact CNN built from depthwise convolutions, an early downsampling stem, dense connections, and a Refiner block, organized into a stem, four Taku stages, and a classifier, and trained in FP16 with TensorRT acceleration. Evaluations on AIDER and AIDERv2 show competitive accuracy with far fewer parameters and FLOPs than competing models, while real-world tests on the Jetson Orin Nano and Raspberry Pi show strong throughput under constrained power budgets. The results underline the importance of hardware-aware design for edge AI, and the released code supports reproducibility and adoption in emergency-response applications.
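Storing weights and activations in FP16 halves their memory footprint relative to FP32, at the cost of precision. The stdlib-only sketch below illustrates both effects. The one-million parameter count is hypothetical (not TakuNet's actual size), and the snippet does not reproduce the authors' FP16 training or TensorRT conversion.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    # The struct format code 'e' packs/unpacks a 16-bit IEEE 754 float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Storage for the weights alone (activations and workspace excluded)."""
    return num_params * bytes_per_param

n_params = 1_000_000  # hypothetical size, for illustration only
print(weight_bytes(n_params, 4), weight_bytes(n_params, 2))  # 4000000 2000000

# FP16 has a 10-bit mantissa (about 3 significant decimal digits),
# so 0.1 is stored as the nearest representable value instead.
print(to_fp16(0.1))      # 0.0999755859375
print(to_fp16(65504.0))  # 65504.0, the largest finite FP16 value
```

Halving the bytes per value also halves the memory traffic needed to move weights and activations, which matters on bandwidth-limited embedded boards; whether FP16 additionally speeds up arithmetic depends on the accelerator.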

Abstract

Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergency response. In this work, we introduce TakuNet, a novel lightweight architecture that employs techniques such as depthwise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16-bit floating-point precision for optimization on embedded hardware accelerators. Experimental evaluation on two public datasets shows that TakuNet achieves near-state-of-the-art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count. Real-world tests on embedded devices, namely the Jetson Orin Nano and the Raspberry Pi, confirm TakuNet's efficiency: it exceeds 650 fps on the 15 W Jetson board, making it suitable for real-time AI processing on resource-constrained platforms and advancing the applicability of drones in emergency scenarios. The code and implementation details are publicly released.
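The abstract credits depthwise convolutions and an early downsampling stem with reducing computational complexity. A back-of-the-envelope count makes that concrete. The layer size below (3×3 kernel, 64 channels, 56×56 output) is hypothetical and bias terms are ignored; since the summary does not say whether each depthwise layer is paired with a 1×1 pointwise convolution, both the depthwise-only and the depthwise-separable costs are shown.

```python
def conv_cost(k: int, c_in: int, c_out: int, h_out: int, w_out: int, groups: int = 1):
    """Return (weights, multiply-accumulates) of a k x k 2D convolution, bias ignored."""
    params = k * k * (c_in // groups) * c_out
    macs = params * h_out * w_out  # each weight is used once per output pixel
    return params, macs

k, c, h, w = 3, 64, 56, 56  # hypothetical layer, for illustration only

standard = conv_cost(k, c, c, h, w)
depthwise = conv_cost(k, c, c, h, w, groups=c)  # one k x k filter per channel
pointwise = conv_cost(1, c, c, h, w)            # 1 x 1 channel mixing
separable = (depthwise[0] + pointwise[0], depthwise[1] + pointwise[1])

print(standard)   # (36864, 115605504)
print(depthwise)  # (576, 1806336)
print(separable)  # (4672, 14651392)
print(round(standard[0] / separable[0], 2))  # 7.89

# Early downsampling: MACs scale with h_out * w_out, so halving each side
# of the feature map before this layer cuts its cost by 4x.
half_res = conv_cost(k, c, c, h // 2, w // 2)
print(standard[1] // half_res[1])  # 4
```

The separable-to-standard ratio matches the usual factor 1/C_out + 1/k², which is about 0.127 for these sizes; the depthwise part alone is cheaper by a factor of C_out.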

Paper Structure (18 sections, 4 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Overview of the TakuNet architecture. The input image is first processed by a convolutional stem block. Four subsequent stages progressively extract spatial features; the output of each stage is combined along the channel axis with dense-connection feature maps taken from the stage's input. At the end of each stage, the Downsampler block reduces the spatial size while expanding the channel dimension. Finally, the Refiner block balances spatial features before the linear classification layer.
  • Figure 2: Sample images from the AIDER dataset [kyrkou2019deep], grouped by class. From left to right: collapsed buildings, fire/smoke, floods, traffic incidents, and normal.
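The data flow described in Figure 1 can be traced at the level of tensor shapes. The stdlib-only sketch below performs no real computation; it only tracks (channels, height, width) through a stem, four stages with channel-axis dense concatenation, and a Downsampler after each stage. Every width, the stem stride, the 224×224 input, the shape-preserving Refiner, and the global average pooling before the classifier are hypothetical choices for illustration, not values from the paper; only the five output classes come from the AIDER classes listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shape:
    c: int  # channels
    h: int  # height
    w: int  # width

def stem(x: Shape, out_c: int, stride: int) -> Shape:
    # Early downsampling: a strided stem shrinks H and W before the stages run.
    return Shape(out_c, x.h // stride, x.w // stride)

def taku_stage(x: Shape, feat_c: int) -> Shape:
    # Hypothetical stage body emitting feat_c maps at the same resolution; the dense
    # connection then concatenates them with the stage input along the channel axis.
    return Shape(x.c + feat_c, x.h, x.w)

def downsampler(x: Shape, out_c: int) -> Shape:
    # Halves H and W and projects to out_c channels (hypothetical widths).
    return Shape(out_c, x.h // 2, x.w // 2)

def trace(x: Shape, stem_c=32, stem_stride=2, widths=(64, 128, 256, 512)):
    """Return (block name, output shape) pairs through the stem and four stages."""
    steps = []
    x = stem(x, stem_c, stem_stride)
    steps.append(("stem", x))
    for i, out_c in enumerate(widths, start=1):
        x = taku_stage(x, feat_c=x.c)
        steps.append((f"stage{i} + dense", x))
        x = downsampler(x, out_c)
        steps.append((f"downsampler{i}", x))
    return steps

steps = trace(Shape(3, 224, 224))
for name, s in steps:
    print(f"{name:<16} C={s.c:<4} H={s.h:<4} W={s.w}")

# Assumed: Refiner preserves shape, then global average pooling and a
# linear layer over the 5 AIDER classes.
final = steps[-1][1]
classifier_params = final.c * 5 + 5  # weights + biases
print(classifier_params)  # 2565
```

Under these assumptions the feature map ends at 512×7×7, and the dense concatenation is what lets each stage's channel count grow without a dedicated expansion layer inside the stage.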