Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation

Tushar Prasanna Swaminathan, Christopher Silver, Thangarajah Akilan

TL;DR

This study benchmarks deep learning models on the NVIDIA Jetson Nano to assess real-time inference performance after hardware-specific optimization with TensorRT. It uses image classification and four-way human action recognition tasks, converting PyTorch models to TensorRT engines via an ONNX-based pipeline to compare pre- and post-optimization inference speeds. The results show substantial average speedups, with a FLOPs-dependent trend in which lower-FLOPs models benefit more, though some custom architectures are exceptions. The work emphasizes hardware-aware optimization for sustainable, scalable edge AI deployment and points to quantization-aware training and pruning as future directions for further efficiency gains.
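The summary names an ONNX-based PyTorch-to-TensorRT pipeline but does not reproduce it. Below is a minimal sketch of how such a conversion is commonly done; the model choice (resnet18), input shape, file names, and the FP16 flag are illustrative assumptions, not details taken from the paper.

```python
import torch
import torchvision

# Illustrative model; the paper benchmarks several architectures.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

# Step 1: export the PyTorch model to ONNX with a fixed input shape,
# which keeps the downstream TensorRT engine build simple.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=13,
)

# Step 2 (on the Jetson Nano): build a TensorRT engine from the ONNX file,
# e.g. with the trtexec tool shipped with TensorRT:
#   trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```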

Abstract

The proliferation of complex deep learning (DL) models has revolutionized various applications, including computer vision-based solutions, prompting their integration into real-time systems. However, the resource-intensive nature of these models poses challenges for deployment on devices with low computational power and memory, such as embedded and edge devices. This work empirically investigates the optimization of such complex DL models to analyze their behavior on an embedded device, specifically the NVIDIA Jetson Nano. It evaluates the effectiveness of the optimized models in terms of inference speed for image classification and video action detection. The experimental results reveal that, on average, optimized models exhibit a 16.11% speed improvement over their non-optimized counterparts. This not only emphasizes the need to account for hardware constraints and environmental sustainability in model development and deployment, but also underscores the pivotal role of model optimization in enabling widespread deployment of AI-assisted technologies on resource-constrained computational systems. It also demonstrates that prioritizing hardware-specific model optimization leads to efficient, scalable solutions that substantially reduce energy consumption and carbon footprint.
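The abstract gives the improvement as a single averaged percentage, which reads most naturally as a relative reduction in inference time. The snippet below shows the arithmetic under that assumption, with hypothetical latencies since the per-model measurements are not reproduced in this summary.

```python
# Hypothetical mean inference times in milliseconds; the paper's actual
# per-model measurements are not reproduced here.
t_baseline = 100.0                       # non-optimized PyTorch model
t_optimized = t_baseline * (1 - 0.1611)  # optimized TensorRT engine: 83.89 ms

# Relative speed improvement, as averaged across models in the paper.
improvement = (t_baseline - t_optimized) / t_baseline * 100
print(f"speed improvement: {improvement:.2f}%")  # -> 16.11%
```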

Paper Structure

This paper contains 10 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Layout of the NVIDIA Jetson Nano Developer Kit showcasing its key components and connectivity options [JetsonNano].
  • Figure 2: The PyTorch deep learning model optimization process for an NVIDIA Jetson Nano edge device using TensorRT.
  • Figure 3: Inference process of the TensorRT engine on NVIDIA Jetson Nano (see the inference sketch after this list).
  • Figure 4: Inference time speedup of the optimized models on NVIDIA Jetson Nano compared to their non-optimized baseline counterparts.
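The inference process depicted in Figure 3 is not spelled out in text here. Below is a minimal sketch of running a serialized TensorRT engine on the Jetson Nano using the TensorRT 8.x Python bindings and PyCUDA; the engine path, input shape, and single input/output binding layout are assumptions for illustration.

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

# Deserialize the engine built from the ONNX model (path is an assumption).
logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host and device buffers; assumes binding 0 = input, 1 = output.
h_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
h_output = np.empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()

# Host-to-device copy, asynchronous execution, device-to-host copy.
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
print("top-1 class:", int(np.argmax(h_output)))
```

In benchmarking setups like the paper's, inference speed is then typically reported as the mean latency over many such calls after a few warm-up runs.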