Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation
Tushar Prasanna Swaminathan, Christopher Silver, Thangarajah Akilan
TL;DR
This study benchmarks deep learning models on the NVIDIA Jetson Nano to assess real-time inference performance after hardware-specific optimization with TensorRT. It employs image classification and four-way human action recognition tasks, converting PyTorch models to TensorRT engines via an ONNX-based pipeline to compare pre- and post-optimization inference speeds. The results show an average 16.11% inference speedup, with a FLOPS-dependent trend in which lower-FLOPS models benefit more, though some custom architectures are exceptions. The work emphasizes hardware-aware optimization for sustainable, scalable edge AI deployment and points to future work in quantization-aware training and pruning to push efficiency gains further.
Abstract
The proliferation of complex deep learning (DL) models has revolutionized various applications, including computer vision-based solutions, prompting their integration into real-time systems. However, the resource-intensive nature of these models poses challenges for deployment on devices with limited computational power and memory, such as embedded and edge devices. This work empirically investigates the optimization of such complex DL models to analyze their performance on an embedded device, specifically the NVIDIA Jetson Nano. It evaluates the effectiveness of the optimized models in terms of inference speed for image classification and video action detection. The experimental results reveal that, on average, optimized models are 16.11% faster than their non-optimized counterparts. This not only emphasizes the critical need to consider hardware constraints and environmental sustainability in model development and deployment but also underscores the pivotal role of model optimization in enabling the widespread deployment of AI-assisted technologies on resource-constrained computational systems. It also demonstrates that prioritizing hardware-specific model optimization leads to efficient and scalable solutions that substantially decrease energy consumption and carbon footprint.
