Optimizing edge AI models on HPC systems with the edge in the loop

Marcel Aach, Cyril Blanc, Andreas Lintermann, Kurt De Grave

TL;DR

This work tackles deploying accurate, low-latency ML on edge devices by introducing a hardware-aware NAS workflow that couples a Belgian edge unit with a German HPC cluster. The HPC2Edge system trains candidate architectures on HPC hardware while measuring true edge latency, ensuring architectures are optimized for both accuracy and real-time inference on the target device. Empirical results on the RAISE-LPBF-Laser dataset show that the NAS-optimized models achieve approximately 8.8× faster inference and about 1.35× better final test loss than a human-designed baseline, demonstrating a practical path to faster, more reliable edge AI in additive manufacturing. The approach highlights the value of cross-border, latency-aware NAS for industrial edge deployments and points to further improvements with larger NAS budgets and smaller edge devices.

Abstract

Artificial intelligence and machine learning models deployed on edge devices, e.g., for quality control in Additive Manufacturing (AM), are frequently small in size. Such models usually have to deliver highly accurate results within a short time frame. Methods commonly employed in the literature start out with larger trained models and try to reduce their memory and latency footprint by structural pruning, knowledge distillation, or quantization. It is, however, also possible to leverage hardware-aware Neural Architecture Search (NAS), an approach that seeks to systematically explore the architecture space to find optimized configurations. In this study, a hardware-aware NAS workflow is introduced that couples an edge device located in Belgium with a powerful High-Performance Computing (HPC) system in Germany, to train candidate architectures as fast as possible while performing real-time latency measurements on the target hardware. The approach is verified on a use case in the AM domain, based on the open RAISE-LPBF dataset, achieving ~8.8 times faster inference while simultaneously enhancing model quality by a factor of ~1.35, compared to a human-designed baseline.

Paper Structure

This paper contains 11 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The edge device, an Nvidia AGX Orin (front), with a frame grabber PCIe card (green) for interfacing with high-speed cameras over fiber.
  • Figure 2: Relational database schema for connecting the HPC-based HPO with an embedded device for inference measurements.
  • Figure 3: Orchestration of the hardware-aware NAS, with communication between the HPC system, located at the Jülich Supercomputing Centre, Forschungszentrum Jülich, in Germany, and the edge device, located at Flanders Make in Belgium.
  • Figure 4: Results at different scales, showing the median best run.
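
The abstract and the captions of Figures 2 and 3 describe a database that connects training on the HPC system with latency measurements on the edge device. The following is a minimal sketch of that general pattern, not the authors' implementation: the table layout, the column names, and the functions `submit_candidate`, `measure_pending`, and `best_candidate` are all hypothetical, and SQLite stands in for whatever relational database the paper actually uses.

```python
import sqlite3
import statistics
import time

# Hypothetical schema: one row per candidate architecture.
SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY,
    config TEXT NOT NULL,   -- serialized architecture description
    val_loss REAL,          -- written by the HPC side after training
    latency_ms REAL         -- written by the edge side after measurement
);
"""


def submit_candidate(conn, config, val_loss):
    """HPC side: register a trained candidate that still needs a latency value."""
    cur = conn.execute(
        "INSERT INTO candidates (config, val_loss) VALUES (?, ?)",
        (config, val_loss),
    )
    conn.commit()
    return cur.lastrowid


def measure_pending(conn, run_inference, repeats=5):
    """Edge side: time each unmeasured candidate and store the median latency."""
    rows = conn.execute(
        "SELECT id, config FROM candidates WHERE latency_ms IS NULL"
    ).fetchall()
    for cand_id, config in rows:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_inference(config)  # assumed callable that runs one forward pass
            timings.append((time.perf_counter() - start) * 1000.0)
        conn.execute(
            "UPDATE candidates SET latency_ms = ? WHERE id = ?",
            (statistics.median(timings), cand_id),
        )
    conn.commit()
    return len(rows)


def best_candidate(conn, max_latency_ms):
    """HPC side: lowest-loss candidate whose measured latency fits the budget."""
    return conn.execute(
        "SELECT id, config, val_loss, latency_ms FROM candidates "
        "WHERE latency_ms IS NOT NULL AND latency_ms <= ? "
        "ORDER BY val_loss ASC LIMIT 1",
        (max_latency_ms,),
    ).fetchone()
```

In this sketch the two sides never talk to each other directly: the HPC process only inserts rows and reads results, and the edge process only fills in missing latencies, so either side can be restarted without losing state. A hard latency budget is used for selection here purely for illustration; the paper may combine accuracy and latency differently.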