Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators

Alaa Zniber, Arne Symons, Ouassim Karrakchou, Marian Verhelst, Mounir Ghogho

TL;DR

This work tackles deploying Early Exiting Neural Networks on resource-constrained edge accelerators by proposing a hardware-aware NAS that jointly optimizes exit placement, quantization, and hardware mapping using a Stream-based cost model. The method formulates a constrained multi-objective optimization and uses a GA with progressive predictors within a quantization-aware training loop to discover efficient architectures. Experiments on CIFAR-10 with a MobileNetV2 backbone on a quad-core Edge TPU demonstrate over 50% reduction in energy-delay product compared to static architectures, illustrating practical gains for edge deployment. The study analyzes quantization and exit mounting-point effects and shows how hardware-aware design yields Pareto-optimal trade-offs between accuracy and latency/energy.

Abstract

Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedded intelligence at the edge imposes stringent computational and energy constraints, challenging the deployment of these large-scale models. Early Exiting Neural Networks (EENN) have emerged as a promising solution, allowing dynamic termination of inference based on input complexity to enhance efficiency. Despite their potential, EENN performance is highly influenced by the heterogeneity of edge accelerators and the constraints imposed by quantization, affecting accuracy, energy efficiency, and latency. Yet, research on the automatic optimization of EENN design for edge hardware remains limited. To bridge this gap, we propose a hardware-aware Neural Architecture Search (NAS) framework that systematically integrates the effects of quantization and hardware resource allocation to optimize the placement of early exit points within a network backbone. Experimental results on the CIFAR-10 dataset demonstrate that our NAS framework can discover architectures that achieve over a 50% reduction in computational costs compared to conventional static networks, making them more suitable for deployment in resource-constrained edge environments.
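
To illustrate why early exiting can lower the average cost of inference, here is a minimal sketch that weights the cumulative energy-delay product at each exit by the fraction of inputs leaving there. The paper's exact definition of the average energy-delay product is not given in this summary, so the per-exit weighting is an assumption. The function name `average_edp` and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_edp(exit_ratios: Sequence[float],
                energies: Sequence[float],
                latencies: Sequence[float]) -> float:
    """Expected energy-delay product over a population of inputs.

    Assumption (not from the paper): each input pays the cumulative
    energy * latency needed to reach the exit where it terminates.

    exit_ratios[i]: fraction of inputs that leave at exit i (sums to 1).
    energies[i], latencies[i]: cumulative cost to reach exit i.
    """
    if not (len(exit_ratios) == len(energies) == len(latencies)):
        raise ValueError("all inputs must have one entry per exit")
    if abs(sum(exit_ratios) - 1.0) > 1e-9:
        raise ValueError("exit ratios must sum to 1")
    return sum(r * e * t for r, e, t in zip(exit_ratios, energies, latencies))


if __name__ == "__main__":
    # Hypothetical 3-exit network; the last exit is the full backbone.
    ratios = [0.5, 0.3, 0.2]
    energy = [1.0, 2.0, 4.0]    # arbitrary units, cumulative
    latency = [1.0, 2.0, 4.0]   # arbitrary units, cumulative

    dynamic = average_edp(ratios, energy, latency)
    static = energy[-1] * latency[-1]  # every input runs the full network
    print(f"average EDP with early exits: {dynamic:.2f}")
    print(f"static EDP: {static:.2f}")
    print(f"relative reduction: {1 - dynamic / static:.1%}")
```

In this toy setting, the more inputs that leave at cheap early exits, the further the average falls below the static cost. This is the kind of trade-off the search explores when it chooses exit placement.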

Paper Structure

This paper contains 19 sections, 9 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An example early exiting network with 3 backbone blocks and 3 exits.
  • Figure 2: The cumulative (left) and average (right) energy-delay product $ET$ at different stages for differently quantized models. The average, $ET_{avg}$, is drastically reduced because many inputs exit at early stages.
  • Figure 3: Family of 4-exit models in INT8+8 configuration with identical backbone and exit topologies. Each model has different exit points, identified by the exit indices (cf. Table \ref{tab:mbv2}).
  • Figure 4: The cumulative (left) and average (right) energy-delay product $ET$ at different stages for different mounting points with identical backbone and exit architecture. The $ET$ varies due to exit ratio differences and tensor dimension mismatches with the accelerator dataflows.
  • Figure 5: Overview of the hardware-aware NAS process with the enhanced Stream framework for hardware performance estimation at different exit points.
  • ...and 3 more figures