Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators
Alaa Zniber, Arne Symons, Ouassim Karrakchou, Marian Verhelst, Mounir Ghogho
TL;DR
This work addresses the deployment of Early Exiting Neural Networks (EENNs) on resource-constrained edge accelerators. It proposes a hardware-aware Neural Architecture Search (NAS) that jointly optimizes exit placement, quantization, and hardware mapping using a Stream-based cost model. The method formulates a constrained multi-objective optimization problem and solves it with a genetic algorithm (GA) guided by progressive predictors inside a quantization-aware training loop. Experiments on CIFAR-10 with a MobileNetV2 backbone on a quad-core Edge TPU show a reduction of over 50% in energy-delay product compared to static architectures. The study also analyzes the effects of quantization and of exit mounting points, and shows how hardware-aware design yields Pareto-optimal trade-offs between accuracy and latency/energy.
Abstract
Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedded intelligence at the edge imposes stringent computational and energy constraints, challenging the deployment of these large-scale models. Early Exiting Neural Networks (EENN) have emerged as a promising solution, allowing dynamic termination of inference based on input complexity to enhance efficiency. Despite their potential, EENN performance is highly influenced by the heterogeneity of edge accelerators and the constraints imposed by quantization, affecting accuracy, energy efficiency, and latency. Yet, research on the automatic optimization of EENN design for edge hardware remains limited. To bridge this gap, we propose a hardware-aware Neural Architecture Search (NAS) framework that systematically integrates the effects of quantization and hardware resource allocation to optimize the placement of early exit points within a network backbone. Experimental results on the CIFAR-10 dataset demonstrate that our NAS framework can discover architectures that achieve over a 50% reduction in computational costs compared to conventional static networks, making them more suitable for deployment in resource-constrained edge environments.
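
To make the idea of "dynamic termination of inference based on input complexity" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a softmax-confidence threshold as the exit rule, which is one common choice that the abstract does not specify. The names `early_exit_predict` and `expected_edp`, the threshold value, and the definition of the expected energy-delay product as an exit-rate-weighted average of per-exit energy times latency are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exits, threshold=0.9):
    """Run exit heads from shallow to deep and stop at the first confident one.

    `exits` is a list of zero-argument callables (hypothetical), each returning
    the logits of one exit head; the last entry plays the role of the final
    classifier. Heads are evaluated lazily, so deeper stages are skipped once
    an exit fires. Returns (exit_index, predicted_class).
    """
    last = len(exits) - 1
    for i, head in enumerate(exits):
        probs = softmax(head())
        confidence = max(probs)
        # Assumed exit rule: top-1 softmax probability reaches the threshold.
        # The final head always answers, whatever its confidence.
        if confidence >= threshold or i == last:
            return i, probs.index(confidence)


def expected_edp(exit_rates, energies, latencies):
    """Exit-rate-weighted average of per-exit energy * latency.

    Assumption: exit_rates sum to 1, and energies/latencies are the cumulative
    costs of reaching each exit (e.g. as estimated by a hardware cost model).
    """
    return sum(p * e * t for p, e, t in zip(exit_rates, energies, latencies))
```

With this rule, easy inputs leave at a shallow exit and pay only the cost of the layers before it, so the average cost depends on where the exits are mounted and on how many inputs each exit absorbs. That dependence is what makes exit placement a search problem.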
