SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search
Hung-Yueh Chiang, Diana Marculescu
TL;DR
SCAN-Edge tackles hardware-aware neural architecture search (NAS) across diverse edge devices by jointly searching over self-attention, convolution, and activation choices within a single framework that targets MobileNet-level speed. It combines a weight-sharing supernet with device-specific latency lookup tables (LUTs) and an accuracy predictor to guide subnet selection under hardware constraints. The resulting hybrid networks match MobileNetV2 latency while achieving higher accuracy across CPU, GPU, and USB accelerators, and extend to downstream tasks such as transfer learning and object detection. The approach demonstrates the value of hardware- and compiler-aware search spaces and space-evolution strategies for practical, device-specific neural architecture design at the edge.
Abstract
Designing low-latency and high-efficiency hybrid networks for a variety of low-cost commodity edge devices is both costly and tedious, leading to the adoption of hardware-aware neural architecture search (NAS) for finding optimal architectures. However, unifying NAS for a wide range of edge devices presents challenges due to the variety of hardware designs, supported operations, and compilation optimizations. Existing methods often fix the search space of architecture choices (e.g., activation, convolution, or self-attention) and estimate latency using hardware-agnostic proxies (e.g., FLOPs), which fail to achieve the claimed latency across various edge devices. To address this issue, we propose SCAN-Edge, a unified NAS framework that jointly searches for self-attention, convolution, and activation to accommodate the wide variety of edge devices, including CPU-, GPU-, and hardware accelerator-based systems. To handle the large search space, SCAN-Edge relies on a hardware-aware evolutionary algorithm that improves the quality of the search space to accelerate the sampling process. Experiments on large-scale datasets demonstrate that our hybrid networks match the actual MobileNetV2 latency for 224x224 input resolution on various commodity edge devices.
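To make the idea of latency-constrained search concrete, the following is a minimal sketch, not the SCAN-Edge implementation. It assumes a toy per-device latency lookup table with made-up values (`OP_LUT`, `ACT_LUT`), a hypothetical stand-in for the accuracy predictor (`proxy_accuracy`), and a plain evolutionary loop over architectures. It does not model the supernet or the search-space evolution described in the abstract.

```python
import random

# Hypothetical per-device latency LUTs in milliseconds; all values are made up.
OP_LUT = {"conv": 1.0, "attention": 2.5}
ACT_LUT = {"relu": 0.0, "gelu": 0.3}
OPS, ACTS = list(OP_LUT), list(ACT_LUT)
NUM_STAGES = 4  # assumed number of searchable stages


def estimate_latency(arch):
    """Sum LUT entries instead of using a hardware-agnostic proxy such as FLOPs."""
    return sum(OP_LUT[op] + ACT_LUT[act] for op, act in arch)


def proxy_accuracy(arch):
    """Toy stand-in for an accuracy predictor (an assumption, not a real model)."""
    return sum((1.0 if op == "attention" else 0.6) + (0.1 if act == "gelu" else 0.0)
               for op, act in arch)


def random_arch(rng):
    return tuple((rng.choice(OPS), rng.choice(ACTS)) for _ in range(NUM_STAGES))


def mutate(arch, rng, prob=0.3):
    """Resample each stage independently with probability `prob`."""
    return tuple((rng.choice(OPS), rng.choice(ACTS)) if rng.random() < prob else gene
                 for gene in arch)


def evolutionary_search(budget_ms, generations=20, population=32, parents=8, seed=0):
    min_latency = NUM_STAGES * (min(OP_LUT.values()) + min(ACT_LUT.values()))
    if budget_ms < min_latency:
        raise ValueError("latency budget is below the fastest possible architecture")
    rng = random.Random(seed)

    # Initial population: rejection-sample architectures that meet the budget.
    pop = []
    while len(pop) < population:
        cand = random_arch(rng)
        if estimate_latency(cand) <= budget_ms:
            pop.append(cand)

    for _ in range(generations):
        # Keep the top candidates, then refill with feasible mutated children.
        pop.sort(key=proxy_accuracy, reverse=True)
        top = pop[:parents]
        children = []
        while len(children) < population - parents:
            child = mutate(rng.choice(top), rng)
            if estimate_latency(child) <= budget_ms:
                children.append(child)
        pop = top + children

    return max(pop, key=proxy_accuracy)


best = evolutionary_search(budget_ms=6.0)
print(best, estimate_latency(best))
```

In this sketch the latency budget acts as a hard constraint: candidates exceeding it are discarded before scoring, so the returned architecture always fits the target device's budget under the assumed LUT.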
