SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search

Hung-Yueh Chiang, Diana Marculescu

TL;DR

SCAN-Edge tackles the challenge of hardware-aware NAS across diverse edge devices by jointly searching for self-attention, convolution, and activation within a unified, MobileNet-speed framework. It combines a weight-sharing supernet with device-specific latency calibration via latency LUTs and an accuracy predictor to efficiently guide subnet selection under hardware constraints. The method yields hybrid networks that match MobileNetV2 latency while achieving higher accuracy across CPU, GPU, and USB accelerators, and extends to downstream tasks like transfer learning and object detection. This approach demonstrates the value of hardware- and compiler-aware search spaces and space-evolution strategies for practical, device-specific neural architecture design at the edge, with potential for broader deployment optimization.

Abstract

Designing low-latency and high-efficiency hybrid networks for a variety of low-cost commodity edge devices is both costly and tedious, leading to the adoption of hardware-aware neural architecture search (NAS) for finding optimal architectures. However, unifying NAS for a wide range of edge devices presents challenges due to the variety of hardware designs, supported operations, and compilation optimizations. Existing methods often fix the search space of architecture choices (e.g., activation, convolution, or self-attention) and estimate latency using hardware-agnostic proxies (e.g., FLOPs), which fail to achieve the proclaimed latency across various edge devices. To address this issue, we propose SCAN-Edge, a unified NAS framework that jointly searches for self-attention, convolution, and activation to accommodate the wide variety of edge devices, including CPU-, GPU-, and hardware accelerator-based systems. To handle the large search space, SCAN-Edge relies on a hardware-aware evolutionary algorithm that improves the quality of the search space to accelerate the sampling process. Experiments on large-scale datasets demonstrate that our hybrid networks match the actual MobileNetV2 latency for 224x224 input resolution on various commodity edge devices.
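
The idea of improving the search space so that sampling under a tight latency budget gets faster can be illustrated with a toy sketch. This is not the authors' algorithm: the channel-width choices, the `toy_latency_ms` cost model, the 20 ms budget, and the mixing weight `lam` are all assumptions made for illustration, and the accuracy predictor is left out entirely. The sketch treats the search space as one categorical distribution per stage, samples subnets, keeps those within budget, and mixes each distribution with the empirical distribution of the feasible samples.

```python
import random
from typing import List

# Hypothetical search space: one channel width per stage (assumed values).
CHOICES = [16, 32, 64, 128]
NUM_STAGES = 4
LATENCY_BUDGET_MS = 20.0

Space = List[List[float]]  # per-stage probabilities over CHOICES


def uniform_space() -> Space:
    return [[1.0 / len(CHOICES)] * len(CHOICES) for _ in range(NUM_STAGES)]


def toy_latency_ms(subnet: List[int]) -> float:
    """Stand-in for a real latency estimate (assumption: cost grows with width)."""
    return sum(0.1 * width for width in subnet)


def sample(space: Space, rng: random.Random) -> List[int]:
    return [rng.choices(CHOICES, weights=probs)[0] for probs in space]


def evolve(space: Space, rng: random.Random,
           num_samples: int = 200, lam: float = 0.5) -> Space:
    """Mix each stage distribution with the empirical one of feasible samples."""
    samples = [sample(space, rng) for _ in range(num_samples)]
    feasible = [s for s in samples if toy_latency_ms(s) <= LATENCY_BUDGET_MS]
    if not feasible:
        return space  # nothing to learn from in this round
    new_space: Space = []
    for stage, probs in enumerate(space):
        empirical = [
            sum(1 for s in feasible if s[stage] == choice) / len(feasible)
            for choice in CHOICES
        ]
        new_space.append(
            [lam * p + (1.0 - lam) * e for p, e in zip(probs, empirical)]
        )
    return new_space


def acceptance_rate(space: Space, rng: random.Random, n: int = 2000) -> float:
    """Fraction of sampled subnets that meet the latency budget."""
    hits = sum(toy_latency_ms(sample(space, rng)) <= LATENCY_BUDGET_MS
               for _ in range(n))
    return hits / n


if __name__ == "__main__":
    space = uniform_space()
    rng = random.Random(0)
    print("before:", acceptance_rate(space, random.Random(1)))
    for _ in range(5):
        space = evolve(space, rng)
    print("after: ", acceptance_rate(space, random.Random(1)))
```

In this toy setting a higher acceptance rate means fewer rejected samples per feasible subnet, which is the sense in which a better search space shortens sampling time. Because each update is a convex combination of two distributions, every stage's probabilities continue to sum to one.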

Paper Structure (39 sections, 2 theorems, 11 equations, 9 figures, 10 tables, 2 algorithms)

This paper contains 39 sections, 2 theorems, 11 equations, 9 figures, 10 tables, 2 algorithms.

Key Result

Theorem 1

Given $\lambda \in [0, 1]$ and two probability distributions $\Psi$ and $\Phi$ defined on the same sample space $\Omega$, their weighted mean $\Theta = \lambda\Psi + (1-\lambda)\Phi$ is also a probability distribution on $\Omega$.
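
A short check of this statement for the discrete case is sketched below. It assumes $\Omega$ is countable and writes $\Psi(\omega)$, $\Phi(\omega)$ for the probability masses; the paper's own proof may be stated more generally.

$$
\begin{aligned}
\Theta(\omega) &= \lambda\Psi(\omega) + (1-\lambda)\Phi(\omega) \ge 0
  && \text{since } \lambda \ge 0,\ 1-\lambda \ge 0,\ \Psi(\omega) \ge 0,\ \Phi(\omega) \ge 0, \\
\sum_{\omega \in \Omega} \Theta(\omega)
  &= \lambda \sum_{\omega \in \Omega} \Psi(\omega) + (1-\lambda) \sum_{\omega \in \Omega} \Phi(\omega)
   = \lambda + (1-\lambda) = 1 .
\end{aligned}
$$

Non-negativity and normalization together show that $\Theta$ is a valid probability mass function on $\Omega$.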

Figures (9)

  • Figure 1: We profile the latency and zero-cost proxies of EfficientFormerV2 S0 on different devices. (a) shows the latency of the first stage (FFNs only) with input size (h, w, c)=(56, 56, 32). Although the unified FFN has fewer FLOPs, it is memory-bound and has a latency similar to that of the fused FFN. (b) shows the latency of the last stage (FFNs and MHSAs) with input size (7, 7, 176). The stage latency is device-dependent and differs substantially from what the proxies suggest.
  • Figure 2: We use three components in our supernet with dynamic activation layers. Each MHSA block is followed by either a unified FFN or a fused FFN. The components, e.g., residual connections, are simplified to avoid cluttering the figure.
  • Figure 3: Naive latency estimates from block-wise latency lookup tables (LUTs) tend to overestimate the end-to-end latency (blue). We additionally profile 10 end-to-end subnet latencies to calibrate the LUTs by linear regression. The calibrated latency estimates fit $y=x$ closely (green).
  • Figure 4: (a) The subnet sampling time during the search increases exponentially as the latency constraint is reduced from $100$ ms to $30$ ms. The sampling is performed on Nano with TensorRT. (b) Illustration of search space evolution. Dots represent subnets sampled from the space. After a few iterations, the search space evolves from the blue dots to the red dots, where it meets the constraints of $5$ M parameters and $20$ ms latency, i.e., the red dots are mostly inside the red dashed rectangle.
  • Figure 5: Our models achieve MobileNet speed among all hybrid model counterparts across platforms while outperforming MobileNet in accuracy. We also plot Once-for-All (OFA, Conv only) as grey crosses for reference.
  • ...and 4 more figures
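
The LUT calibration described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the block names, the per-block latencies in `BLOCK_LUT_MS`, and the "measured" end-to-end latencies are invented, and only four profiled subnets are used here for brevity where the caption mentions 10. The sketch sums block-wise LUT entries to get a naive estimate, then fits a line from naive estimates to measured latencies by ordinary least squares.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical block-wise LUT (milliseconds per block); the numbers are invented.
BLOCK_LUT_MS: Dict[str, float] = {"conv": 1.0, "ffn": 2.0, "mhsa": 4.0}


def naive_latency(subnet: Sequence[str], lut: Dict[str, float]) -> float:
    """Sum the per-block LUT entries of a subnet."""
    return sum(lut[block] for block in subnet)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(
    subnets: Sequence[Sequence[str]],
    measured_ms: Sequence[float],
    lut: Dict[str, float],
) -> Callable[[Sequence[str]], float]:
    """Return an estimator mapping a subnet to its calibrated latency."""
    naive = [naive_latency(s, lut) for s in subnets]
    slope, intercept = fit_line(naive, measured_ms)
    return lambda subnet: slope * naive_latency(subnet, lut) + intercept


# Invented end-to-end measurements for a few profiled subnets.
PROFILED_SUBNETS: List[List[str]] = [
    ["conv"],
    ["conv", "ffn"],
    ["conv", "ffn", "mhsa"],
    ["mhsa", "mhsa"],
]
MEASURED_MS: List[float] = [1.3, 2.9, 6.1, 6.9]

estimate = calibrate(PROFILED_SUBNETS, MEASURED_MS, BLOCK_LUT_MS)

if __name__ == "__main__":
    candidate = ["ffn", "mhsa"]
    print("naive:     ", naive_latency(candidate, BLOCK_LUT_MS))
    print("calibrated:", estimate(candidate))
```

With these invented numbers the naive sum exceeds the measured latency for the larger subnets, mirroring the overestimation the caption describes, and the fitted line pulls the estimate back toward the measurement. A single slope and intercept per device is the simplest possible calibration model and is assumed here only for illustration.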

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • proof