MONAS: Efficient Zero-Shot Neural Architecture Search for MCUs
Ye Qiao, Haocheng Xu, Yifan Zhang, Sitao Huang
TL;DR
MONAS addresses the design of efficient CNNs for resource-constrained MCUs by combining hardware-aware, zero-shot NAS with a novel MCU latency estimator. It pairs trainability and expressivity proxies (the NTK spectrum and the linear region (LR) count) with hardware indicators (FLOPs and latency), and uses a pruning-based search to quickly identify architectures that minimize latency while preserving accuracy. Experiments on NAS-Bench-201 across multiple datasets and MCU boards show up to a 1104x improvement in search efficiency over prior NAS work targeting MCUs, and over 3.23x faster MCU inference at similar accuracy compared with more general NAS approaches, with results that generalize across MCU platforms. The work has practical impact for edge AI deployment: it enables hardware-aware neural architecture search without heavy training, and it opens avenues for broader adoption through open-source release and for further improvements such as memory usage modeling.
Abstract
Neural Architecture Search (NAS) has proven effective in discovering new Convolutional Neural Network (CNN) architectures, particularly for scenarios with well-defined accuracy optimization goals. However, previous approaches often involve time-consuming training on super networks or intensive architecture sampling and evaluations. Although various zero-cost proxies correlated with CNN model accuracy have been proposed for efficient architecture search without training, their lack of hardware consideration makes it challenging to target highly resource-constrained edge devices such as microcontroller units (MCUs). To address these challenges, we introduce MONAS, a novel hardware-aware zero-shot NAS framework specifically designed for MCUs in edge computing. MONAS incorporates hardware optimality considerations into the search process through our proposed MCU hardware latency estimation model. By combining this with specialized performance indicators (proxies), MONAS identifies optimal neural architectures without incurring heavy training and evaluation costs, optimizing for both hardware latency and accuracy under resource constraints. MONAS achieves up to a 1104x improvement in search efficiency over previous work targeting MCUs and can discover CNN models with over 3.23x faster inference on MCUs while maintaining similar accuracy compared to more general NAS approaches.
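To make the idea of combining accuracy proxies with hardware indicators more concrete, the following Python sketch ranks a few candidate architectures on each indicator, sums the weighted ranks, and repeatedly prunes the worst candidate. It is a minimal illustration under stated assumptions, not the MONAS implementation: the indicator values, the weights, and the names `CANDIDATES`, `WEIGHTS`, `rank_by`, `combined_rank`, and `prune_to_best` are all hypothetical, and pruning whole candidates is a simplification of a pruning-based search over operators.

```python
from typing import Dict

# Hypothetical indicator values for four candidate architectures.
# All numbers are invented for illustration; none come from the paper.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "arch_a": {"ntk_cond": 120.0, "lr_count": 900.0, "flops_m": 80.0, "latency_ms": 40.0},
    "arch_b": {"ntk_cond": 60.0, "lr_count": 1200.0, "flops_m": 150.0, "latency_ms": 95.0},
    "arch_c": {"ntk_cond": 75.0, "lr_count": 1100.0, "flops_m": 60.0, "latency_ms": 30.0},
    "arch_d": {"ntk_cond": 300.0, "lr_count": 400.0, "flops_m": 70.0, "latency_ms": 35.0},
}

# True means a larger value is better; False means a smaller value is better.
# Assumption: trainability is summarized by an NTK condition number (lower is better).
HIGHER_IS_BETTER: Dict[str, bool] = {
    "ntk_cond": False,
    "lr_count": True,
    "flops_m": False,
    "latency_ms": False,
}

# Assumed weights; latency is weighted more heavily to reflect the MCU target.
WEIGHTS: Dict[str, float] = {"ntk_cond": 1, "lr_count": 1, "flops_m": 1, "latency_ms": 2}


def rank_by(indicator: str, candidates: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Rank architectures on one indicator (0 is best)."""
    ordered = sorted(
        candidates,
        key=lambda name: candidates[name][indicator],
        reverse=HIGHER_IS_BETTER[indicator],
    )
    return {name: rank for rank, name in enumerate(ordered)}


def combined_rank(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Dict[str, float]:
    """Sum the weighted per-indicator ranks; a lower total is better."""
    totals = {name: 0.0 for name in candidates}
    for indicator, weight in weights.items():
        ranks = rank_by(indicator, candidates)
        for name in candidates:
            totals[name] += weight * ranks[name]
    return totals


def prune_to_best(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> str:
    """Drop the worst-ranked candidate each round until one remains."""
    remaining = dict(candidates)
    while len(remaining) > 1:
        totals = combined_rank(remaining, weights)
        worst = max(totals, key=totals.get)
        del remaining[worst]
    return next(iter(remaining))


if __name__ == "__main__":
    print(prune_to_best(CANDIDATES, WEIGHTS))  # arch_c
```

Using ranks rather than raw values avoids having to normalize indicators with very different scales (for example, FLOPs versus milliseconds). No training is involved at any step, which is the property that makes a zero-shot search cheap.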
