Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities

Guihong Li; Duc Hoang; Kartikeya Bhardwaj; Ming Lin; Zhangyang Wang; Radu Marculescu

TL;DR

This survey tackles zero-shot Neural Architecture Search by categorizing and evaluating training-free proxies that predict network accuracy without training. It contrasts gradient-based and gradient-free proxies, connects them to expressivity, generalization, and trainability, and benchmarks their performance across standard NAS tasks, large-scale datasets, and Vision Transformers, including hardware-aware scenarios. The findings show that simple proxies like #Params and #FLOPs often outperform more sophisticated proxies in unconstrained settings, while all proxies struggle under hardware constraints, signaling a need for better benchmarks and tailored proxies. The work highlights practical implications for edge-AI deployment and sets a roadmap for developing proxies and benchmarks that better reflect real-world hardware and task demands.

Abstract

Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate NAS from the expensive training process. The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of some given networks without training the network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical understanding of deep learning and have shown great potential on several datasets and NAS benchmarks. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness. To this end, we first review the mainstream zero-shot proxies and discuss their theoretical underpinnings. We then compare these zero-shot proxies through large-scale experiments and demonstrate their effectiveness in both hardware-aware and hardware-oblivious NAS scenarios. Finally, we point out several promising ideas to design better proxies. Our source code and the list of related papers are available on https://github.com/SLDGroup/survey-zero-shot-nas.
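The core workflow described above (score untrained candidates with a proxy, then rank them) can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it uses #Params as the proxy, the candidate architectures are hypothetical MLP layer widths, and the helper names `count_mlp_params`, `rank_by_proxy`, and `kendall_tau` are made up for this example. Kendall's tau is assumed here as one common way to compare a proxy ranking against measured accuracies when a benchmark provides them.

```python
from itertools import combinations


def count_mlp_params(widths):
    """#Params of a fully connected network with the given layer widths
    (weights plus biases for every layer)."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


def rank_by_proxy(candidates, proxy):
    """Sort candidates from highest to lowest proxy score. No training is involved."""
    return sorted(candidates, key=proxy, reverse=True)


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equally long score lists.
    Tied pairs count as neither concordant nor discordant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical search space: MLPs on 32x32x3 inputs with 10 output classes.
candidates = [(3072, 64, 10), (3072, 256, 10), (3072, 128, 128, 10)]
ranked = rank_by_proxy(candidates, count_mlp_params)
best = ranked[0]  # the architecture a #Params-driven zero-shot search would pick
```

Given proxy scores and ground-truth accuracies for the same architectures, `kendall_tau(proxy_scores, accuracies)` returns a value in [-1, 1]; values near 1 mean the proxy orders the networks almost the same way training would.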

Paper Structure

This paper contains 35 sections, 20 equations, 19 figures, and 6 tables.

Introduction
Zero-Shot Proxies
Theoretical Underpinning of Proxies
Gradient-based accuracy proxies
Gradient norm
SNIP
Synflow
GraSP
GradSign
Fisher information
Jacobian covariant
Zen-score
NTK Condition Number
Gradient-free accuracy proxy
Number of linear regions
...and 20 more sections

Figures (19)

Figure 1: Overview of existing NAS approaches. NAS is designed to search for optimal architectures with both good accuracy and high efficiency on real hardware. (Data collected from paper_with_code_nas)
Figure 2: Illustration of differentiable neural architecture search. (1) Merge all candidate operations into a hyper-network with learnable weights for each operation. (2) Train the hyper-network and update the learnable weights for each operation. (3) Generate the final results by selecting the operations with the highest weight values (boldest edges). (Adapted from Darts)
Figure 3: Illustration of weight-sharing mechanism. The parameters of relatively simple operations are obtained from complex operations, i.e., super kernel. As shown, different operations share the parameters from the super kernel. (Adapted from single_path_nas)
Figure 4: Illustration of the Logdet proxy; $A_i, B_i$, $i \in \{1,2,3\}$, are the neurons of a multi-layer perceptron. First, the input space is divided into several linear regions. Next, each region is encoded by a binary code; then Eq. \ref{eq:Logdet} is applied to compute the Logdet proxy. (Adapted from tfnas11logdet)
Figure 5: Search space of NASBench-201. Each architecture in the search space is built by stacking a cell multiple times; each cell can have six operations (edges in the figure) and each operation has 5 potential different options (drawn with different colors). NASBench-101 has a very similar search space with more candidate operations. (Adapted from Darts)
...and 14 more figures
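
The Figure 4 caption outlines the Logdet proxy: encode the linear region of each input as a binary code, then combine the codes into a single score. The equation itself is not reproduced in this excerpt, so the sketch below assumes the commonly used Hamming-distance formulation, in which a kernel matrix has entries equal to the number of neurons minus the Hamming distance between two codes, and the score is the log of the absolute determinant of that matrix. All function names are hypothetical, and the determinant is computed with plain Gaussian elimination to keep the example dependency-free.

```python
import math


def relu_code(pre_activations):
    """Binary code of a linear region: 1 where a ReLU neuron is active, else 0."""
    return tuple(int(z > 0) for z in pre_activations)


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def log_abs_det(matrix):
    """log|det(matrix)| via Gaussian elimination with partial pivoting.
    Returns -inf for a singular matrix."""
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return float("-inf")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return total


def logdet_proxy(codes):
    """Assumed Logdet score: kernel[i][j] = #neurons - Hamming(code_i, code_j)."""
    n_units = len(codes[0])
    kernel = [[n_units - hamming(a, b) for b in codes] for a in codes]
    return log_abs_det(kernel)


# Three inputs that each land in a different linear region of a 3-neuron layer.
codes = [relu_code(z) for z in ([0.5, -1.0, -0.2], [-0.3, 2.0, -1.1], [-0.7, -0.4, 0.9])]
score = logdet_proxy(codes)
```

Under this assumed formulation, inputs that fall into distinct linear regions make the kernel closer to diagonal and raise the score, while inputs that share a code make the kernel singular and push the score to negative infinity.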
