
ParZC: Parametric Zero-Cost Proxies for Efficient NAS

Peijie Dong, Lujun Li, Xinglin Pan, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu

TL;DR

ParZC tackles the instability and non-adaptivity of zero-shot NAS proxies by introducing a parametric framework that models node-wise contributions and their uncertainties. It pairs a Mixer Architecture with Bayesian Network (MABN), which learns how node statistics interact and quantifies their uncertainty, with DiffKendall, a differentiable ranking loss that aligns predicted and true architecture rankings by optimizing Kendall's Tau $\tau$. Across NAS-Bench-101/201 and NDS, ParZC achieves higher rank correlation than existing zero-shot methods while reducing search time, and it extends to Vision Transformer search spaces with strong performance. This work advances training-free NAS by providing a scalable, adaptable proxy design that can transfer across architectures.
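
For reference, the standard definition of Kendall's Tau (the tau-a variant, which ignores ties) over $n$ architectures can be written as follows. The symbols are assumptions introduced here for illustration: $x_i$ is the proxy score of architecture $i$ and $y_i$ is its ground-truth accuracy.

$$
\tau = \frac{2}{n(n-1)} \sum_{i<j} \operatorname{sgn}(x_i - x_j)\,\operatorname{sgn}(y_i - y_j)
$$

Because $\operatorname{sgn}$ is piecewise constant, its gradient is zero almost everywhere, so $\tau$ cannot be optimized directly by gradient descent. This is the gap a differentiable surrogate such as DiffKendall is meant to close.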

Abstract

Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks. Several studies propose the automated design of zero-cost proxies to achieve SOTA performance, but these approaches require a tedious search process. Furthermore, we identify a critical issue with current zero-cost proxies: they aggregate node-wise zero-cost statistics without considering that not all nodes in a neural network contribute equally to performance estimation. Our observations reveal that node-wise zero-cost statistics vary significantly in their contributions to performance, with each node exhibiting a degree of uncertainty. Based on this insight, we introduce the Parametric Zero-Cost Proxies (ParZC) framework, which enhances the adaptability of zero-cost proxies through parameterization. To address the indiscriminate treatment of nodes, we propose a Mixer Architecture with Bayesian Network (MABN) to explore the node-wise zero-cost statistics and estimate node-specific uncertainty. Moreover, we propose DiffKendall as a loss function that directly optimizes Kendall's Tau coefficient in a differentiable manner, so that ParZC can better handle discrepancies in ranking architectures. Comprehensive experiments on NAS-Bench-101, NAS-Bench-201, and NDS demonstrate the superiority of ParZC over existing zero-shot NAS methods. Additionally, we demonstrate the versatility and adaptability of ParZC by transferring it to the Vision Transformer search space.
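
The idea of optimizing Kendall's Tau in a differentiable manner can be illustrated with a minimal sketch. The abstract does not give the DiffKendall formula, so the smoothing used here (replacing the sign function with `tanh(alpha * d)`) and the temperature `alpha` are assumptions, and the function names are hypothetical. The code is plain Python for readability; written with an autodiff framework, the same expression would yield gradients with respect to the predicted scores.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Exact Kendall's Tau (tau-a): mean product of pairwise signs."""
    def sign(v):
        return (v > 0) - (v < 0)

    pairs = list(combinations(range(len(x)), 2))
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return total / len(pairs)


def soft_kendall_tau(x, y, alpha=10.0):
    """Smooth surrogate (assumed form): sign(d) is replaced by tanh(alpha * d).

    Larger alpha tracks the exact tau more closely but gives sharper,
    less informative gradients.
    """
    pairs = list(combinations(range(len(x)), 2))
    total = sum(
        math.tanh(alpha * (x[i] - x[j])) * math.tanh(alpha * (y[i] - y[j]))
        for i, j in pairs
    )
    return total / len(pairs)


def ranking_loss(pred, target, alpha=10.0):
    """Loss to minimize: a higher soft tau gives a lower loss."""
    return 1.0 - soft_kendall_tau(pred, target, alpha)


# Hypothetical proxy scores and ground-truth accuracies for four architectures.
pred = [0.1, 0.4, 0.35, 0.8]
acc = [0.60, 0.70, 0.72, 0.90]
print(kendall_tau(pred, acc), soft_kendall_tau(pred, acc), ranking_loss(pred, acc))
```

In this toy example one of the six pairs is ordered differently by the proxy and by the accuracies, so the exact tau is 4/6; the soft version approaches that value as `alpha` grows.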

Paper Structure (25 sections, 4 equations, 9 figures, 11 tables, 1 algorithm)

Figures (9)

  • Figure 1: Overview of the ParZC and EZNAS [akhauri2022eznas] pipelines. W: Weight, A: Activation, G: Gradient, H: Hessian Matrix.
  • Figure 2: Node-wise relative importance of ZC proxies (Synflow [tanaka2020pruning_synflow], GradNorm [abdelfattah2021zerocost], and Fisher [Turner2019BlockSwapFB_fisher]) based on GBDT impurity on NAS-Bench-201.
  • Figure 3: The framework of ParZC. Left: Illustration of node-wise ZC proxies. Different ZC proxies may extract the gradient (G), weight (W), Hessian (H), or activation (A) from different nodes. ParZC takes these node-wise ZC statistics from different proxies as input. Right: Mixer architecture with Bayesian network. ParZC is built from a Bayesian network and a mixer architecture to measure uncertainty and enhance inter-channel information extraction. DiffKendall is used as the loss function to better capture the relative ranking of different architectures.
  • Figure 4: Correlation of all architectures
  • Figure 5: Correlation of top architectures
  • ...and 4 more figures