Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation

Pavel Rumiantsev, Mark Coates

TL;DR

The paper tackles variability in zero-shot NAS ranking functions caused by randomness in weight initialization and data batches. It proposes a framework that treats ranking outputs as random variables and uses stochastic dominance, via the Mann-Whitney U-test, to compare architectures instead of simple averages. A ranking function variation metric Var_{SS}(r,B,V) is defined, and the approach is evaluated with random and evolutionary searches across NAS-Bench-101, NAS-Bench-201, and TransNAS-Bench-101, showing improved search performance for many ranking functions. The findings emphasize that ranking function stability matters and that modeling NAS as a stochastic optimization can guide the design of robust zero-shot NAS pipelines; data-agnostic ranking functions may require alternative strategies.

Abstract

Neural Architecture Search (NAS) is a powerful automatic alternative to manual design of a neural network. In the zero-shot version, a fast ranking function is used to compare architectures without training them. The outputs of the ranking functions often vary significantly due to different sources of randomness, including the evaluated architecture's weights' initialization or the batch of data used for calculations. A common approach to addressing the variation is to average a ranking function output over several evaluations. We propose taking into account the variation in a different manner, by viewing the ranking function output as a random variable representing a proxy performance metric. During the search process, we strive to construct a stochastic ordering of the performance metrics to determine the best architecture. Our experiments show that the proposed stochastic ordering can effectively boost performance of a search on standard benchmark search spaces.
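To make the idea of comparing architectures by stochastic ordering rather than by averages concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function names `mann_whitney_greater` and `compare`, the sample scores, and the "undecided" outcome are assumptions made for illustration. The one-sided p-value uses the normal approximation to the Mann-Whitney U statistic, without tie correction, so that the example needs only the standard library.

```python
import math
from typing import Sequence


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> float:
    """One-sided p-value for the alternative "x tends to be larger than y".

    Simplification (assumption): normal approximation with continuity
    correction and no tie correction in the variance.
    """
    n1, n2 = len(x), len(y)
    # U counts pairs (xi, yj) with xi > yj; a tie counts as one half.
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mean_u = n1 * n2 / 2.0
    std_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u - 0.5) / std_u
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def compare(scores_a: Sequence[float], scores_b: Sequence[float],
            alpha: float = 0.05) -> int:
    """Return 1 if A is judged better, -1 if B is, 0 if undecided."""
    if mann_whitney_greater(scores_a, scores_b) < alpha:
        return 1
    if mann_whitney_greater(scores_b, scores_a) < alpha:
        return -1
    return 0


# Hypothetical ranking-function outputs: 10 evaluations per architecture,
# each on a different random batch and weight initialization.
scores_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 10.4, 11.8, 10.1, 11.0]
scores_b = [8.9, 9.5, 9.1, 10.0, 8.7, 9.9, 9.3, 9.6, 8.8, 9.4]

print(compare(scores_a, scores_b))  # A versus B
print(compare(scores_a, scores_a))  # an architecture against itself
```

When the test cannot separate the two samples, `compare` returns 0. How a search should proceed in that case (for example, by falling back to another criterion) is left open in this sketch.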

Paper Structure

This paper contains 21 sections, 2 equations, 6 figures, 7 tables, 4 algorithms.

Figures (6)

  • Figure 1: Ranking function variation $Var_{SS}(r,B,V)$ (for ranking function $r$, batch size $B=64$, $V=10$, and search space $SS$) on (\ref{fig:cv:nasbench}) NAS-Bench and (\ref{fig:cv:transnas}) TransNAS-Bench-101. Different ranking functions have considerably different variations within a search space, but a specific ranking function's variation remains similar across search spaces.
  • Figure 2: Box plot (versus accuracy) of coefficients of variation for individual architectures, $CV(\mathcal{M}_i(B,V))=\frac{Var(\mathcal{M}_i(B,V))}{Mean(\mathcal{M}_i(B,V))}$, where $\mathcal{M}_i(B,V) = \{r(arch_i, d_v)\}_{v=1}^V$ is a set of $V=10$ ranking function outcomes for architecture $arch_i$ derived from random batches of data $d_v$, each of size $B = 64$. The depicted ranking functions are ReLU Hamming distance on NAS-Bench-101 (CIFAR-10) (\ref{fig:acc_vs_cv:nasbench101}) and Eigenvalue Score on NAS-Bench-201 (ImageNet16-120) (\ref{fig:acc_vs_cv:nasbench201}). For visualization purposes, we exclude the ten percent of architectures with the lowest accuracy.
  • Figure 3: Kendall-$\tau$ correlation coefficient between coefficient of variation values and validation accuracy, for each ranking function and search space. EPE-NAS and LGA require classification labels and therefore cannot be computed for TransNAS-Bench-101 (Autoencoder) and TransNAS-Bench-101 (Normal).
  • Figure 4: Kendall-$\tau$ correlation coefficient between a ranking function's coefficient of variation values and its mean value taken over 10 evaluations, for each ranking function and search space. EPE-NAS and LGA require classification labels and therefore cannot be computed for TransNAS-Bench-101 (Autoencoder) and TransNAS-Bench-101 (Normal).
  • Figure 5: Ablation of the significance level (p-value threshold). The Eigenvalue score is used with the FreeREA search algorithm. Accuracy is reported relative to the accuracy obtained with the 0.05 threshold (see Table \ref{tab:evosearch_results}).
  • ...and 1 more figures