Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights

Ondřej Týbl, Lukáš Neumann

TL;DR

This paper tackles the high computational cost of Neural Architecture Search by proposing a training-free proxy, VKDNW, grounded in Fisher Information theory to estimate a deep network’s trainability without training. VKDNW quantifies weight-estimation difficulty via the spectrum of the empirical Fisher Information Matrix and uses entropy over the spectrum’s deciles, with a simple additive or aggregated ranking variant to compare architectures. The authors introduce a new evaluation metric, nDCG, to better assess a proxy’s ability to identify top-performing networks, and they demonstrate state-of-the-art results on NAS-Bench-201 and MobileNetV2 search spaces, including robustness to random inputs. The approach yields a zero-cost, scalable NAS method that complements existing proxies and provides strong theoretical grounding, with public code enabling reproducibility.
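The decile-entropy idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy model (a linear softmax classifier), the helper names `empirical_fim_linear_softmax` and `decile_entropy`, the choice of the 10% to 90% deciles, and the normalization of the deciles to sum to one are all assumptions made for illustration. The Fisher Information Matrix here takes the expectation over labels under the model's own predictions and averages over random inputs.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def empirical_fim_linear_softmax(W, X):
    """FIM of p(y|x) = softmax(W x) w.r.t. vec(W), averaged over the rows of X.

    Toy stand-in for a deep network (assumption). For this model the
    per-sample FIM is kron(diag(p) - p p^T, x x^T) with row-major vec(W).
    """
    C, D = W.shape
    P = softmax(X @ W.T)                      # (N, C) predicted probabilities
    F = np.zeros((C * D, C * D))
    for p, x in zip(P, X):
        A = np.diag(p) - np.outer(p, p)       # Fisher information w.r.t. logits
        F += np.kron(A, np.outer(x, x))
    return F / len(X)


def decile_entropy(fim):
    """Entropy of the normalized 10%..90% deciles of the FIM eigenvalues."""
    eig = np.clip(np.linalg.eigvalsh(fim), 0.0, None)   # drop tiny negatives
    deciles = np.quantile(eig, np.linspace(0.1, 0.9, 9))
    total = deciles.sum()
    if total <= 0.0:
        return 0.0
    q = deciles / total
    q = q[q > 0.0]                            # treat 0 * log(0) as 0
    return float(-(q * np.log(q)).sum())


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))                   # random, untrained weights
X = rng.normal(size=(64, 6))                  # random inputs
score = decile_entropy(empirical_fim_linear_softmax(W, X))
print(f"decile-entropy score: {score:.4f}")
```

Under these assumptions the score is bounded above by log 9, which is reached when all nine deciles are equal, that is, when the eigenvalue spectrum is flat.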

Abstract

Deep learning has revolutionized computer vision, but it achieved its tremendous success using deep network architectures which are mostly hand-crafted and therefore likely suboptimal. Neural Architecture Search (NAS) aims to bridge this gap by following a well-defined optimization paradigm which systematically looks for the best architecture, given an objective criterion such as maximal classification accuracy. The main limitation of NAS, however, is its astronomical computational cost, as it typically requires training each candidate network architecture from scratch. In this paper, we aim to alleviate this limitation by proposing a novel training-free proxy for image classification accuracy based on Fisher Information. The proposed proxy has a strong theoretical background in statistics, and it allows estimating the expected image classification accuracy of a given deep network without training the network, thus significantly reducing the computational cost of standard NAS algorithms. Our training-free proxy achieves state-of-the-art results on three public datasets and in two search spaces, both when evaluated using previously proposed metrics and when using a new metric that we propose, which we demonstrate is more informative for practical NAS applications. The source code is publicly available at http://www.github.com/ondratybl/VKDNW

Paper Structure

This paper contains 36 sections, 29 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Training-free NAS methods on ImageNet16-120 \cite{dong2020bench}. Methods are compared by Normalized Discounted Cumulative Gain (see Sec. \ref{sec:evaluationmetrics}); our method (VKDNW) is also the best when measured by Kendall's $\tau$ and Spearman's $\rho$ correlations (see Table \ref{tab:table1}). Also note that the simple number of trainable layers (denoted $\aleph$ below) is a significantly better trivial proxy than the number of FLOPs.
  • Figure 2: Toy example of two rankings on 10 networks. We plot the accuracies ordered by each ranking, together with the evaluation metrics: Kendall's $\tau$ (KT) and Spearman's $\rho$ (SPR) correlations, and Normalized Discounted Cumulative Gain ($\text{nDCG}_{5}$).
  • Figure 3: Components of AZ-NAS \cite{lee2024az} and our VKDNW are compared w.r.t. their correlation with $\aleph$ (the number of trainable layers), in the NAS-Bench-201 search space \cite{dong2020bench} on the ImageNet16-120 \cite{chrabaszcz2017downsampled} dataset. Our VKDNW proxy has the lowest correlation, i.e., it is the most invariant to the size of the model.
  • Figure 4: Components of AZ-NAS \cite{lee2024az} and our VKDNW are compared w.r.t. their correlation with the number of model parameters, in the NAS-Bench-201 search space \cite{dong2020bench} on the ImageNet16-120 \cite{chrabaszcz2017downsampled} dataset. Our VKDNW proxy has the lowest correlation, i.e., it is the most invariant to the size of the model.
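
The kind of comparison described for Figure 2 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `ndcg_at_p`, the exponential gain $2^{\text{acc}} - 1$ with a $\log_2$ position discount (the common nDCG form, which may differ from the paper's exact definition), and the ten accuracies and two score vectors are hypothetical and are not the values plotted in the figure.

```python
import numpy as np


def ndcg_at_p(scores, accs, p):
    """nDCG over the top-p networks when ranked by proxy score (descending).

    Assumes the exponential-gain form (2**acc - 1) / log2(1 + position),
    accuracies given as fractions in [0, 1], and at least one positive
    accuracy among the top-p ideal entries.
    """
    scores = np.asarray(scores, dtype=float)
    accs = np.asarray(accs, dtype=float)
    order = np.argsort(-scores, kind="stable")[:p]     # proxy's top-p picks
    ideal = np.sort(accs)[::-1][:p]                    # best achievable top-p
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    dcg = ((2.0 ** accs[order] - 1.0) * discounts).sum()
    idcg = ((2.0 ** ideal - 1.0) * discounts).sum()
    return float(dcg / idcg)


# Hypothetical accuracies of 10 networks, listed from best to worst.
accs = [0.95, 0.90, 0.85, 0.80, 0.75, 0.50, 0.45, 0.40, 0.35, 0.30]

# Ranking A orders the five best networks correctly but reverses the rest.
scores_a = [10, 9, 8, 7, 6, 1, 2, 3, 4, 5]
# Ranking B reverses the five best networks but orders the rest correctly.
scores_b = [6, 7, 8, 9, 10, 5, 4, 3, 2, 1]

ndcg_a = ndcg_at_p(scores_a, accs, p=5)
ndcg_b = ndcg_at_p(scores_b, accs, p=5)
print(f"nDCG_5 ranking A: {ndcg_a:.4f}")
print(f"nDCG_5 ranking B: {ndcg_b:.4f}")
```

In this toy setting both rankings make the same number of pairwise ordering mistakes, so rank correlations such as Kendall's $\tau$ treat them alike, while $\text{nDCG}_{5}$ rewards ranking A for placing the best networks first.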