NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance

Raphael T. Husistein; Markus Reiher; Marco Eckhoff

TL;DR

NEAR introduces a training-free zero-cost proxy for neural architecture search by quantifying network expressivity through the effective rank of pre- and post-activation representations. It demonstrates strong correlations with final accuracy across NAS benchmarks (e.g., NAS-Bench-101 and NATS-Bench) and outperforms many existing proxies in average ranking, without relying on labels or training dynamics. Beyond architecture selection, NEAR enables estimating optimal layer sizes and guiding activation-function and weight-initialization choices, and it adapts to CNNs with a practical sampling strategy. Collectively, NEAR promises substantial reductions in NAS computational burden while maintaining robust performance predictions, with reproducible code and data publicly available.

Abstract

Artificial neural networks have been shown to be state-of-the-art machine learning models in a wide variety of applications, including natural language processing and image recognition. However, building a performant neural network is laborious and requires substantial computing power. Neural Architecture Search (NAS) addresses this issue by automatically selecting the optimal network from a set of candidates. While many NAS methods still require training (some of) the candidate networks, zero-cost proxies promise to identify the optimal network without training. In this work, we propose the zero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on the effective rank of the pre- and post-activation matrices, i.e., the values of a neural network layer before and after applying its activation function. We demonstrate a state-of-the-art correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present a simple approach to estimate the optimal layer sizes in multi-layer perceptrons. Furthermore, we show that this score can be used to select hyperparameters such as the activation function and the neural network weight initialization scheme.
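
To make the idea in the abstract concrete, the following is a minimal NumPy sketch of a NEAR-like score for a small, untrained multi-layer perceptron. It assumes the effective rank is the exponential of the Shannon entropy of the normalized singular values, and that the score sums the effective ranks of the pre- and post-activation matrices over all layers. The function names (`effective_rank`, `near_like_score`), the tanh activation, the random initialization, and the convention that an all-zero matrix has effective rank 0 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def effective_rank(matrix, eps=1e-12):
    """Exponential of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(matrix, compute_uv=False)
    total = s.sum()
    if total <= eps:
        # Assumption for this sketch: an all-zero matrix gets effective rank 0.
        return 0.0
    p = s / total
    p = p[p > eps]  # drop numerically zero entries (0 * log 0 is treated as 0)
    return float(np.exp(-(p * np.log(p)).sum()))


def near_like_score(x, weights, biases, activation=np.tanh):
    """Sum the effective ranks of pre- and post-activation matrices per layer."""
    score = 0.0
    h = x
    for w, b in zip(weights, biases):
        z = h @ w + b          # pre-activation matrix (samples x neurons)
        h = activation(z)      # post-activation matrix (samples x neurons)
        score += effective_rank(z) + effective_rank(h)
    return score


# Hypothetical example: 64 random input samples, two layers of 16 neurons.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
sizes = [8, 16, 16]
weights = [
    rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    for m, n in zip(sizes[:-1], sizes[1:])
]
biases = [np.zeros(n) for n in sizes[1:]]
score = near_like_score(x, weights, biases)
print(f"NEAR-like score: {score:.2f}")
```

No labels and no training steps are involved: the score depends only on the input samples, the architecture, and the initial weights, which is what makes a proxy of this kind cheap to evaluate across many candidate networks.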

Paper Structure (20 sections, 6 equations, 9 figures, 13 tables)

Introduction
Related Work
Methods
Multi-Layer Perceptron
Network Expressivity by Activation Rank (NEAR)
NEAR for Convolutional Neural Networks
Results and Discussion
Correlation of Effective Rank and Final Model Accuracy
Estimation of Optimal Layer Size
Estimation of Activation Function and Weight Initialization Performance
Conclusion
Appendix
Supporting Figures
Additional NAS-Benchmarks
Proxies to Select Weight Initialization and Activation Function
...and 5 more sections

Figures (9)

Figure A.1: Illustration of an artificial neural network with three inputs $\{x_i\}$ and two outputs $\{y_i\}$. The single hidden layer consists of five neurons. The activation functions are shown within the neurons. The solid lines represent the weights, while the dashed lines indicate input and output without weights.
Figure A.2: A convolution is performed on the input of dimension $4\times 4 \times 3$ with a filter of dimension $2 \times 2 \times 3$ resulting in a feature map of dimension $3 \times 3$.
Figure A.3: Process of reshaping convolutional neural network feature maps. (a) The process begins with four $3\times3$ feature maps. (b) These feature maps are subsequently reshaped to a $9\times4$ matrix. (c) An activation matrix is given by four contiguous rows, whereby the first row contains elements extracted from the top row of the feature maps. From all possible activation matrices one is randomly selected.
Figure A.4: A power function fitted to the NEAR score divided by the total number of neurons in the layer. The star marks the first point at which the slope is smaller than or equal to $0.5\%$ of the slope at $x = 1$. The plot has been generated for the experiments on the lMLP.
Figure A.5: Spearman's $\rho$ correlation on NATS-Bench-SSS using the CIFAR-10 dataset, evaluated after $0$, $1$, $3$, $5$, and $10$ training epochs.
...and 4 more figures
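
The reshaping described in the caption of Figure A.3 can be sketched as follows. This is a minimal illustration for a single input sample, assuming feature maps stored as a `(channels, height, width)` array, row-major flattening of each map, and a uniformly random start row. The function name `sample_activation_matrix` and the use of as many rows as there are channels (capped at the number of spatial positions) are assumptions made for this sketch.

```python
import numpy as np


def sample_activation_matrix(feature_maps, rng):
    """Reshape (channels, height, width) maps and pick a block of contiguous rows."""
    c, h, w = feature_maps.shape
    # Step (b): flatten each feature map into one column, giving an (h*w, c) matrix.
    reshaped = feature_maps.reshape(c, h * w).T
    # Step (c): take c contiguous rows starting at a random position.
    n_rows = min(c, h * w)  # assumption: cap the block at the available rows
    start = int(rng.integers(0, h * w - n_rows + 1))
    return reshaped[start:start + n_rows], start


# Four 3x3 feature maps, as in the figure caption.
rng = np.random.default_rng(0)
maps = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
block, start = sample_activation_matrix(maps, rng)
print(block.shape, start)
```

Sampling one block keeps the matrix passed to the singular value decomposition small, which is presumably why a sampling strategy is practical for convolutional layers with large feature maps.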

Theorems & Definitions (2)

Definition 3.1: Effective Rank (Roy and Vetterli, 2007)
Definition 3.2: Network Expressivity by Activation Rank (NEAR)
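
For orientation, the two definitions can be written out as below. The effective rank follows the standard formulation by Roy and Vetterli. The expression for the NEAR score is a reading of the abstract's description (effective rank of the pre- and post-activation matrices, accumulated over layers), and the symbols $Z_l$, $H_l$, and $s_{\mathrm{NEAR}}$ are notation assumed here, so the paper's exact notation and layer range may differ.

```latex
% Effective rank of a matrix A \in \mathbb{R}^{M \times N} with singular values
% \sigma_1, \dots, \sigma_Q, where Q = \min(M, N):
\[
  p_k = \frac{\sigma_k}{\sum_{j=1}^{Q} \sigma_j},
  \qquad
  \operatorname{erank}(A) = \exp\!\Bigl(-\sum_{k=1}^{Q} p_k \log p_k\Bigr),
\]
% with the convention 0 \log 0 = 0, so that 1 \le \operatorname{erank}(A) \le Q
% for any nonzero A.

% NEAR-like score over layers l = 1, \dots, L, with pre-activation matrix Z_l
% and post-activation matrix H_l (assumed notation):
\[
  s_{\mathrm{NEAR}} = \sum_{l=1}^{L}
    \bigl[\operatorname{erank}(Z_l) + \operatorname{erank}(H_l)\bigr].
\]
```

The effective rank equals $Q$ when all singular values are equal and approaches $1$ when a single singular value dominates, so it acts as a continuous measure of how many directions a layer's outputs actually span.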
