Predicting Neural Network Accuracy from Weights

Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya Tolstikhin

TL;DR

This work formalizes the task of predicting a neural network's accuracy from its trained weights alone, and builds a large, publicly available dataset of 120k CNNs (Small CNN Zoo) to study the mapping from weights to accuracy. It shows that weight-based predictors, particularly gradient-boosted trees using per-layer weight statistics, can rank models by performance with $R^2$ often exceeding 0.98, and that this ranking ability transfers to unseen datasets and larger architectures. The findings provide insights into training dynamics and offer a potential pathway to data-free model selection or early stopping, while leaving open questions about interpretable rules and stronger inductive biases. The authors release the dataset to spur further research into weight-based understanding of deep learning performance and generalization across domains.

Abstract

We show experimentally that the accuracy of a trained neural network can be predicted surprisingly well by looking only at its weights, without evaluating it on input data. We motivate this task and introduce a formal setting for it. Even when using simple statistics of the weights, the predictors are able to rank neural networks by their performance with very high accuracy ($R^2$ score more than 0.98). Furthermore, the predictors are able to rank networks trained on different, unobserved datasets and with different architectures. We release a collection of 120k convolutional neural networks trained on four different datasets to encourage further research in this area, with the goal of understanding network training and performance better.
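
To make "simple statistics of the weights" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each layer's parameters are available as a flat list of floats, and the particular statistics (mean, variance, minimum, quartiles, maximum) are an illustrative choice. The function names `layer_statistics`, `featurize`, and `r2_score` are hypothetical. The resulting feature vector would then be passed to a regressor such as gradient-boosted trees, which is not shown here.

```python
import statistics


def layer_statistics(values):
    """Summarize one layer's flat parameter list with simple statistics.

    The choice of statistics is an assumption made for illustration.
    """
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q25, q50, q75 = statistics.quantiles(ordered, n=4, method="inclusive")
    return [
        statistics.fmean(ordered),      # mean
        statistics.pvariance(ordered),  # population variance
        ordered[0],                     # minimum
        q25,
        q50,                            # median
        q75,
        ordered[-1],                    # maximum
    ]


def featurize(layers):
    """Concatenate per-layer statistics into one fixed-length feature vector."""
    features = []
    for layer in layers:
        features.extend(layer_statistics(layer))
    return features


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because every layer contributes the same number of statistics, the feature vector has a fixed length regardless of how many parameters each layer holds, which is one plausible reason such features could carry over between architectures of different widths.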

Paper Structure

This paper contains 25 sections, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Diagram of the learning setting. Nodes contain hyperparameters $\lambda$, CNN weights $W$, and expected accuracy $\mathrm{Acc}\,_{\mathbb{P}}(W)$. Edges are labeled with the information necessary for the mapping: the training dataset $S_N$ and the data-generating distribution $\mathbb{P}$.
  • Figure 2: Distribution of the networks from the Small CNN Zoo collection over their test accuracy (first row) and their training/test accuracies (second row).
  • Figure 3: Scatter plots of the networks trained on CIFAR10-GS, colored by test accuracy (best viewed in color). Bias range width ($\mathrm{max}-\mathrm{min}$) in first layer (x-axis) and last layer (y-axis) together with the upper-right corners zoomed in. Networks trained with Adam/RMSProp (left) and SGD (right).
  • Figure 4: Distribution of true/predicted test accuracies for networks from the SVHN-GS (left) and CIFAR10-GS (right) collections together with Kendall's $\tau$ coefficient. Predictions were made with the GBM models trained on CIFAR10-GS (left) and MNIST (right) collections using $\widetilde{W}_{\!L}$.
  • Figure 5: Light-GBM feature importance values based on the number of times each feature appeared in the trees. The four plots correspond to GBM predictors trained on four CNN collections using entire weight vectors $W$ as inputs. "L" in feature names refers to the layer, "W" to the (filter) weights, "B" to the biases. For example, "L4-B7" is the 7th bias parameter of the final dense layer and "L1-W123" is the 123rd filter weight parameter of the first convolutional layer.
  • ...and 1 more figures
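
Figure 4 reports Kendall's $\tau$ between true and predicted test accuracies as a measure of ranking quality. As a rough illustration of what that coefficient measures, here is a minimal sketch of the tau-a variant, which ignores ties. Which variant the paper uses is not stated in this excerpt, so the choice here is an assumption, and the function name `kendall_tau` is hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant (an assumption;
    other variants such as tau-b correct for ties).
    """
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Positive product: the pair is ordered the same way in x and y.
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means the predictor orders every pair of networks the same way as their true accuracies, and a value of -1 means the ordering is fully reversed. The quadratic loop is fine for small collections; for 120k networks a library implementation would be the practical choice.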