Estimating the Probability of Sampling a Trained Neural Network at Random

Adam Scherlis, Nora Belrose

TL;DR

The paper studies generalization through the geometry of neural-network parameter space, introducing a fast estimator of the local volume around an anchor point, measured under a Gaussian or uniform prior. It defines two neighborhood types (Loss and KL) and develops preconditioning and Gaussian-volume variants to make the estimates more accurate and scalable. Local volume tends to shrink as training proceeds and is smaller for poisoned, poorly generalizing networks. The results broadly support the volume hypothesis, which links architectural inductive bias, description length, and optimizer dynamics to generalization, and they suggest local volume as a practical metric of model complexity for interpretability work. The paper outlines future directions, including stochastic-geometry refinements via SGLD and applications to detecting hidden behavior or backdoors in models.

Abstract

We present and analyze an algorithm for estimating the size, under a Gaussian or uniform measure, of a localized neighborhood in neural network parameter space with behavior similar to an "anchor" point. We refer to this as the "local volume" of the anchor. We adapt an existing basin-volume estimator, which is very fast but in many cases only provides a lower bound. We show that this lower bound can be improved with an importance-sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of local volume can also be interpreted as a measure of the anchor network's information content. As expected for a measure of complexity, this quantity increases during language model training. We find that overfit, badly-generalizing neighborhoods are smaller, indicating a more complex learned behavior. This smaller volume can also be interpreted in an MDL sense as suboptimal compression. Our results are consistent with a picture of generalization we call the "volume hypothesis": that neural net training produces good generalization primarily because the architecture gives simple functions more volume in parameter space, and the optimizer samples from the low-loss manifold in a volume-sensitive way. We believe that fast local-volume estimators are a promising practical metric of network complexity and architectural inductive bias for interpretability purposes.
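To make the idea of a sampled volume estimate concrete, here is a minimal sketch of a generic Monte Carlo estimator for a star-shaped neighborhood under the uniform measure. It is not taken from the paper and may differ from the authors' algorithm. It relies on the identity $V = V_n \cdot \mathbb{E}_u[r(u)^n]$, where $V_n$ is the volume of the unit $n$-ball, $u$ is a uniformly random unit direction, and $r(u)$ is the distance from the anchor to the neighborhood boundary along $u$. The quadratic toy loss, the threshold `eps`, and all function names are assumptions chosen for illustration, so that the exact volume is available for comparison.

```python
import math
import random


def log_unit_ball_volume(n):
    # log of pi^(n/2) / Gamma(n/2 + 1), the volume of the unit n-ball.
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)


def random_unit_vector(n, rng):
    # A normalized Gaussian vector is uniform on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def boundary_radius(u, curvatures, eps):
    # Hypothetical toy loss: L(theta) = 0.5 * sum_i h_i * theta_i^2, anchor at 0.
    # Solving L(r * u) = eps gives r = sqrt(2 * eps / sum_i h_i * u_i^2).
    quad = sum(h * x * x for h, x in zip(curvatures, u))
    return math.sqrt(2.0 * eps / quad)


def log_volume_estimate(curvatures, eps, k, seed=0):
    # log V is estimated as log V_n + log mean_j r(u_j)^n, computed stably in log space.
    rng = random.Random(seed)
    n = len(curvatures)
    logs = []
    for _ in range(k):
        u = random_unit_vector(n, rng)
        logs.append(n * math.log(boundary_radius(u, curvatures, eps)))
    m = max(logs)
    log_mean = m + math.log(sum(math.exp(x - m) for x in logs) / k)
    return log_unit_ball_volume(n) + log_mean


def exact_log_volume(curvatures, eps):
    # The region {L <= eps} is an ellipsoid with semi-axes sqrt(2 * eps / h_i).
    n = len(curvatures)
    return (log_unit_ball_volume(n)
            + 0.5 * n * math.log(2.0 * eps)
            - 0.5 * sum(math.log(h) for h in curvatures))


if __name__ == "__main__":
    low_dim = [1.0, 2.0, 4.0]
    print(log_volume_estimate(low_dim, 0.5, k=20000), exact_log_volume(low_dim, 0.5))
    high_dim = [1.0] * 100 + [100.0] * 100
    print(log_volume_estimate(high_dim, 0.5, k=1000), exact_log_volume(high_dim, 0.5))
```

In three dimensions the estimate should track the exact value closely. In the 200-dimensional toy the average is dominated by the largest sampled radius, so with a finite number of directions the estimate tends to fall far below the exact log-volume. This is one way an estimator of this kind can behave like a lower bound in practice.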

Paper Structure

This paper contains 31 sections, 25 equations, 9 figures.

Figures (9)

  • Figure 1: Results ($k=3000$) for various preconditioners on a small MLP. Vertical dashed lines indicate the aggregated log-volume estimate, which is very close to the maximum sample.
  • Figure 2: Results ($k=1000$) with and without Adam preconditioner on Pythia 31M
  • Figure 3: Results ($k=1000$) with and without Adam preconditioner on ConvNeXt Atto poisoned and unpoisoned
  • Figure 4: Local volume decrease while training Pythia 31M
  • Figure 5: Local volume decrease while training ConvNeXt V2 Atto, and training metrics across datasets
  • ...and 4 more figures