The boundary of neural network trainability is fractal

Jascha Sohl-Dickstein

TL;DR

The paper investigates whether the boundary between trainable and untrainable neural network hyperparameters exhibits fractal structure, by treating neural network training as an iterative map $f(W; \eta) = W - \eta\,g(W)$. The author runs systematic full-batch and minibatch experiments on a one-hidden-layer network with mean-field parameterization, grid-searching over pairs of hyperparameters and visualizing the resulting trainability landscapes; fractal dimensions are estimated via box-counting. Across six experimental conditions (tanh, ReLU, identity, minibatch, single datapoint, and a grid search over initialization hyperparameters), the boundary consistently displays fractal behavior, with estimated fractal dimensions ranging roughly from $1.17$ to $1.98$. These findings suggest that meta-loss landscapes and hyperparameter sensitivity in neural network training may inherit fractal properties, offering a new lens for meta-learning and hyperparameter optimization near the edge of stability.
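The box-counting estimate mentioned above can be illustrated with a minimal sketch. It is not the paper's analysis code: the function names, the neighbour-based definition of a boundary cell, and the choice of dyadic box sizes are all assumptions made for illustration. The idea is to count how many boxes of side $s$ contain a boundary cell and to fit the slope of $\log N(s)$ against $\log(1/s)$.

```python
import math


def boundary_cells(grid):
    """Return cells whose right or lower neighbour has a different label.

    `grid` is a list of rows of booleans (e.g. True = training diverged).
    This neighbour-based boundary definition is an illustrative assumption.
    """
    cells = set()
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                cells.add((r, c))
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                cells.add((r, c))
    return cells


def box_counting_dimension(cells, size):
    """Least-squares slope of log N(s) versus log(1/s) for box sides 1, 2, 4, ...

    `cells` is a non-empty set of (row, col) pairs on a size x size grid.
    """
    if not cells:
        raise ValueError("no boundary cells to measure")
    xs, ys = [], []
    box = 1
    while box <= size // 4:
        # Boxes of side `box` that contain at least one boundary cell.
        occupied = {(r // box, c // box) for r, c in cells}
        xs.append(math.log(1.0 / box))
        ys.append(math.log(len(occupied)))
        box *= 2
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) * (x - mean_x) for x in xs)
    return num / den
```

As a sanity check, a straight vertical boundary on a 64 x 64 grid should give a dimension close to 1, and a completely filled set should give a dimension close to 2; a fractal boundary would fall in between.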

Abstract

Some fractals -- for instance those associated with the Mandelbrot and quadratic Julia sets -- are computed by iterating a function, and identifying the boundary between hyperparameters for which the resulting series diverges or remains bounded. Neural network training similarly involves iterating an update function (e.g. repeated steps of gradient descent), can result in convergent or divergent behavior, and can be extremely sensitive to small changes in hyperparameters. Motivated by these similarities, we experimentally examine the boundary between neural network hyperparameters that lead to stable and divergent training. We find that this boundary is fractal over more than ten decades of scale in all tested configurations.
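The analogy between fractal generation and training can be made concrete with a small sketch. The code below is an assumption-laden toy, not the paper's experimental setup: the data, the fixed initial weights, the width-4 tanh network with $1/n$ output scaling, the divergence threshold, and the names `diverged` and `trainability_grid` are all hypothetical. It iterates full-batch gradient descent with one learning rate per layer and labels each learning-rate pair as divergent or bounded, in the same way a Mandelbrot renderer labels each point of the complex plane.

```python
import math

# Hypothetical toy problem: two scalar inputs/targets and a width-4 hidden layer.
XS = [1.0, -0.5]
TS = [0.5, -1.0]
W_INIT = [0.5, -0.3, 0.8, -0.6]   # input-to-hidden weights (assumed values)
V_INIT = [0.4, -0.7, 0.2, 0.9]    # hidden-to-output weights (assumed values)


def diverged(eta_w, eta_v, steps=100, threshold=1e6):
    """Run full-batch gradient descent; return True if the loss blows up."""
    w, v = list(W_INIT), list(V_INIT)
    n = len(w)
    for _ in range(steps):
        # Forward pass: y = (1/n) * sum_j v_j * tanh(w_j * x)
        hidden = [[math.tanh(wj * x) for wj in w] for x in XS]
        errs = [sum(vj * hj for vj, hj in zip(v, h)) / n - t
                for h, t in zip(hidden, TS)]
        loss = 0.5 * sum(e * e for e in errs) / len(XS)
        if not math.isfinite(loss) or loss > threshold:
            return True
        # Gradients of the mean squared error with respect to w and v.
        grad_w = [sum(e * v[j] * (1.0 - h[j] * h[j]) * x
                      for e, h, x in zip(errs, hidden, XS)) / (n * len(XS))
                  for j in range(n)]
        grad_v = [sum(e * h[j] for e, h in zip(errs, hidden)) / (n * len(XS))
                  for j in range(n)]
        # One step of the iterated map, with a separate learning rate per layer.
        w = [wj - eta_w * g for wj, g in zip(w, grad_w)]
        v = [vj - eta_v * g for vj, g in zip(v, grad_v)]
    return False


def trainability_grid(lo=-2.0, hi=4.0, size=8):
    """Classify a log-spaced size x size grid of (eta_w, eta_v) pairs."""
    etas = [10.0 ** (lo + (hi - lo) * i / (size - 1)) for i in range(size)]
    return [[diverged(ew, ev) for ev in etas] for ew in etas]
```

Calling `trainability_grid()` returns a nested list of booleans that could be rendered as an image. Resolving any fine structure in the boundary would require a far denser grid, and repeated zooming, than this toy uses.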

Paper Structure (13 sections, 1 equation, 1 figure)

Introduction
Experiments
Network and data
Training
Visualization and analysis
Experimental conditions
Discussion
Elaborate functions in high dimensional spaces
Non-homogeneity of boundary
Stochastic training
Higher dimensional fractals
Meta-loss landscapes are difficult to navigate
Fractals are beautiful and relaxing

Figures (1)

Figure 1: The boundary between trainable and untrainable neural network hyperparameters is fractal, for all experimental conditions. Images show a 2d grid search over neural network hyperparameters. For points shaded red, training diverged. For points shaded blue, training converged. Paler points correspond to faster convergence or divergence. Experimental conditions include different network nonlinearities, both minibatch and full batch training, and grid searching over either training or initialization hyperparameters. See the "Experimental conditions" section for details. Each image is a hyperlink to an animation zooming into the corresponding fractal landscape (to the depth at which float64 discretization artifacts appear). Experimental code, images, and videos are available at https://github.com/Sohl-Dickstein/fractal.
