The boundary of neural network trainability is fractal
Jascha Sohl-Dickstein
TL;DR
The paper investigates whether the boundary between trainable and untrainable neural network hyperparameters exhibits fractal structure, by treating neural network training as an iterative map $f(W; \eta) = W - \eta\,g(W)$, where $g(W)$ is the gradient of the training loss. The author conducts systematic full-batch and minibatch experiments on a one-hidden-layer network with mean-field parameterization, grid-searching over two-dimensional slices of hyperparameter space (such as separate learning rates for the input and output layers) and visualizing the resulting trainability landscapes; fractal dimensions are estimated via box-counting. Across six experimental conditions (tanh, ReLU, identity, minibatch, single datapoint, and alternative initialization schedule), the boundary consistently displays fractal behavior, with estimated fractal dimensions ranging roughly from $1.17$ to $1.98$. These findings suggest that meta-loss landscapes and hyperparameter sensitivity in neural network training may inherit fractal properties, offering a new lens for meta-learning and hyperparameter optimization near the edge of stability.
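To make the iterative-map view concrete, here is a minimal NumPy sketch of the experimental loop, written under our own assumptions rather than taken from the paper: a tiny one-hidden-layer tanh network on random regression data, an output scaled by $1/\text{width}$ as a stand-in for mean-field parameterization, one learning rate per layer, and a simple divergence criterion (the loss becoming non-finite or exceeding a fixed threshold). The data sizes, threshold, boundary definition, and all function names (`make_problem`, `trains_stably`, `trainability_grid`, `boundary_mask`, `box_counting_dimension`) are hypothetical.

```python
import numpy as np

def make_problem(n_data=8, d_in=8, width=16, seed=0):
    """Hypothetical toy setup: random regression data and one fixed initialization."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_data, d_in))
    y = rng.standard_normal(n_data)
    W0 = rng.standard_normal((width, d_in))  # input-layer weights
    v0 = rng.standard_normal(width)          # output-layer weights
    return X, y, W0, v0

def loss_and_grads(W, v, X, y):
    """MSE loss and gradients for f(x) = (1/width) * v . tanh(W x / sqrt(d_in)).

    The 1/width output scaling is an assumed form of mean-field parameterization.
    """
    n_data, d_in = X.shape
    width = W.shape[0]
    h = np.tanh(X @ W.T / np.sqrt(d_in))                # hidden activations (n_data, width)
    r = h @ v / width - y                               # residuals (n_data,)
    loss = 0.5 * np.mean(r ** 2)
    g_out = r / n_data                                  # dLoss/dOutput per example
    grad_v = h.T @ g_out / width
    g_pre = np.outer(g_out, v) / width * (1.0 - h ** 2) # dLoss/dPreactivation
    grad_W = g_pre.T @ X / np.sqrt(d_in)
    return loss, grad_W, grad_v

def trains_stably(eta_in, eta_out, problem, steps=200, max_loss=1e6):
    """Iterate W <- W - eta * g(W) with one learning rate per layer.

    Assumed criterion: the run counts as trainable if the loss stays finite
    and below max_loss throughout (i.e. training does not diverge).
    """
    X, y, W, v = problem
    W, v = W.copy(), v.copy()
    for _ in range(steps):
        loss, grad_W, grad_v = loss_and_grads(W, v, X, y)
        if not np.isfinite(loss) or loss > max_loss:
            return False
        W -= eta_in * grad_W
        v -= eta_out * grad_v
    loss, _, _ = loss_and_grads(W, v, X, y)
    return bool(np.isfinite(loss) and loss <= max_loss)

def trainability_grid(etas_in, etas_out, problem, **kwargs):
    """Boolean image: entry [i, j] is True if (etas_in[i], etas_out[j]) trains stably."""
    return np.array([[trains_stably(a, b, problem, **kwargs) for b in etas_out]
                     for a in etas_in], dtype=bool)

def boundary_mask(grid):
    """Mark pixels whose right or lower neighbour has the opposite outcome."""
    g = np.asarray(grid, dtype=bool)
    edge = np.zeros_like(g)
    edge[:-1, :] |= g[:-1, :] != g[1:, :]
    edge[:, :-1] |= g[:, :-1] != g[:, 1:]
    return edge

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Slope of log(#occupied boxes) against log(1/box size)."""
    n0, n1 = mask.shape
    counts = []
    for s in sizes:
        m = mask[: n0 - n0 % s, : n1 - n1 % s]          # crop to a multiple of s
        boxes = m.reshape(m.shape[0] // s, s, m.shape[1] // s, s).any(axis=(1, 3))
        counts.append(boxes.sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)),
                          np.log(np.array(counts, dtype=float)), 1)
    return slope
```

A call such as `trainability_grid(np.logspace(-1, 4, 32), np.logspace(-1, 4, 32), make_problem())` produces a small trainability image whose boundary can be passed to `box_counting_dimension`. On a grid this coarse the slope reflects only a narrow range of scales, and it is undefined if the grid contains no boundary at all; a meaningful estimate needs much finer resolution or repeated zooms into the boundary.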
Abstract
Some fractals (for instance, those associated with the Mandelbrot and quadratic Julia sets) are computed by iterating a function and identifying the boundary between hyperparameters for which the resulting series diverges or remains bounded. Neural network training similarly involves iterating an update function (e.g. repeated steps of gradient descent), can result in convergent or divergent behavior, and can be extremely sensitive to small changes in hyperparameters. Motivated by these similarities, we experimentally examine the boundary between neural network hyperparameters that lead to stable and divergent training. We find that this boundary is fractal over more than ten decades of scale in all tested configurations.
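For comparison, the Mandelbrot construction mentioned above can be sketched as an escape-time test: iterate $z \leftarrow z^2 + c$ from $z = 0$ and treat $c$ as bounded if $|z|$ never exceeds 2 within a fixed iteration budget. The function names `in_mandelbrot` and `render`, the budget, and the plotting window are our own choices; with a finite budget, points very close to the boundary can be misclassified. In the trainability experiments, the complex parameter $c$ plays the role of the hyperparameters and the quadratic map plays the role of a gradient-descent step.

```python
def in_mandelbrot(c: complex, max_iter: int = 500) -> bool:
    """Escape-time test: True if z <- z**2 + c stays bounded for max_iter steps."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:  # once |z| > 2 the orbit is guaranteed to diverge
            return False
    return True

def render(width: int = 60, height: int = 24) -> str:
    """ASCII picture of the bounded ('#') vs divergent ('.') parameters."""
    rows = []
    for i in range(height):
        im = 1.2 - 2.4 * i / (height - 1)          # imaginary part, top to bottom
        row = "".join(
            "#" if in_mandelbrot(complex(-2.1 + 2.7 * j / (width - 1), im)) else "."
            for j in range(width)                  # real part in [-2.1, 0.6]
        )
        rows.append(row)
    return "\n".join(rows)
```

Printing `render()` shows the familiar set; the ragged edge between `#` and `.` is the kind of boundary the paper measures for learning-rate grids instead of complex numbers.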
