Phase transitions reveal hierarchical structure in deep neural networks
Authors
Ibrahim Talha Ersoy, Andrés Fernando Cardozo Licha, Karoline Wiesner
Abstract
Training deep neural networks (DNNs) relies on the model converging toward a good minimum on a high-dimensional, non-convex loss landscape. Yet much of the phenomenology of training remains poorly understood. We focus on three seemingly disparate observations: the occurrence of phase transitions reminiscent of statistical physics, the ubiquity of saddle points, and the phenomenon of mode connectivity, which is relevant for model merging. We unify these within a single explanatory framework: the geometry of the loss and error landscapes. We analytically show that phase transitions in DNN learning are governed by saddle points in the loss landscape. Building on this insight, we introduce a simple, fast, and easy-to-implement algorithm that uses the L2 regularizer as a tool to probe the geometry of error landscapes. We apply it to confirm mode connectivity in DNNs trained on the MNIST dataset by efficiently finding paths that connect global minima. We then show numerically that saddle points induce transitions between models that encode distinct digit classes. Our work establishes the geometric origin of key training phenomena in DNNs and reveals a hierarchy of accuracy basins analogous to phases in statistical physics.
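The sketch below illustrates one plausible reading of the L2-regularizer probe described in the abstract: an L2 penalty anchored at a second trained minimum is added to the task loss, and the regularization strength is swept while the error is recorded, so that abrupt jumps in the error curve flag candidate transitions. The anchor choice, the lambda sweep, the tiny MLP, and the synthetic MNIST-shaped data are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def make_mlp():
    # Small MLP for 28x28 inputs (MNIST-sized); the architecture is an assumption.
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))


def anchored_l2_probe(model, anchor_params, loader, lam, steps=200, lr=1e-2):
    """Minimize loss(theta) + (lam/2) * ||theta - theta_anchor||^2 for a fixed lam."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(loader)
            x, y = next(it)
        opt.zero_grad()
        task_loss = F.cross_entropy(model(x), y)
        # L2 penalty pulling the weights toward the anchor minimum (the "probe").
        reg = sum(((p - a) ** 2).sum() for p, a in zip(model.parameters(), anchor_params))
        (task_loss + 0.5 * lam * reg).backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Synthetic MNIST-shaped data as a stand-in for the real dataset.
    x = torch.randn(512, 1, 28, 28)
    y = torch.randint(0, 10, (512,))
    loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

    model_a, model_b = make_mlp(), make_mlp()       # two (independently trained) minima
    anchor = [p.detach().clone() for p in model_b.parameters()]

    probe = make_mlp()
    probe.load_state_dict(model_a.state_dict())     # start the probe at minimum A
    for lam in [0.0, 1e-3, 1e-2, 1e-1, 1.0]:        # sweep the regularization strength
        anchored_l2_probe(probe, anchor, loader, lam, steps=50)
        with torch.no_grad():
            err = (probe(x).argmax(1) != y).float().mean().item()
        print(f"lambda={lam:g}  error={err:.3f}")   # jumps in error mark candidate transitions
```

In a real experiment, models A and B would be independently trained MNIST classifiers and the probe's error would be evaluated on held-out data at each lambda, tracing a path between the two minima.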