Cascade of phase transitions in the training of Energy-based models
Dimitrios Bachtis, Giulio Biroli, Aurélien Decelle, Beatriz Seoane
TL;DR
This work analyzes how energy-based models, specifically RBMs, learn data distributions. It reveals a cascade of second-order phase transitions in the spectrum of the weight matrix as training progresses. Using an analytically tractable BG-RBM framework and validations on real data (HGD, MNIST, CelebA), the authors show that learning proceeds through successive refinements of the principal modes of the data. The weight directions first align with PCA components and later encode finer structures. A mean-field finite-size scaling hypothesis is proposed and tested, and it indicates universal critical behavior (e.g., $\gamma=1$) at the first transition. The authors also observe divergent MCMC mixing times and hysteresis at the transitions. The results offer a mechanistic view of feature encoding in generative models and have practical implications for training and sampling efficiency. Possible extensions include deeper EBMs and diffusion-like models.
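The exponent $\gamma=1$ quoted above is the textbook mean-field value. The derivation below illustrates where it comes from. It uses the Curie-Weiss model, chosen here as the simplest mean-field ferromagnet. This is an assumption made for illustration and not the BG-RBM computation of the paper. The derivation also gives the standard mean-field finite-size form of the susceptibility. Here $N$ and $t$ refer to the Curie-Weiss model only.

```latex
% Curie-Weiss model: N Ising spins s_i = \pm 1, magnetization m = \frac{1}{N}\sum_i s_i
H(\mathbf{s}) = -\frac{N}{2}\, m^2 - h N m ,
\qquad
m = \tanh\!\big(\beta (m + h)\big) \quad (N \to \infty).

% Differentiate the self-consistency equation with respect to h:
\chi \equiv \frac{\partial m}{\partial h}
  = \beta (1 - m^2)\,(\chi + 1)
  \;\Longrightarrow\;
  \chi = \frac{\beta (1 - m^2)}{1 - \beta (1 - m^2)} .

% Paramagnetic phase (h = 0, m = 0, \beta < \beta_c = 1):
\chi = \frac{\beta}{1 - \beta} \sim (\beta_c - \beta)^{-\gamma}, \qquad \gamma = 1 .

% Mean-field finite-size scaling at finite N, with t = \beta_c - \beta
% and \chi_N = \beta N \langle m^2 \rangle at h = 0:
\chi_N(t) = N^{1/2}\, \tilde{f}\!\left(N^{1/2}\, t\right),
\qquad \tilde{f}(x) \sim x^{-1} \ \text{for } x \gg 1 .
```

At large $N^{1/2} t$, this form recovers $\chi \simeq 1/t$. At the critical point ($t=0$), the susceptibility instead grows as $N^{1/2}$.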
Abstract
In this paper, we investigate the feature encoding process in a prototypical energy-based generative model, the Restricted Boltzmann Machine (RBM). We start with an analytical investigation using simplified architectures and data structures. We end with a numerical analysis of real trainings on real datasets. Our study tracks the evolution of the model's weight matrix through its singular value decomposition. This reveals a series of phase transitions associated with a progressive learning of the principal modes of the empirical probability distribution. The model first learns the center of mass of the modes and then progressively resolves all modes through a cascade of phase transitions. We first describe this process in a controlled setup that allows the training dynamics to be studied analytically. We then validate our theoretical results by training the Bernoulli-Bernoulli RBM on real datasets. By using datasets of increasing dimension, we show that learning indeed leads to sharp phase transitions in the high-dimensional limit. Moreover, we propose and test a mean-field finite-size scaling hypothesis. This shows that the first phase transition belongs to the same universality class as the one we studied analytically. That transition is reminiscent of the mean-field paramagnetic-to-ferromagnetic phase transition.
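The following minimal NumPy sketch shows what it means to track the weight matrix through its singular value decomposition during training. The setup rests on several assumptions:

- The dataset is a hypothetical toy one: two complementary binary prototypes with random bit flips.
- The Bernoulli-Bernoulli RBM is trained with one-step contrastive divergence (CD-1). This is chosen for brevity and is not necessarily the training scheme used in the paper.
- All function names and hyperparameters are illustrative.

The sketch records the singular values of $W$ after every epoch. It also computes the overlap between the leading left singular vector of $W$ and the first principal component of the data, which is one simple way to quantify alignment with PCA directions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_two_mode_data(n_samples, n_visible, flip_prob, rng):
    """Hypothetical toy data: two complementary binary prototypes plus random bit flips."""
    proto = rng.integers(0, 2, size=n_visible)
    protos = np.stack([proto, 1 - proto])
    labels = rng.integers(0, 2, size=n_samples)
    data = protos[labels].copy()
    flips = rng.random(data.shape) < flip_prob
    data[flips] = 1 - data[flips]
    return data.astype(float)


def train_rbm_track_svd(data, n_hidden=20, lr=0.05, epochs=50, batch_size=50, seed=0):
    """Train a Bernoulli-Bernoulli RBM with CD-1 (assumed scheme) and record W's singular values per epoch."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # visible x hidden couplings
    a = np.zeros(n_visible)  # visible biases
    b = np.zeros(n_hidden)   # hidden biases
    history = []
    for _ in range(epochs):
        perm = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            v0 = data[perm[start:start + batch_size]]
            # Positive phase: hidden activation probabilities given the data.
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            # Negative phase: one Gibbs step v0 -> h0 -> v1, then p(h | v1).
            pv1 = sigmoid(h0 @ W.T + a)
            v1 = (rng.random(pv1.shape) < pv1).astype(float)
            ph1 = sigmoid(v1 @ W + b)
            m = v0.shape[0]
            # CD-1 approximation of the log-likelihood gradient (ascent step).
            W += lr * (v0.T @ ph0 - v1.T @ ph1) / m
            a += lr * (v0 - v1).mean(axis=0)
            b += lr * (ph0 - ph1).mean(axis=0)
        # np.linalg.svd returns singular values in descending order.
        history.append(np.linalg.svd(W, compute_uv=False))
    return W, np.array(history)


def top_pca_overlap(data, W):
    """|<u_1, pc_1>|: overlap of W's leading left singular vector with the data's first PC."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u, _, _ = np.linalg.svd(W, full_matrices=False)
    return float(abs(u[:, 0] @ vt[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = make_two_mode_data(1000, 50, 0.1, rng)
    W, sv_history = train_rbm_track_svd(data)
    print(sv_history.shape)  # (50, 20): epochs x singular values
    print("leading singular value per epoch:", np.round(sv_history[:, 0], 3))
    print("overlap with first PC:", round(top_pca_overlap(data, W), 3))
```

Plotting each column of `sv_history` against the epoch index shows how the spectrum evolves. In a setup like this one, the leading singular value might be expected to grow away from the rest as the model starts to separate the two prototypes. This toy run is not claimed to reproduce the sharp transitions reported in the paper, which emerge in the high-dimensional limit.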
