Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines
Alberto Fachechi, Elena Agliari, Miriam Aquaro, Anthony Coolen, Menno Mulder
TL;DR
This work develops a statistical-mechanics framework for a Restricted Boltzmann Machine (RBM) with binary visible units and Gaussian hidden units, trained on noisy realizations of a single ground pattern. Using the replica trick under the replica-symmetric (RS) assumption, it derives self-consistent equations for the order parameters. It identifies critical hyperparameter regimes governed by the regularization strength β_ε, the training temperature T_2, and the dataset entropy ρ_0 (with ρ_0 ≥ 0), revealing a retrieval-dominated RS phase and a subregion where replica-symmetry breaking (RSB) is expected theoretically and observed numerically. Numerical experiments corroborate the RS predictions in the generative regime. Where RS fails, they show aging, violations of the fluctuation-dissipation theorem (FDT), and multi-cluster structure in the generated samples, all of which indicate non-equilibrium glassy dynamics. The results provide an interpretable map between hyperparameters and operating regimes, offer guidance on tuning hyperparameters for stable sampling, and point to extensions of the theory beyond RS to cover high temperature, high load, or more complex datasets. Together, they clarify when RBMs behave as reliable generators and how their dynamics relate to the underlying spin-glass-like landscape.
Abstract
We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer, trained on an unlabelled dataset composed of noisy realizations of a single ground pattern. We develop a statistical mechanics framework to describe the network's generative capabilities, by exploiting the replica trick and assuming self-averaging of the underlying order parameters (i.e., replica symmetry). In particular, we outline the effective control parameters (e.g., the relative number of weights to be trained, the regularization parameter), whose tuning can yield qualitatively different operating regimes. Further, we provide analytical and numerical evidence for the existence of a sub-region in the space of the hyperparameters where replica-symmetry breaking occurs.
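
As a rough illustration of the setting described above, the following Python sketch builds a dataset of noisy copies of one binary ground pattern and performs one alternating Gibbs step for a binary-visible, Gaussian-hidden RBM. It is not the authors' code. The names make_noisy_dataset, gibbs_step, quality and beta are hypothetical, and so are the modelling choices: each bit of the ground pattern is kept with probability (1 + quality) / 2, and the energy is taken to be E(v, h) = -v·(W h) + |h|^2 / 2 with Boltzmann weight exp(-beta E). The paper's own conventions may differ.

```python
import numpy as np


def make_noisy_dataset(n_visible, n_examples, quality, seed=0):
    """Return a random +/-1 ground pattern and noisy copies of it.

    Assumption: each bit is kept with probability (1 + quality) / 2 and
    flipped otherwise, so the expected overlap with the ground pattern
    equals `quality`.
    """
    rng = np.random.default_rng(seed)
    ground = rng.choice([-1, 1], size=n_visible)
    keep = rng.random((n_examples, n_visible)) < (1 + quality) / 2
    data = np.where(keep, 1, -1) * ground
    return ground, data


def gibbs_step(v, W, beta, rng):
    """One alternating Gibbs step under the assumed energy
    E(v, h) = -v.(W h) + |h|^2 / 2, sampled at inverse temperature beta.
    """
    n_visible, n_hidden = W.shape
    # h | v is Gaussian with mean W^T v and variance 1 / beta.
    h = W.T @ v + rng.standard_normal(n_hidden) / np.sqrt(beta)
    # v_i | h is +1 with probability sigmoid(2 * beta * (W h)_i).
    field = W @ h
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
    return np.where(rng.random(n_visible) < p_up, 1, -1)


ground, data = make_noisy_dataset(n_visible=200, n_examples=500, quality=0.6)
mean_overlap = float(np.mean(data @ ground) / ground.size)

rng = np.random.default_rng(1)
W = rng.standard_normal((200, 5)) / np.sqrt(200)  # untrained, illustrative weights
v_next = gibbs_step(data[0], W, beta=1.0, rng=rng)
```

Here mean_overlap estimates how closely the examples resemble the ground pattern. The weights W are random rather than trained, so v_next only demonstrates the sampling mechanics, not the generative behaviour analysed in the paper.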
