The effect of priors on Learning with Restricted Boltzmann Machines
Gianluca Manzan, Daniele Tantari
TL;DR
This work analyzes learning in Restricted Boltzmann Machines (RBMs) in a teacher-student framework, with unit priors that interpolate between Gaussian and binary distributions. Using a replica-symmetric (RS) free-energy analysis, it derives the full phase diagram and identifies a triple point that fixes the minimal dataset size $\alpha_c$ needed for learning by generalization; this bound is set by the properties of the teacher, and hence of the data. The study shows that Gaussian priors on hidden units make it easier to enter the signal retrieval phase, while other priors can induce memorization-like retrieval or spin-glass behavior, especially in mismatched settings. Together with Monte Carlo simulations, the results offer practical guidance on architectural choices that maximize generalization under limited data and point to possible extensions to structured data regimes.
Abstract
Restricted Boltzmann Machines (RBMs) are generative models designed to learn from data with a rich underlying structure. In this work, we explore a teacher-student setting where a student RBM learns from examples generated by a teacher RBM, with a focus on the effect of the unit priors on learning efficiency. We consider a parametric class of priors that interpolate between continuous (Gaussian) and binary variables. This approach models various possible choices of visible units, hidden units, and weights for both the teacher and student RBMs. By analyzing the phase diagram of the posterior distribution in both the Bayes optimal and mismatched regimes, we demonstrate the existence of a triple point that defines the critical dataset size necessary for learning through generalization. The critical size is strongly influenced by the properties of the teacher, and thus the data, but is unaffected by the properties of the student RBM. Nevertheless, a prudent choice of student priors can facilitate training by expanding the so-called signal retrieval region, where the machine generalizes effectively.
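
The abstract does not spell out the parametric family of priors. As a purely illustrative sketch, one simple way to interpolate between a binary and a Gaussian variable is to mix a standard normal draw $g$ and a random sign $\epsilon = \pm 1$ as $x = \sqrt{\Omega}\, g + \sqrt{1-\Omega}\, \epsilon$, which keeps unit variance for every $\Omega \in [0, 1]$. The parametrization, the symbol $\Omega$ (`omega` below), and the function name `sample_interpolating_prior` are assumptions made here for illustration and are not taken from the text above.

```python
import numpy as np


def sample_interpolating_prior(omega, size, rng):
    """Draw samples x = sqrt(omega) * g + sqrt(1 - omega) * eps.

    Assumed (hypothetical) parametrization:
      omega = 0 gives binary +/-1 units,
      omega = 1 gives standard Gaussian units,
      values in between give a symmetric two-peaked mixture.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    g = rng.standard_normal(size)             # Gaussian component
    eps = rng.choice([-1.0, 1.0], size=size)  # binary component
    return np.sqrt(omega) * g + np.sqrt(1.0 - omega) * eps


rng = np.random.default_rng(0)
binary = sample_interpolating_prior(0.0, 10_000, rng)    # all entries are +/-1
mixed = sample_interpolating_prior(0.5, 10_000, rng)     # intermediate prior
gaussian = sample_interpolating_prior(1.0, 10_000, rng)  # standard normal
```

In a teacher-student experiment of the kind described above, the same sampler could be called with different `omega` values for the teacher and the student to mimic a mismatched setting, and with equal values to mimic the Bayes optimal one.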
