Diluting Restricted Boltzmann Machines
C. Díaz-Faloh, R. Mulet
TL;DR
The paper addresses the cost and scalability concerns of large neural networks by testing the Lottery Ticket Hypothesis on Restricted Boltzmann Machines (RBMs) under extreme sparsity. It trains RBMs on MNIST, applies heavy pruning both before and after training, and evaluates generative quality with multiple metrics, including a novel auxiliary classifier score and generalized Ising-model mappings. The key findings are that RBMs can sustain high-quality generation with up to 80% of their connections pruned before training, that heavier pruning causes an abrupt degradation, and that retraining after further pruning cannot fully overcome the original learning trajectory, which highlights the importance of early pruning and of initialization. These results have practical implications for designing efficient sparse architectures and emphasize the enduring influence of initial conditions on network capabilities, with potential applicability beyond RBMs to broader sparse learning regimes.
Abstract
Recent advances in artificial intelligence have relied heavily on increasingly large neural networks, raising concerns about their computational and environmental costs. This paper investigates whether simpler, sparser networks can maintain strong performance by studying Restricted Boltzmann Machines (RBMs) under extreme pruning. Inspired by the Lottery Ticket Hypothesis, we demonstrate that RBMs can achieve high-quality generative performance even when up to 80% of their connections are pruned before training, confirming that they contain viable subnetworks. However, our experiments reveal crucial limitations: once additional pruning is applied to a trained network, retraining cannot fully recover the lost performance. We identify a sharp transition: above a critical pruning level, generative quality degrades abruptly once pruning disrupts a minimal core of essential connections. Moreover, retrained networks remain constrained by their originally learned parameters, performing worse than networks trained from scratch at equivalent sparsity levels. These results suggest that for sparse networks to work effectively, pruning should be applied early in training rather than afterwards. Our findings provide practical insights for the development of efficient neural architectures and highlight the persistent influence of initial conditions on network capabilities.
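The experimental protocol summarized above (prune before training, train, prune the trained network further, retrain, and compare against a network trained from scratch at the same sparsity) can be illustrated with a minimal sketch. It assumes a binary RBM trained with one-step contrastive divergence (CD-1), a fixed boolean mask that keeps pruned weights at zero, random dilution for the initial pruning, and magnitude-based pruning for the post-training step. All of these choices, the toy data standing in for binarized MNIST, the reconstruction-error proxy (not the paper's metrics), and the function names `make_mask`, `train_rbm_cd1`, `prune_to` and `recon_error` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_mask(shape, prune_frac, rng):
    """Random dilution (assumed scheme): keep a random (1 - prune_frac) of the weights."""
    n = shape[0] * shape[1]
    n_keep = int(round((1.0 - prune_frac) * n))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=n_keep, replace=False)] = True
    return flat.reshape(shape)


def train_rbm_cd1(data, W, a, b, mask, lr=0.05, epochs=10, batch=32, rng=None):
    """CD-1 training of a binary RBM; pruned connections (mask == False) stay at zero."""
    rng = np.random.default_rng() if rng is None else rng
    W, a, b = W * mask, a.copy(), b.copy()  # copies, so the caller's arrays are untouched
    for _ in range(epochs):
        perm = rng.permutation(len(data))
        for start in range(0, len(data), batch):
            v0 = data[perm[start:start + batch]]
            ph0 = sigmoid(v0 @ W + b)                       # p(h = 1 | v0)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)                     # p(v = 1 | h0)
            v1 = (rng.random(pv1.shape) < pv1).astype(float)
            ph1 = sigmoid(v1 @ W + b)
            n = len(v0)
            W += lr * ((v0.T @ ph0 - v1.T @ ph1) / n) * mask  # update surviving weights only
            a += lr * (v0 - v1).mean(axis=0)
            b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b


def prune_to(W, mask, target_frac):
    """Magnitude pruning (assumed scheme): keep the largest surviving weights
    until a total fraction target_frac of all connections is pruned."""
    n_keep = int(round((1.0 - target_frac) * mask.size))
    alive = np.flatnonzero(mask)
    if n_keep >= alive.size:
        return mask.copy()
    mags = np.abs(W.ravel()[alive])
    keep = alive[np.argsort(mags)[::-1][:n_keep]]
    new = np.zeros(mask.size, dtype=bool)
    new[keep] = True
    return new.reshape(mask.shape)


def recon_error(data, W, a, b):
    """Mean-field reconstruction error: a crude proxy, not a generative-quality metric."""
    pv = sigmoid(sigmoid(data @ W + b) @ W.T + a)
    return float(np.mean((data - pv) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_vis, n_hid = 64, 32
    # Toy stand-in for binarized MNIST: noisy copies of 4 random binary prototypes.
    protos = (rng.random((4, n_vis)) < 0.5).astype(float)
    flips = (rng.random((512, n_vis)) < 0.05).astype(float)
    data = np.abs(protos[rng.integers(0, 4, size=512)] - flips)
    W0 = 0.01 * rng.standard_normal((n_vis, n_hid))
    a0, b0 = np.zeros(n_vis), np.zeros(n_hid)

    # 1) Prune 80% before training, then train.
    mask80 = make_mask(W0.shape, 0.8, rng)
    W, a, b = train_rbm_cd1(data, W0, a0, b0, mask80, rng=rng)
    # 2) Prune the trained network further (to 95%), then retrain from its learned state.
    mask95 = prune_to(W, mask80, 0.95)
    W_re, a_re, b_re = train_rbm_cd1(data, W, a, b, mask95, rng=rng)
    # 3) Baseline: train from scratch at the same 95% sparsity.
    W_sc, a_sc, b_sc = train_rbm_cd1(data, W0, a0, b0, make_mask(W0.shape, 0.95, rng), rng=rng)

    print("80% pruned before training:", recon_error(data, W, a, b))
    print("pruned to 95% and retrained:", recon_error(data, W_re, a_re, b_re))
    print("95% pruned from scratch:    ", recon_error(data, W_sc, a_sc, b_sc))
```

Multiplying both the weights and their updates by the mask is one simple way to keep a diluted architecture fixed during training; on toy data like this the printed numbers say nothing about the paper's MNIST results.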
