Mean-Field Assisted Deep Boltzmann Learning with Probabilistic Computers
Shuvro Chowdhury, Shaila Niazi, Kerem Y. Camsari
TL;DR
The paper tackles the tractability challenge of training deep, unrestricted Boltzmann machines by leveraging a fast FPGA-based p-computer to perform negative-phase sampling while introducing two mean-field theory variants (Naive and Hierarchical) to efficiently estimate positive-phase statistics. The authors propose a hybrid learning framework that uses xMFTs for the positive phase and hardware-assisted Gibbs sampling for the negative phase, enabling CD with very large n on large, sparse Ising networks. Empirically, they demonstrate training a 2-layer DBM on a Pegasus graph with 2560 p-bits to nearly MNIST-level performance (≈87% accuracy) and show that HMFT improves positive-phase correlation estimates over naive MFT, with manageable trade-offs in accuracy. This work suggests that, with dedicated probabilistic hardware and HMFT-assisted training, deep and unrestricted Boltzmann machines can be trained at scales previously deemed intractable, potentially generalizing to other Ising-machine platforms.
Abstract
Despite their appeal as physics-inspired, energy-based and generative nature, general Boltzmann Machines (BM) are considered intractable to train. This belief led to simplified models of BMs with restricted intralayer connections or layer-by-layer training of deep BMs. Recent developments in domain-specific hardware -- specifically probabilistic computers (p-computer) with probabilistic bits (p-bit) -- may change established wisdom on the tractability of deep BMs. In this paper, we show that deep and unrestricted BMs can be trained using p-computers generating hundreds of billions of Markov Chain Monte Carlo (MCMC) samples per second, on sparse networks developed originally for use in D-Wave's annealers. To maximize the efficiency of learning the p-computer, we introduce two families of Mean-Field Theory assisted learning algorithms, or xMFTs (x = Naive and Hierarchical). The xMFTs are used to estimate the averages and correlations during the positive phase of the contrastive divergence (CD) algorithm and our custom-designed p-computer is used to estimate the averages and correlations in the negative phase. A custom Field-Programmable-Gate Array (FPGA) emulation of the p-computer architecture takes up to 45 billion flips per second, allowing the implementation of CD-$n$ where $n$ can be of the order of millions, unlike RBMs where $n$ is typically 1 or 2. Experiments on the full MNIST dataset with the combined algorithm show that the positive phase can be efficiently computed by xMFTs without much degradation when the negative phase is computed by the p-computer. Our algorithm can be used in other scalable Ising machines and its variants can be used to train BMs, previously thought to be intractable.
