Learning to Explore for Stochastic Gradient MCMC
SeungHyun Kim, Seohyeon Jung, Seonghyeon Kim, Juho Lee
TL;DR
This work targets Bayesian neural networks, where posterior inference is hindered by high dimensionality and multimodality. It introduces Learning to Explore (L2E), a meta-learning SGMCMC framework that learns the gradients of the kinetic energy via neural networks $\alpha_\phi$ and $\beta_\phi$, while keeping the diffusion and curl components simple and eliminating the costly $\Gamma(z)$ term. The meta-objective is based on the predictive distribution and optimized with unbiased gradient estimators, and the method is meta-trained over a diverse distribution of tasks to promote transfer to unseen datasets and architectures. Empirically, L2E achieves faster mixing, better predictive accuracy, improved capture of multimodality, and robustness under distribution shift on image benchmarks, with modest computational overhead compared to standard SGMCMC baselines.
Abstract
Bayesian Neural Networks (BNNs) with high-dimensional parameters pose a challenge for posterior inference due to the multi-modality of the posterior distributions. Stochastic Gradient MCMC (SGMCMC) with cyclical learning rate scheduling is a promising solution, but it requires a large number of sampling steps to explore high-dimensional multi-modal posteriors, making it computationally expensive. In this paper, we propose a meta-learning strategy to build an SGMCMC sampler that can efficiently explore multi-modal target distributions. Our algorithm allows the learned SGMCMC to quickly explore the high-density regions of the posterior landscape. We also show that this exploration property is transferable to various tasks, even those unseen during meta-training. Using popular image classification benchmarks and a variety of downstream tasks, we demonstrate that our method significantly improves sampling efficiency, achieving better performance than vanilla SGMCMC without incurring significant computational overhead.
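For readers unfamiliar with the cyclical baseline mentioned above, the following is a minimal sketch of Langevin sampling with a cosine cyclical step size. It is not the L2E method. Everything in it is an illustrative assumption: the one-dimensional two-mode Gaussian mixture target, the use of an exact gradient in place of a minibatch gradient, the hyperparameter values, and the function names `cyclical_step_size`, `grad_neg_log_density`, and `cyclical_langevin`. For simplicity it keeps every iterate, whereas cyclical samplers typically collect samples only in the low-step-size part of each cycle.

```python
import math
import random


def cyclical_step_size(k, total_steps, num_cycles, base_step):
    """Cosine cyclical step size; k is a 0-based iteration index."""
    cycle_len = math.ceil(total_steps / num_cycles)
    position = (k % cycle_len) / cycle_len  # fraction of the current cycle, in [0, 1)
    return 0.5 * base_step * (math.cos(math.pi * position) + 1.0)


def grad_neg_log_density(x):
    """Gradient of U(x) = -log p(x) for an equal mixture of N(-2, 1) and N(2, 1)."""
    w_left = math.exp(-0.5 * (x + 2.0) ** 2)
    w_right = math.exp(-0.5 * (x - 2.0) ** 2)
    return (w_left * (x + 2.0) + w_right * (x - 2.0)) / (w_left + w_right)


def cyclical_langevin(total_steps=20000, num_cycles=10, base_step=0.5, seed=0):
    """Unadjusted Langevin dynamics with a cyclical step size (toy, exact gradient)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for k in range(total_steps):
        eps = cyclical_step_size(k, total_steps, num_cycles, base_step)
        noise = rng.gauss(0.0, 1.0)
        # Large eps at the start of a cycle encourages exploration;
        # small eps at the end of a cycle refines samples around a mode.
        x = x - eps * grad_neg_log_density(x) + math.sqrt(2.0 * eps) * noise
        samples.append(x)
    return samples
```

Each restart of the schedule returns the step size to `base_step`, which is what lets the chain leave the current mode. The many steps this can require on high-dimensional posteriors is the cost the abstract refers to.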
