Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo
Ziyi Wang, Yujie Chen, Qifan Song, Ruqi Zhang
TL;DR
The paper addresses efficient Bayesian sampling under low-precision arithmetic by proposing and analyzing low-precision SGHMC. It develops three variants (SGHMCLP-F, SGHMCLP-L, and VC SGHMCLP-L) with rigorous non-asymptotic bounds in 2-Wasserstein distance, showing faster convergence and greater robustness to quantization than SGLD, especially for non-log-concave targets. The key contributions are a complete theoretical treatment of low-precision SGHMC across strongly log-concave and non-log-concave regimes, the introduction of variance-corrected quantization to mitigate overdispersion, and extensive experiments on Gaussian distributions, MNIST, and CIFAR datasets showing practical gains for resource-constrained settings. The results suggest that low-precision SGHMC is a viable, efficient approach to sampling in large-scale Bayesian deep learning, offering both speed and uncertainty quantification in low-precision environments.
Abstract
Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $ε$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves a quadratic improvement ($\widetilde{\mathbf{O}}\left(ε^{-2}{μ^*}^{-2}\log^2\left(ε^{-1}\right)\right)$) over the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left(ε^{-4}{λ^*}^{-1}\log^5\left(ε^{-1}\right)\right)$). Moreover, we prove that low-precision SGHMC is more robust to quantization error than low-precision SGLD, owing to the robustness of the momentum-based update with respect to gradient noise. Empirically, we conduct experiments on synthetic data and on the MNIST, CIFAR-10, and CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
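To make the setting concrete, the following is a minimal sketch of SGHMC with a full-precision accumulator, written under several assumptions that are not taken from the paper: a fixed-point grid of spacing `delta` with unbiased stochastic rounding, a simple Euler-type discretization of underdamped Langevin dynamics with unit mass, and gradients evaluated at the quantized position. The function names `stochastic_round` and `sghmc_full_precision_accumulator` and all parameter names are hypothetical. It illustrates the general idea only and should not be read as the authors' algorithm.

```python
import numpy as np


def stochastic_round(x, delta, rng):
    """Round x onto a grid of spacing delta so that E[Q(x)] = x (unbiased)."""
    scaled = np.asarray(x, dtype=np.float64) / delta
    lower = np.floor(scaled)
    prob_up = scaled - lower  # probability of rounding up to the next grid point
    return delta * (lower + (rng.random(scaled.shape) < prob_up))


def sghmc_full_precision_accumulator(grad_fn, x0, n_steps, step, friction, delta, rng):
    """Illustrative SGHMC loop (assumed form, not the paper's exact update).

    The position x and momentum v are kept in float64 (the "full-precision
    accumulator"); only the point at which the gradient is evaluated is quantized.
    """
    x = np.array(x0, dtype=np.float64)
    v = np.zeros_like(x)
    samples = np.empty((n_steps,) + x.shape)
    noise_scale = np.sqrt(2.0 * friction * step)
    for k in range(n_steps):
        # Gradient of the potential U at the low-precision copy of x.
        g = grad_fn(stochastic_round(x, delta, rng))
        # Momentum update: friction, gradient force, and injected Gaussian noise.
        v = v - step * (friction * v + g) + noise_scale * rng.standard_normal(x.shape)
        # Position update using the new momentum.
        x = x + step * v
        samples[k] = x
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Standard Gaussian target: U(x) = x^2 / 2, so grad U(x) = x.
    draws = sghmc_full_precision_accumulator(
        grad_fn=lambda x: x, x0=np.zeros(1), n_steps=20000,
        step=0.05, friction=2.0, delta=0.1, rng=rng,
    )
    print("sample mean:", draws.mean(), "sample std:", draws.std())
```

In this sketch the quantization error enters only through the gradient, where the friction term in the momentum update damps its effect before it reaches the position. A low-precision accumulator variant would additionally quantize `x` and `v` after each update, which is where a variance correction of the kind the paper proposes would become relevant.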
