Stochastic Gradient Descent for Gaussian Processes Done Right
Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz
TL;DR
The paper addresses the computational bottleneck of Gaussian process regression, namely solving the linear system $(K+\lambda I)\alpha = b$, by computing $(K+\lambda I)^{-1}b$ efficiently with stochastic dual descent (SDD), a stochastic gradient method applied to a dual objective. SDD combines four ingredients: the dual objective, random-coordinate gradient estimates (whose noise is multiplicative, i.e. it shrinks to zero at the solution), Nesterov momentum, and geometric iterate averaging. Together these accelerate convergence for both posterior mean computation and posterior sampling. Empirically, SDD matches or surpasses preconditioned conjugate gradients (CG) and variational GP methods on UCI regression benchmarks and in Bayesian optimization, and performs on par with state-of-the-art graph neural networks on large-scale molecular binding affinity (docking score) prediction. The work shows that a carefully designed first-order stochastic method can make Gaussian processes competitive with modern deep learning approaches on large-scale, uncertainty-aware tasks, broadening their practical applicability.
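The four SDD ingredients listed above can be illustrated with a small NumPy sketch. It assumes the dual objective is the quadratic $\frac{1}{2}\alpha^\top(K+\lambda I)\alpha - \alpha^\top b$, whose minimiser is $(K+\lambda I)^{-1}b$ and whose gradient is $(K+\lambda I)\alpha - b$. The helper names `rbf_kernel` and `sdd_solve`, the squared-exponential kernel, the toy inputs, and every hyperparameter value (step size, batch size, momentum $0.9$, averaging rate $r = 100/T$) are hypothetical choices made for this illustration; this is not the authors' reference implementation, and their step-size scaling may differ.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0):
    """Hypothetical helper: squared-exponential kernel on 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def sdd_solve(K, b, lam, num_steps=3000, batch_size=5, step_size=0.002,
              momentum=0.9, avg_rate=None, seed=0):
    """Sketch of stochastic dual descent for (K + lam*I) alpha = b.

    Minimises 0.5 * a^T (K + lam I) a - a^T b with random-coordinate gradient
    estimates, Nesterov momentum and geometric iterate averaging.
    All default hyperparameters are illustrative assumptions.
    """
    n = b.shape[0]
    rng = np.random.default_rng(seed)
    if avg_rate is None:
        avg_rate = 100.0 / num_steps  # assumed geometric averaging rate r
    alpha = np.zeros(n)
    velocity = np.zeros(n)
    alpha_avg = np.zeros(n)
    for _ in range(num_steps):
        # Nesterov: evaluate the gradient at the look-ahead point.
        lookahead = alpha + momentum * velocity
        # Sample batch_size coordinates uniformly with replacement.
        idx = rng.integers(0, n, size=batch_size)
        # Rows of the residual (K + lam I) a - b, computed only for sampled indices.
        resid = K[idx] @ lookahead + lam * lookahead[idx] - b[idx]
        # Unbiased full-gradient estimate; np.add.at accumulates repeated indices.
        grad = np.zeros(n)
        np.add.at(grad, idx, (n / batch_size) * resid)
        # Momentum update followed by the parameter step.
        velocity = momentum * velocity - step_size * grad
        alpha = alpha + velocity
        # Geometric (exponential) averaging of the iterates.
        alpha_avg = avg_rate * alpha + (1.0 - avg_rate) * alpha_avg
    return alpha_avg


# Toy problem (hypothetical data): compare against a direct dense solve.
x = np.linspace(0.0, 10.0, 20)
K = rbf_kernel(x, x)
b = np.sin(x)
lam = 1.0
alpha_sdd = sdd_solve(K, b, lam)
alpha_exact = np.linalg.solve(K + lam * np.eye(len(x)), b)
rel_err = np.linalg.norm(alpha_sdd - alpha_exact) / np.linalg.norm(alpha_exact)
print(f"relative error vs. direct solve: {rel_err:.2e}")
```

Because the gradient estimate is a rescaled subsample of the residual $(K+\lambda I)\alpha - b$, its noise shrinks as the iterate approaches the solution. This is the multiplicative-noise property that lets a fixed step size converge without decay schedules in this sketch. Each step touches only `batch_size` rows of $K$, which is what makes the approach attractive when $n$ is too large for a direct solve.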
Abstract
As is well known, both sampling from the posterior and computing the mean of the posterior in Gaussian process regression reduce to solving a large linear system of equations. We study the use of stochastic gradient descent for solving this linear system, and show that when \emph{done right} -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective. To that end, we introduce a particularly simple \emph{stochastic dual descent} algorithm, explain its design in an intuitive manner and illustrate the design choices through a series of ablation studies. Further experiments demonstrate that our new method is highly competitive. In particular, our evaluations on the UCI regression tasks and on Bayesian optimisation set our approach apart from preconditioned conjugate gradients and variational Gaussian process approximations. Moreover, our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
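To make the reduction in the first sentence of the abstract concrete, the standard formulas are given below. They assume a zero-mean prior $f \sim \mathcal{GP}(0, k)$, training inputs $X$ with targets $y$, Gaussian observation noise of variance $\lambda$, and $K = k(X, X)$. The sample uses pathwise conditioning (Matheron's rule), one common way to draw posterior samples; it is shown for illustration and is not necessarily the exact formulation used in the paper.

$$
\mu_{f \mid y}(\cdot) = k(\cdot, X)\,(K + \lambda I)^{-1} y
$$

$$
(f \mid y)(\cdot) = f(\cdot) + k(\cdot, X)\,(K + \lambda I)^{-1}\bigl(y - f(X) - \varepsilon\bigr), \qquad \varepsilon \sim \mathcal{N}(0, \lambda I)
$$

In both cases the expensive step is a solve with $K + \lambda I$: the right-hand side is $b = y$ for the posterior mean and $b = y - f(X) - \varepsilon$ for each posterior sample, where $f$ is a prior sample and $\varepsilon$ a noise sample.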
