Sparse Orthogonal Variational Inference for Gaussian Processes

Jiaxin Shi, Michalis K. Titsias, Andriy Mnih

TL;DR

This work addresses the scalability gap in Gaussian processes by reinterpreting sparse variational GP (SVGP) inference as an orthogonal decomposition of the GP prior into two components. By introducing an additional set of orthogonal inducing points, SOLVE-GP obtains a structured variational bound that gives tighter lower bounds on the marginal likelihood without prohibitive cost, effectively allowing more inducing points under a fixed computational budget. The framework subsumes SVGP as a special case, connects to decoupled inducing-point methods, and extends naturally to inter-domain, convolutional, and deep GP models, achieving state-of-the-art results on CIFAR-10 among purely GP-based models. Overall, SOLVE-GP makes GP posteriors more expressive and scalable, enabling large-scale and deep GP architectures.

Abstract

We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points, which can lead to more scalable algorithms than previous methods. It is based on decomposing a Gaussian process as a sum of two independent processes: one spanned by a finite basis of inducing points and the other capturing the remaining variation. We show that this formulation recovers existing approximations and at the same time allows us to obtain tighter lower bounds on the marginal likelihood and new stochastic variational inference algorithms. We demonstrate the efficiency of these algorithms in several Gaussian process models ranging from standard regression to multi-class classification using (deep) convolutional Gaussian processes and report state-of-the-art results on CIFAR-10 among purely GP-based models.
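
The decomposition described in the abstract can be checked numerically at the level of covariance matrices. The sketch below is illustrative only and is not the authors' implementation: the squared-exponential kernel, the helper names `rbf` and `split_kernel`, the jitter value, and the input and inducing locations are all assumptions chosen for the example. It splits a kernel matrix into the part spanned by the inducing points and the remaining part, so that the two add up to the original prior covariance.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs (an assumed, illustrative kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def split_kernel(x, z, jitter=1e-8):
    """Split k(x, x) into a part spanned by inducing points z and a residual.

    k_par  = K_xz K_zz^{-1} K_zx   (covariance of the process spanned by z)
    k_perp = K_xx - k_par          (covariance of the remaining variation)
    """
    k_xx = rbf(x, x)
    k_xz = rbf(x, z)
    # A small jitter keeps the linear solve numerically stable.
    k_zz = rbf(z, z) + jitter * np.eye(len(z))
    k_par = k_xz @ np.linalg.solve(k_zz, k_xz.T)
    k_perp = k_xx - k_par
    return k_par, k_perp


x = np.linspace(-3.0, 3.0, 50)   # hypothetical inputs
z = np.array([-2.0, 0.0, 2.0])   # hypothetical inducing locations

k_par, k_perp = split_kernel(x, z)

# The two covariances sum back to the original prior covariance.
print(np.allclose(k_par + k_perp, rbf(x, x)))
# The spanned part has rank at most the number of inducing points.
print(np.linalg.matrix_rank(k_par))
# The residual process carries (numerically) no variance at the inducing locations.
print(np.abs(split_kernel(z, z)[1]).max())
```

Because the residual covariance vanishes at the inducing locations, everything the prior says about the function values there is carried by the first component. This is the sense in which the second process captures only the remaining variation.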

Paper Structure

This paper contains 38 sections, 47 equations, 4 figures, 10 tables, 1 algorithm.

Figures (4)

  • Figure 1: The graphical model of SOLVE-GP. The prior $f\sim \mathcal{GP}(0, k)$ is decomposed into two independent GPs (denoted by thick horizontal lines): $f_\|\sim p_\|$ and $f_\perp\sim p_\perp$. The variables connected by thick lines form a multivariate Gaussian. $\textbf{X}, \textbf{y}$ denote the training data. $\textbf{X}^*$ are the test inputs. $\mathbf{f}_\| = f_\|(\textbf{X})$, $\mathbf{f}_\perp = f_\perp(\textbf{X})$. $\textbf{u} = f_\|(\textbf{Z})$ denotes the inducing variables in standard SVGP methods. SOLVE-GP introduces another set of inducing variables $\textbf{v}_\perp=f_\perp(\textbf{O})$ to summarize $p_\perp$.
  • Figure 2: Posterior processes on the Snelson dataset, where shaded bands correspond to intervals of $\pm3$ standard deviations. The learned inducing locations are shown at the bottom of each figure, where $+$ corresponds to $\textbf{Z}$; blue and dark triangles correspond to $\textbf{O}$ in ODVGP and SOLVE-GP, respectively.
  • Figure 3: Test RMSE and predictive log-likelihoods during training on HouseElectric.
  • Figure 4: Comparison of computational cost for SVGP and SOLVE-GP. For each method and each type of cubic-cost operation, we plot the factor of increase in cost compared to a single operation on $M\times M$ matrices.
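
The variables named in the Figure 1 caption can be written out explicitly. The following is a sketch that follows from the caption's definitions and standard Gaussian conditioning, not a quotation of the paper's equations. The shorthand $\mathbf{K}_{AB} = k(A, B)$ and the kernel names $k_\|$, $k_\perp$ are notation assumed here.

$$
\begin{aligned}
f &= f_\| + f_\perp, \qquad f_\| \sim \mathcal{GP}(0, k_\|), \qquad f_\perp \sim \mathcal{GP}(0, k_\perp), \\
k_\|(x, x') &= k(x, \textbf{Z})\,\mathbf{K}_{ZZ}^{-1}\,k(\textbf{Z}, x'), \qquad k_\perp(x, x') = k(x, x') - k_\|(x, x'), \\
\textbf{u} &\sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{ZZ}), \qquad \mathbf{f}_\| = \mathbf{K}_{XZ}\,\mathbf{K}_{ZZ}^{-1}\,\textbf{u}, \\
\textbf{v}_\perp &\sim \mathcal{N}\!\left(\mathbf{0},\; \mathbf{K}_{OO} - \mathbf{K}_{OZ}\,\mathbf{K}_{ZZ}^{-1}\,\mathbf{K}_{ZO}\right).
\end{aligned}
$$

Under these definitions $\mathbf{f}_\|$ is a deterministic linear function of $\textbf{u}$, while $\textbf{v}_\perp$ is drawn from the residual process and is therefore independent of $\textbf{u}$ under the prior.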