Improving Iterative Gaussian Processes via Warm Starting Sequential Posteriors

Alan Yufei Dong, Jihao Andreas Lin, José Miguel Hernández-Lobato

TL;DR

This work tackles the scalability bottleneck of Gaussian process inference in sequential settings by warm-starting iterative linear solvers when new data are added. The authors prove that initializing from the previous solution reduces the initial distance to the final solution in the extended system, yielding substantial speed-ups across conjugate gradients, SGD, and alternating projections, and enabling more accurate posterior samples under compute constraints. Empirically, warm starting accelerates GP regression solves by up to roughly 6x for some solvers and improves Bayesian optimization performance in parallel Thompson sampling under limited compute budgets. Overall, the method enhances GP scalability for online learning and sequential decision-making without extra computation beyond storing prior solutions, with practical impact on active learning, online GP updates, and BO.

Abstract

Scalable Gaussian process (GP) inference is essential for sequential decision-making tasks, yet improving GP scalability remains a challenging problem with many open avenues of research. This paper focuses on iterative GPs, where iterative linear solvers, such as conjugate gradients, stochastic gradient descent or alternating projections, are used to approximate the GP posterior. We propose a new method which improves solver convergence of a large linear system by leveraging the known solution to a smaller system contained within. This is significant for tasks with incremental data additions, and we show that our technique achieves speed-ups when solving to tolerance, as well as improved Bayesian optimisation performance under a fixed compute budget.
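
The idea of reusing the solution of the smaller system can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the synthetic data, the RBF kernel, the noise level, the hand-written conjugate gradients routine, and all names (rbf_kernel, conjugate_gradients, v_old, warm, cold) are assumptions made for illustration. The sketch also assumes that the warm start pads the previous solution with zeros for the new data points, and it measures the initial distance in the norm induced by the system matrix, which is one possible choice of distance.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    # Squared-exponential kernel between the rows of a and b (assumed kernel).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def conjugate_gradients(A, b, x0, tol=1e-8, max_iters=2000):
    # Plain CG for a symmetric positive-definite A, started from x0.
    # Returns the approximate solution and the number of iterations used.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for i in range(max_iters):
        if np.sqrt(rs) <= tol * b_norm:
            return x, i
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iters


# Synthetic regression data (hypothetical): n_old existing points, n_new added.
rng = np.random.default_rng(0)
n_old, n_new, noise = 200, 20, 0.1
X = rng.uniform(-1.0, 1.0, size=(n_old + n_new, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n_old + n_new)

# Extended system (K + noise * I) v = y; its top-left block is the old system.
A = rbf_kernel(X, X) + noise * np.eye(n_old + n_new)

# Solution of the smaller system, as it would be stored from the previous solve.
v_old, _ = conjugate_gradients(A[:n_old, :n_old], y[:n_old], np.zeros(n_old))

cold = np.zeros(n_old + n_new)                   # cold start: all zeros
warm = np.concatenate([v_old, np.zeros(n_new)])  # warm start: zero-padded v_old

# Reference solution from a direct solve, used only to measure distances.
v_star = np.linalg.solve(A, y)


def a_norm(e):
    # Norm induced by the system matrix A.
    return np.sqrt(e @ A @ e)


dist_cold = a_norm(v_star - cold)
dist_warm = a_norm(v_star - warm)

v_cold, iters_cold = conjugate_gradients(A, y, cold)
v_warm, iters_warm = conjugate_gradients(A, y, warm)

print(f"initial distance: warm is {100 * dist_warm / dist_cold:.1f}% of cold")
print(f"CG iterations: cold={iters_cold}, warm={iters_warm}")
```

Under these assumptions, the zero-padded previous solution is the closest point to the new solution, in the norm induced by the system matrix, among all vectors whose entries for the new data points are zero, so its initial distance cannot exceed that of the all-zeros start. The sketch does not guarantee a lower iteration count; that is an empirical question which depends on the solver and the data.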

Paper Structure

This paper contains 25 sections, 22 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The initial distance with the warm start initialisation is shown as a percentage of the distance with the cold start initialisation, with mean and standard deviation. Warm starting consistently reduces the initial distance by approximately 70% in both the posterior mean and sample systems and across all datasets, when 100 new data points are added to 1000 initial data points.
  • Figure 2: The solver iterations required for convergence with warm starting are shown as a percentage of the iterations required with cold starting, with mean and standard deviation. The smaller initial distance results in fewer iterations for all solvers, across all datasets and both linear systems. On average, the reduction in solver iterations is approximately 38% for CG, 40% for SGD and 83% for AP. The reduction is particularly significant for AP, reaching up to 98% for some datasets.
  • Figure 3: Maximum objective function values and final residuals of the posterior mean solve, as a function of data point acquisitions. Warm starting using previous weights achieves improved Bayesian optimisation performance, reflected in the smaller final residuals. The evolution of the residuals shows that linear solver progress accumulates over multiple solves with warm starting, rather than resetting after every solve.
  • Figure 4: Parallel Thompson experiment - Small Compute Budget
  • Figure 5: Parallel Thompson experiment - Large Compute Budget