Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression
Lu Zou, Liang Ding
TL;DR
The paper addresses the scalability of training additive Gaussian Processes (GPs) with Bayesian Back-fitting by proving a lower bound on the convergence rate of Back-fitting and introducing Kernel Multigrid (KMG). Using Kernel Packets (KP) for efficient one-dimensional GP computations, it shows that Back-fitting requires at least $\mathcal{O}(n\log n)$ iterations to converge, where $n$ is the data size. KMG applies a sparse Gaussian Process Regression (GPR) to the residuals after each Back-fitting iteration, which reduces the required iterations to $\mathcal{O}(\log n)$ while keeping the per-iteration cost at $\mathcal{O}(n\log n)$ time and $\mathcal{O}(n)$ space. The theoretical guarantees rest on an approximation property of sparse additive GPR and a smoothing analysis of the Back-fitting operator combined with coarse-grid corrections. Numerical experiments on synthetic and real data corroborate the theory: with a sparse GPR of only 10 inducing points, KMG markedly accelerates convergence, accurately recovers per-dimension contributions, and approximates high-dimensional targets within 5 iterations. The work thus provides a practical route to scalable, interpretable, high-dimensional additive GP modeling.
Abstract
Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). This algorithm enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration. It is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, by employing a sparse GPR with merely 10 inducing points, KMG can produce accurate approximations of high-dimensional targets within 5 iterations.
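To make the Back-fitting iteration discussed above concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses dense kernel matrices with cubic-cost solves rather than Kernel Packets, a squared-exponential kernel chosen purely for illustration, and a two-dimensional toy dataset. All function names (`rbf_kernel`, `backfit`, `direct_solution`) are hypothetical. The sparse-GPR residual correction that distinguishes KMG is not reproduced here. Each sweep updates one additive component at a time by smoothing the partial residual, and the result is compared against the posterior means obtained from a single direct solve.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix for 1-D inputs (illustrative choice)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def backfit(kernels, y, noise_var, n_iter):
    """Gauss-Seidel Back-fitting for the component posterior means of an additive GP.

    kernels: list of (n, n) kernel matrices, one per input dimension.
    Returns an array of shape (D, n) holding each component's fitted values.
    """
    n = y.shape[0]
    # Per-dimension smoother S_d = K_d (K_d + noise_var * I)^{-1}.
    # K_d commutes with (K_d + noise_var * I)^{-1}, so this solve yields the same matrix.
    smoothers = [np.linalg.solve(K + noise_var * np.eye(n), K) for K in kernels]
    f = np.zeros((len(kernels), n))
    for _ in range(n_iter):
        for d, S in enumerate(smoothers):
            # Residual after removing every component except the d-th one.
            partial_residual = y - f.sum(axis=0) + f[d]
            f[d] = S @ partial_residual
    return f


def direct_solution(kernels, y, noise_var):
    """Component posterior means from one dense solve, used as the reference."""
    n = y.shape[0]
    alpha = np.linalg.solve(sum(kernels) + noise_var * np.eye(n), y)
    return np.stack([K @ alpha for K in kernels])


# Toy data: two input dimensions, additive target, Gaussian noise (all assumed values).
rng = np.random.default_rng(0)
n, noise_var = 20, 1.0
X = rng.uniform(0.0, 1.0, size=(n, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 1.0, size=n)
kernels = [rbf_kernel(X[:, d]) for d in range(2)]

f_backfit = backfit(kernels, y, noise_var, n_iter=300)
f_direct = direct_solution(kernels, y, noise_var)
max_gap = np.max(np.abs(f_backfit - f_direct))
print(f"max difference between Back-fitting and direct solve: {max_gap:.2e}")
```

In this toy setting the largest eigenvalue of each unit-variance kernel matrix is at most $n$, so each smoother has norm at most $n/(n+\sigma^2)$, which approaches 1 as $n$ grows. This gives some intuition for why plain Back-fitting slows down on larger datasets, although the sketch does not demonstrate the paper's lower bound itself.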
