Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression

Lu Zou; Liang Ding


TL;DR

The paper addresses the scalability challenge of training Additive Gaussian Processes with Bayesian Back-fitting by proving a convergence lower bound and introducing Kernel Multigrid (KMG). Leveraging Kernel Packets (KP) for efficient one-dimensional GP computations, the authors show that Back-fitting requires at least $\mathcal{O}(n\log n)$ iterations. They then demonstrate that KMG, which combines Back-fitting with a sparse GPR fitted to the residuals, needs only $\mathcal{O}(\log n)$ iterations while keeping the per-iteration cost at $\mathcal{O}(n\log n)$ time and $\mathcal{O}(n)$ space. The theoretical guarantees rest on an approximation property of sparse additive GPR and a smoothing property of the Back-fitting operator, which together control the coarse-grid correction. Numerical experiments on synthetic and real data corroborate the theory: KMG markedly accelerates convergence and accurately recovers the per-dimension contributions using only a handful of inducing points. The work thus provides a practical pathway to scalable, interpretable, high-dimensional additive GP modeling.
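The step from a per-iteration rate to an iteration count is short enough to spell out. Assume, in the form of the paper's lower bound, that for a worst-case starting error the Back-fitting error satisfies $\|\boldsymbol{\varepsilon}^{(t)}\| \ge (1-c/n)^t\|\boldsymbol{\varepsilon}^{(0)}\|$ for a constant $c>0$. Also take "convergence" to mean reaching relative error $n^{-\gamma}$ for a fixed $\gamma>0$; this accuracy target is our assumption, not a statement from the paper. The same calculation with a contraction factor $\rho<1$ independent of $n$ (also assumed here, for KMG) gives the logarithmic count.

```latex
\begin{align*}
\text{Back-fitting:}\quad
&\Big(1-\tfrac{c}{n}\Big)^{t} \le \frac{\|\boldsymbol{\varepsilon}^{(t)}\|}{\|\boldsymbol{\varepsilon}^{(0)}\|} \le n^{-\gamma}
\;\Longrightarrow\;
t \ge \frac{\gamma\log n}{-\log\!\big(1-\tfrac{c}{n}\big)}
\ge \gamma\log n\cdot\frac{n-c}{c} = \Omega(n\log n),\\
&\text{using } -\log(1-x)\le \tfrac{x}{1-x}\ \text{for } 0<x<1 \ (\text{here } x=c/n,\ n>c).\\[4pt]
\text{KMG:}\quad
&\frac{\|\boldsymbol{\varepsilon}^{(t)}\|}{\|\boldsymbol{\varepsilon}^{(0)}\|} \le \rho^{t}\le n^{-\gamma}
\;\Longleftarrow\;
t \ge \frac{\gamma\log n}{\log(1/\rho)} = \mathcal{O}(\log n).
\end{align*}
```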

Abstract

Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). This algorithm enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration. It is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, by employing a sparse GPR with merely 10 inducing points, KMG can produce accurate approximations of high-dimensional targets within 5 iterations.
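The following minimal Python/NumPy sketch makes the two-level idea concrete. It uses dense $\mathcal{O}(n^3)$ linear algebra and is not the KP-based $\mathcal{O}(n\log n)$ implementation. Several choices are our assumptions:

- Matérn-1/2 components with lengthscale 0.3 and noise variance `s2`.
- Gauss-Seidel Back-fitting as the smoother.
- Inducing points chosen as a subset of the inputs. This choice makes the coarse step a Galerkin correction that reduces to a subset-of-regressors sparse additive GPR on the residual.

The helper names (`backfit_sweep`, `coarse_correction`, `fit`) are hypothetical. The coarse step is our own construction and not necessarily the paper's exact KMG update.

```python
import numpy as np


def matern12(a, b, ell=0.3):
    """Matérn-1/2 (exponential) kernel between 1-D input vectors a and b."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ell)


def backfit_sweep(smoothers, y, us):
    """One Gauss-Seidel Back-fitting pass: u_d <- S_d (y - sum_{j != d} u_j)."""
    for d, S in enumerate(smoothers):
        partial_residual = y - sum(u for j, u in enumerate(us) if j != d)
        us[d] = S @ partial_residual
    return us


def coarse_correction(Ks, idxs, y, us, s2):
    """Galerkin correction on span{k_d(., z)} with inducing points z = x_d[idx_d].

    Because z is a subset of the inputs, the coarse problem is a subset-of-regressors
    sparse additive GPR fitted to the residual r = y - sum_d u_d, plus the term
    s2 * u_d[idx_d], which keeps the exact posterior mean a fixed point.
    """
    r = y - sum(us)
    Phi = np.hstack([K[:, idx] for K, idx in zip(Ks, idxs)])  # n x (D*m) features
    A = Phi.T @ Phi
    starts = np.cumsum([0] + [len(idx) for idx in idxs])
    for d, (K, idx) in enumerate(zip(Ks, idxs)):
        blk = slice(starts[d], starts[d + 1])
        A[blk, blk] += s2 * K[np.ix_(idx, idx)]  # prior term K_mm for component d
    rhs = Phi.T @ r - s2 * np.concatenate([u[idx] for u, idx in zip(us, idxs)])
    w = np.linalg.solve(A, rhs)
    return [u + K[:, idx] @ w[starts[d]:starts[d + 1]]
            for d, (u, K, idx) in enumerate(zip(us, Ks, idxs))]


def fit(X, y, n_iter, s2=0.01, m=None):
    """Back-fitting (m=None) or the two-level variant with m inducing points per dim.

    Returns the component-wise error to the exact additive-GP posterior mean
    after each iteration.
    """
    n, D = X.shape
    Ks = [matern12(X[:, d], X[:, d]) for d in range(D)]
    # S_d = K_d (K_d + s2 I)^{-1}; the two factors commute, so solve() gives S_d.
    smoothers = [np.linalg.solve(K + s2 * np.eye(n), K) for K in Ks]
    alpha = np.linalg.solve(sum(Ks) + s2 * np.eye(n), y)
    exact = [K @ alpha for K in Ks]  # exact posterior mean of each component
    idxs = None
    if m is not None:
        pos = np.linspace(0, n - 1, m).round().astype(int)
        idxs = [np.argsort(X[:, d])[pos] for d in range(D)]  # spread along each axis
    us, errs = [np.zeros(n) for _ in range(D)], []
    for _ in range(n_iter):
        us = backfit_sweep(smoothers, y, us)
        if idxs is not None:
            us = coarse_correction(Ks, idxs, y, us, s2)
        errs.append(np.sqrt(sum(np.sum((u - e) ** 2) for u, e in zip(us, exact))))
    return errs


# Toy additive target with a constant offset (illustrative data only).
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 1.0 + np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2 + 0.1 * rng.standard_normal(200)
bf = fit(X, y, n_iter=10)
kmg = fit(X, y, n_iter=10, m=10)
print(bf[-1], kmg[-1])
```

The correction is an energy-norm (Galerkin) projection, so the exact additive posterior mean stays a fixed point of the combined iteration. On this toy problem, plain Back-fitting is expected to stall on a slow mode in which one component gains a near-constant shift that another loses. The inducing-point correction is designed to remove that mode.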

Paper Structure (28 sections, 14 theorems, 146 equations, 6 figures, 2 tables, 3 algorithms)

Introduction
Literature Review
Preliminaries
General Gaussian Processes
Additive GPs
Kernel Packet
Bayesian Back-fitting
Back-fitting Convergence Lower Bound
Proof Overview
Global Feature for Lg Kernels
Global Feature for general Matérn Kernels
A Two-dimensional Counterexample
Kernel Multigrid for Back-fitting
Sparse Gaussian Process Regression
Kernel Multigrid
...and 13 more sections

Key Result

Theorem 1

Let $\boldsymbol{X}$ be a Latin hypercube design (LHD) and let $\boldsymbol{Y}$ be generated by an additive GP with kernel $k=\sum_{d=1}^Dk_d$, where each $k_d$ satisfies the paper's kernel assumption. Let $\boldsymbol{u}$ be the output of the Bayesian Back-fitting algorithm with input $(\boldsymbol{X},\boldsymbol{Y})$ and iteration number $t$
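Theorem 1, and the middle rows of Figures 4 and 5, assume the inputs form a Latin hypercube design. As a reminder of what that means, here is a minimal NumPy sketch on the unit cube. Each coordinate takes exactly one value in each of the $n$ equal-width strata of $[0,1)$. The function name `latin_hypercube`, the unit-cube domain and the uniform jitter inside each stratum are illustrative assumptions; the paper may construct its LHDs differently.

```python
import numpy as np


def latin_hypercube(n, D, rng=None):
    """Random Latin hypercube design with n points in [0, 1)^D (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    # Column d is a random permutation of the strata 0..n-1, so every stratum
    # of every axis is hit exactly once.
    strata = np.column_stack([rng.permutation(n) for _ in range(D)])
    # Jitter each point uniformly inside its stratum, then rescale to [0, 1).
    return (strata + rng.random((n, D))) / n


X = latin_hypercube(8, 2, rng=0)
print(X)
```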

Figures (6)

Figure 1: Left: the addition of five Matérn-${3}/{2}$ kernels $a_j k(\cdot,x_j)$ (colored lines, without compact supports) leads to a KP (black line, with a compact support); Right: converting 10 Matérn-${3}/{2}$ kernel functions $\{k(\cdot,x_i)\}_{i=1}^{10}$ to 10 KPs, where each KP is non-zero on at most three points in $\{x_i\}_{i=1}^{10}$.
Figure 2: $\sum_i\phi_i(x^*_j)$ can be normalized to $1$ for any $x^*_j$, because the KPs induced by the uniform grid $\boldsymbol{X}^*=\{ih\}$ take identical values at different points.
Figure 3: Upper row: the log of the error decreases with the number of iterations; lower row: the error ratio $\|\boldsymbol{\varepsilon}^{(t)}\|/\|\boldsymbol{\varepsilon}^{(t-1)}\|$ is close to our lower bound.
Figure 4: Experiments with Matérn-${1}/{2}$. Upper row: logarithm of the error for the four competing algorithms. Middle row: the resulting prediction curves for KMG and Back-fitting compared to the target function and the underlying hidden function $\mathcal{G}_d$, when $\boldsymbol{X}_n$ is from a LHD. Lower row: the resulting prediction curves for KMG and Back-fitting compared to the target function and the underlying hidden function $\mathcal{G}_d$, when $\boldsymbol{X}_n$ is from a random design.
Figure 5: Experiments with Matérn-${3}/{2}$. Upper row: logarithm of the error for the four competing algorithms. Middle row: the resulting prediction curves for KMG and Back-fitting compared to the target function and the underlying hidden function $\mathcal{G}_d$, when $\boldsymbol{X}_n$ is from a LHD. Lower row: the resulting prediction curves for KMG and Back-fitting compared to the target function and the underlying hidden function $\mathcal{G}_d$, when $\boldsymbol{X}_n$ is from a random design.
...and 1 more figures
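Figure 1 shows that a suitable linear combination of five Matérn-3/2 kernel functions has compact support. Here is a minimal sketch of that construction. Requiring the combination to vanish to the right of the last center gives two linear conditions, and requiring it to vanish to the left of the first center gives two more. That is four conditions on five coefficients, and their one-dimensional null space yields the KP. The lengthscale `ell=0.5`, the non-uniform centers and the helper names are illustrative assumptions. Computing the null space with an SVD is our choice, not necessarily how the KP literature computes the coefficients.

```python
import numpy as np


def matern32(x, centers, ell=0.5):
    """Matérn-3/2 kernel k(r) = (1 + w r) exp(-w r), w = sqrt(3)/ell."""
    w = np.sqrt(3.0) / ell
    r = np.abs(x[:, None] - centers[None, :])
    return (1.0 + w * r) * np.exp(-w * r)


def kernel_packet_coeffs(centers, ell=0.5):
    """Coefficients a with sum_j a_j k(., c_j) = 0 outside [min c, max c]."""
    w = np.sqrt(3.0) / ell
    c = np.asarray(centers, dtype=float)
    # Right of all centers: sum a_j (1 + w(x - c_j)) e^{-w(x - c_j)} = 0 for all x
    #   <=> sum a_j e^{w c_j} = 0 and sum a_j c_j e^{w c_j} = 0.
    # Left of all centers: the mirror conditions with e^{-w c_j}.
    A = np.vstack([np.exp(w * c), c * np.exp(w * c),
                   np.exp(-w * c), c * np.exp(-w * c)])
    _, _, vt = np.linalg.svd(A)  # A is 4 x 5, so its null space is 1-D
    return vt[-1]


centers = np.array([0.0, 0.2, 0.45, 0.6, 1.0])
a = kernel_packet_coeffs(centers)
grid = np.linspace(-1.0, 2.0, 601)
phi = matern32(grid, centers) @ a  # the kernel packet evaluated on the grid
outside = (grid < centers[0]) | (grid > centers[-1])
print(np.abs(phi[outside]).max())  # round-off level: the packet vanishes here
```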

Theorems & Definitions (19)

Theorem 1
Proposition 2 (Proposition 1 of ding2022sample)
Proposition 3
Proposition 4
Proposition 5
Theorem 6
Theorem 7: Approximation Property
Remark 8
Remark 9
Lemma 10: Smoothing Property
...and 9 more