Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition

Yunqin Zhu; Henry Shaowu Yuchi; Yao Xie

TL;DR

This paper addresses the need for expressive yet scalable Gaussian process kernels by introducing the Deep Basis Kernel (DBK), a fully data-driven, low-rank kernel representation built from neural basis functions via Mercer's theorem. By construction, DBK supports exact GP inference in linear time without inducing points and enables scalable weight-space variational training for large datasets, complemented by a variance-correction procedure to guard against overconfident uncertainty. The authors demonstrate that DBK achieves improved predictive accuracy and better uncertainty calibration compared with full GP, sparse GP, and deep kernel learning variants across synthetic and real-world regression tasks, while delivering strong computational efficiency. The work provides a cohesive framework that unifies exact and variational inference for scalable, data-driven kernels, with practical impact on large-scale GP applications.

Abstract

Kernels are key to encoding prior beliefs and data structures in Gaussian process (GP) models. The design of expressive and scalable kernels has garnered significant research attention. Deep kernel learning enhances kernel flexibility by feeding inputs through a neural network before applying a standard parametric form. However, this approach remains limited by the choice of base kernels, inherits high inference costs, and often demands sparse approximations. Drawing on Mercer's theorem, we introduce a fully data-driven, scalable deep kernel representation where a neural network directly represents a low-rank kernel through a small set of basis functions. This construction enables highly efficient exact GP inference in linear time and memory without invoking inducing points. It also supports scalable mini-batch training based on a principled variational inference framework. We further propose a simple variance correction procedure to guard against overconfidence in uncertainty estimates. Experiments on synthetic and real-world data demonstrate the advantages of our deep kernel GP in terms of predictive accuracy, uncertainty quantification, and computational efficiency.
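The linear-time claim can be illustrated with a minimal sketch of weight-space inference for a generic low-rank kernel k(x, x') = phi(x)^T phi(x'). This is not the authors' implementation. The one-layer tanh feature map `features`, the standard-normal prior on the weights, the toy data, and all function names are assumptions made for illustration only. With n training points and m basis functions, the only factorization is of an m x m matrix, so the cost is O(n m^2) time and O(n m) memory.

```python
import numpy as np

def features(X, W, b):
    # Hypothetical stand-in for learned neural basis functions: one tanh layer.
    return np.tanh(X @ W + b)

def lowrank_gp_fit(Phi, y, noise_var):
    # Weight-space posterior for f(x) = phi(x)^T w with prior w ~ N(0, I).
    # Only the m x m matrix A is factorized; the n x n kernel is never formed.
    m = Phi.shape[1]
    A = Phi.T @ Phi + noise_var * np.eye(m)
    L = np.linalg.cholesky(A)
    mean_w = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T @ y))
    return mean_w, L

def lowrank_gp_predict(Phi_star, mean_w, L, noise_var):
    # Posterior mean and latent variance at the test points.
    # The weight covariance is noise_var * A^{-1}, so the variance is
    # noise_var * ||L^{-1} phi(x*)||^2.
    mu = Phi_star @ mean_w
    V = np.linalg.solve(L, Phi_star.T)
    var = noise_var * np.sum(V ** 2, axis=0)
    return mu, var

# Toy data (assumed for illustration): n = 50 points, d = 2 inputs, m = 5 basis functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X_star = rng.normal(size=(7, 2))
W = rng.normal(size=(2, 5))
b = rng.normal(size=5)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
noise_var = 0.1

Phi = features(X, W, b)
Phi_star = features(X_star, W, b)
mean_w, L = lowrank_gp_fit(Phi, y, noise_var)
mu, var = lowrank_gp_predict(Phi_star, mean_w, L, noise_var)
```

By the Woodbury identity, these predictions coincide with those of the standard function-space GP using the kernel matrix Phi Phi^T, which would instead require solving an n x n system. The sketch keeps the basis functions fixed, so it covers inference only, not kernel training or the variance correction mentioned in the abstract.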
