Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences
Motonobu Kanagawa, Philipp Hennig, Dino Sejdinovic, Bharath K Sriperumbudur
TL;DR
The paper surveys how Bayesian Gaussian process (GP) methods and frequentist kernel methods based on reproducing kernel Hilbert spaces (RKHSs) are interconnected. It highlights exact correspondences: the GP posterior mean coincides with the kernel ridge regression (KRR) estimator, and the GP posterior variance can be interpreted as an RKHS worst-case error. It analyzes when GP sample paths lie in an RKHS, using spectral representations, Driscoll’s zero-one law, and the concept of RKHS powers to clarify where the hypothesis spaces of the two approaches overlap and where they differ. It then links these ideas to convergence rates, integral transforms, and numerical methods. It shows that the regularization constant in KRR plays the role of the noise variance in GP regression, and that many kernel-based tools (MMD, HSIC, kernel quadrature, Bayesian quadrature) admit GP interpretations. The synthesis demonstrates that the probabilistic and functional-analytic perspectives are not only compatible but mutually informative, so results and methods can be transferred between Bayesian and frequentist kernel methods. Overall, the work offers a cohesive, modern view of the connections between GP methods and RKHS-based kernel techniques, with implications for both theory and practice in statistical learning and numerical analysis.
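In the noise-free setting, the link between posterior variance and worst-case error can be checked numerically. The GP posterior standard deviation at a point x* should equal the largest error that the kernel interpolation weights w = K^{-1} k_X(x*) can make at x* over all functions in the unit ball of the RKHS. The Python sketch below assumes a Gaussian kernel with an illustrative lengthscale, hand-picked design points, and a hypothetical helper rbf_kernel. It is a numerical check of the identity, not the authors' code. It also builds the extremal function that attains the bound, and a random unit-norm RKHS function that should stay below it.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.3):
    # Hypothetical helper: Gaussian kernel matrix between 1-D point sets a and b.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

X = np.array([0.1, 0.35, 0.6, 0.9])  # design points (illustrative)
x_star = np.array([0.5])             # evaluation point (illustrative)

K = rbf_kernel(X, X)
k_star = rbf_kernel(X, x_star)[:, 0]  # vector of k(x_i, x*)
k_ss = rbf_kernel(x_star, x_star)[0, 0]

# Noise-free GP posterior variance at x*.
w = np.linalg.solve(K, k_star)  # kernel interpolation weights
post_var = k_ss - k_star @ w

# Squared worst-case error over the RKHS unit ball:
# ||k(., x*) - sum_i w_i k(., x_i)||_H^2, expanded with the reproducing property.
wce_sq = k_ss - 2.0 * w @ k_star + w @ K @ w
wce = np.sqrt(wce_sq)

def f_star(t):
    # Normalized representer of the error functional: unit RKHS norm, attains the bound.
    return (rbf_kernel(t, x_star)[:, 0] - rbf_kernel(t, X) @ w) / wce

err_f_star = f_star(x_star)[0] - w @ f_star(X)

# A random RKHS function g = sum_j c_j k(., z_j), rescaled to unit RKHS norm.
rng = np.random.default_rng(1)
Z = rng.uniform(0.0, 1.0, 10)
c = rng.standard_normal(10)
c /= np.sqrt(c @ rbf_kernel(Z, Z) @ c)

def g(t):
    return rbf_kernel(t, Z) @ c

err_g = abs(g(x_star)[0] - w @ g(X))  # bounded by wce via Cauchy-Schwarz

print(np.sqrt(post_var), wce, err_f_star, err_g)
```

Because the squared worst-case error is expanded with the reproducing property, every quantity reduces to kernel-matrix algebra, and no infinite-dimensional computation is needed.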
Abstract
This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also discuss subtle philosophical and theoretical differences between the two approaches.
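The equivalence between the kernel ridge regression estimator and the GP posterior mean can be sketched numerically. This sketch assumes the KRR objective is written as (1/n) Σ_i (f(x_i) − y_i)² + λ‖f‖²_H. Under that normalization the matching GP noise variance is σ² = nλ; with an unnormalized squared loss the match is σ² = λ instead. The synthetic data, Gaussian kernel, lengthscale, λ, and the helper rbf_kernel are illustrative assumptions. KRR is solved through its representer-theorem linear system and the GP mean through a Cholesky factorization. The gradient of the KRR objective in the coefficients is also evaluated to confirm that they minimize it.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    # Hypothetical helper: Gaussian kernel matrix between 1-D point sets a and b.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
n = 20
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
x_test = np.linspace(0.0, 1.0, 50)

K = rbf_kernel(x, x)
k_test = rbf_kernel(x_test, x)

# KRR: minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2.
# By the representer theorem f = sum_j alpha_j k(., x_j), alpha = (K + n*lam*I)^{-1} y.
lam = 1e-3
alpha_krr = np.linalg.solve(K + n * lam * np.eye(n), y)
f_krr = k_test @ alpha_krr

# GP regression with noise variance sigma2 = n * lam, solved via Cholesky.
sigma2 = n * lam
L = np.linalg.cholesky(K + sigma2 * np.eye(n))  # lower triangular, L @ L.T = K + sigma2*I
alpha_gp = np.linalg.solve(L.T, np.linalg.solve(L, y))
m_gp = k_test @ alpha_gp

# Gradient of the KRR objective with respect to alpha; should vanish at the minimizer.
grad = (2.0 / n) * K @ (K @ alpha_krr - y) + 2.0 * lam * K @ alpha_krr

print(np.max(np.abs(f_krr - m_gp)), np.max(np.abs(grad)))
```

Changing lam moves both estimators identically as long as sigma2 stays tied to n * lam; decoupling the two breaks the match.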
