Reproducing Kernel Hilbert Space, Mercer's Theorem, Eigenfunctions, Nyström Method, and Use of Kernels in Machine Learning: Tutorial and Survey
Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
TL;DR
The paper surveys the theoretical foundations and practical tools of kernel methods, centered on Reproducing Kernel Hilbert Space (RKHS) and Mercer's theorem. It covers kernel definitions, feature mappings, and important kernel classes (including universal, stationary, and characteristic kernels), as well as kernel centering, normalization, and eigenfunction-based embeddings. On the methodological side, it covers kernelization techniques (the kernel trick and representation theory), kernel learning, and the Hilbert-Schmidt independence criterion (HSIC), maximum mean discrepancy (MMD), and kernel mean embedding for working with distributions. A substantial portion is devoted to computational aspects, notably the Nyström method for scalable approximation of eigenfunctions and kernels, along with the rank and factorization of the kernel matrix and related improvements. Together, these components support kernel-based approaches to dimensionality reduction, independence testing, distribution comparison, and kernel learning in high-dimensional settings.
Abstract
This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start by reviewing the history of kernels in functional analysis and machine learning. Then, the Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from a distance metric, important classes of kernels (including bounded, integrally positive definite, universal, stationary, and characteristic kernels), kernel centering and normalization, and eigenfunctions are explained in detail. Next, we introduce the uses of kernels in machine learning, including kernel methods (such as kernel support vector machines), kernel learning by semi-definite programming, the Hilbert-Schmidt independence criterion, maximum mean discrepancy, kernel mean embedding, and kernel dimensionality reduction. We also cover the rank and factorization of the kernel matrix as well as the approximation of eigenfunctions and kernels using the Nyström method. This paper can be useful for various fields of science, including machine learning, dimensionality reduction, functional analysis in mathematics, and mathematical physics in quantum mechanics.
