Reproducing Kernel Hilbert Space, Mercer's Theorem, Eigenfunctions, Nyström Method, and Use of Kernels in Machine Learning: Tutorial and Survey

Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

TL;DR

The paper surveys the theoretical foundations and practical tools of kernel methods, centered on Reproducing Kernel Hilbert Space (RKHS) and Mercer's theorem. It covers kernel definitions, feature mappings, and the spectrum of kernel properties (universality, stationarity, and characteristicness), as well as centering, normalization, and eigenfunction-based embeddings. Core methodological pillars include kernelization techniques (kernel trick and representation theory), kernel learning, and distribution embeddings through HSIC, MMD, and kernel mean embedding. A substantial portion is devoted to computational aspects, notably the Nyström method for scalable eigenfunction and kernel approximations, along with rank/factorization strategies and improvements. The exposition integrates these components to illuminate kernel-based approaches for dimensionality reduction, independence testing, distribution comparison, and kernel learning in high-dimensional settings.

Abstract

This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start with reviewing the history of kernels in functional analysis and machine learning. Then, Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from distance metric, important classes of kernels (including bounded, integrally positive definite, universal, stationary, and characteristic kernels), kernel centering and normalization, and eigenfunctions are explained in detail. Then, we introduce types of use of kernels in machine learning including kernel methods (such as kernel support vector machines), kernel learning by semi-definite programming, Hilbert-Schmidt independence criterion, maximum mean discrepancy, kernel mean embedding, and kernel dimensionality reduction. We also cover rank and factorization of kernel matrix as well as the approximation of eigenfunctions and kernels using the Nyström method. This paper can be useful for various fields of science including machine learning, dimensionality reduction, functional analysis in mathematics, and mathematical physics in quantum mechanics.

Paper Structure

This paper contains 56 sections, 29 theorems, 159 equations, 3 figures.

Key Result

Theorem 1

For a set of data $\mathcal{X} = \{\boldsymbol{x}_i\}_{i=1}^n$, consider an RKHS $\mathcal{H}$ of functions $f: \mathcal{X} \rightarrow \mathbb{R}$ with kernel function $k$. For any function $\ell: \mathbb{R}^2 \rightarrow \mathbb{R}$ (usually called the loss function), consider the optimization problem … where $\eta \geq 0$ is the regularization parameter and $\Omega(\|f\|_k)$ is a penalty term such as …
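The theorem statement is cut off in this excerpt, so the following is only a minimal sketch of the kind of problem it describes, under assumptions that are not in the source: squared loss for $\ell$, the penalty $\Omega(\|f\|_k) = \|f\|_k^2$, and a Gaussian (RBF) kernel. The standard representer theorem says a minimizer can be written as $f(\boldsymbol{x}) = \sum_{i=1}^n \alpha_i k(\boldsymbol{x}_i, \boldsymbol{x})$; with the choices above, the coefficients solve $(\boldsymbol{K} + \eta \boldsymbol{I}) \boldsymbol{\alpha} = \boldsymbol{y}$. The function names, toy data, and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def fit_kernel_ridge(X, y, eta=1e-3, gamma=1.0):
    """Assumed setting: squared loss and penalty ||f||_k^2.
    Then alpha solves (K + eta * I) alpha = y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + eta * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) at each row of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Hypothetical toy data: noiseless samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0])

eta = 1e-3
alpha = fit_kernel_ridge(X, y, eta=eta, gamma=1.0)
y_fit = predict(X, alpha, X, gamma=1.0)
```

Only the $n$ coefficients in `alpha` are learned, even though the RKHS may be infinite-dimensional. In this particular setting the training residual satisfies $\boldsymbol{y} - \boldsymbol{K}\boldsymbol{\alpha} = \eta \boldsymbol{\alpha}$, which gives a quick sanity check on the solve.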

Figures (3)

  • Figure 1: Pulling data from the input space to the feature space (RKHS). The explicit locations of pulled points are not necessarily known but the relative similarity (inner product) of pulled data points is known in the feature space.
  • Figure 2: Centered pulled data in the feature space (RKHS). This happens after kernel centering, where the mean of the cloud of pulled data becomes zero in the RKHS. Even after kernel centering, the explicit locations of the pulled points are not necessarily known, because the rotation of the pulled data in that space is unknown.
  • Figure 3: Transforming data to RKHS using kernels to make the nonlinear pattern of data more linear. For example, here the classes have become linearly separable (by a linear hyperplane) after kernelization.
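The points made in the captions of Figures 1 and 2 can be checked numerically. The sketch below assumes the usual double-centering formula $\boldsymbol{K}_c = \boldsymbol{H} \boldsymbol{K} \boldsymbol{H}$ with $\boldsymbol{H} = \boldsymbol{I} - \frac{1}{n} \boldsymbol{1}\boldsymbol{1}^\top$, which this excerpt does not state explicitly. The toy data, the degree-2 polynomial kernel, and the function name `center_kernel` are illustrative choices.

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix: K_c = H K H, H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(6, 2))      # toy data with a nonzero mean

# Nonlinear case: the feature map is never formed explicitly.
K = (X @ X.T + 1.0) ** 2                  # polynomial kernel of degree 2
K_c = center_kernel(K)
row_sums = K_c.sum(axis=1)                # ~0: the pulled cloud has zero mean

# Linear case, where the feature map is the identity and can be inspected.
K_lin = X @ X.T
X_centered = X - X.mean(axis=0)
K_lin_c = center_kernel(K_lin)            # matches X_centered @ X_centered.T

# Rotating the points leaves the Gram matrix unchanged, so the kernel
# matrix alone cannot reveal the orientation of the pulled data.
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))   # random orthogonal matrix
K_rotated = (X @ Q) @ (X @ Q).T
```

The zero row sums of `K_c` show that every centered point has zero inner product with the sum of all centered points, which is the zero-mean property described for Figure 2. The equality of `K_rotated` and `K_lin` illustrates why only relative similarities, and not explicit locations, are available from the kernel.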

Theorems & Definitions (85)

  • Definition 1: Mercer Kernel (Mercer, 1909)
  • Definition 2: Gram Matrix or Kernel Matrix
  • Definition 3: Metric Space
  • Definition 4: Vector Space
  • Definition 5: Complete Space
  • Definition 6: Compact Space
  • Definition 7: Hilbert Space (Reed & Simon, 1972)
  • Definition 8: Banach Space (Beauzamy, 1982)
  • Remark 1: Difference of Hilbert and Banach Spaces
  • Definition 9: $L_p$ Space
  • ...and 75 more