An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices

Shifan Zhao, Tianshi Xu, Hua Huang, Edmond Chow, Yuanzhe Xi

TL;DR

The paper addresses the challenge of preconditioning large regularized kernel systems whose spectra vary with kernel parameters. It introduces the Adaptive Factorized Nyström (AFN) preconditioner, which combines a Nyström-based landmark block with a factorized sparse inverse of the Schur complement to achieve robustness across parameter regimes, and it adaptively selects the landmark size and rank. FPS landmark sampling and a subsampling-based rank estimator drive the adaptive mechanism, enabling scalable construction even when the Nyström rank is large. Numerical experiments on synthetic 3D data and ML datasets show AFN delivering near-constant iteration counts and reduced setup time relative to competing preconditioners, demonstrating practical impact for kernel methods under parameter variation.
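
The interplay between the landmark block and the Schur complement mentioned above can be made concrete with the standard block factorization of the regularized kernel matrix. The sketch below assumes the rows and columns are ordered so that the landmark points come first, and it reuses the block names $\mathbf{K}_{11}$, $\mathbf{K}_{12}$, $\mathbf{K}_{22}$ and the regularization parameter $\mu$ from the figure captions; the symbol $\mathbf{S}$ for the Schur complement is introduced here only for illustration. It is the exact identity, not the preconditioner itself, which approximates the factors.

```latex
\mathbf{K} + \mu\mathbf{I}
= \begin{bmatrix}
    \mathbf{K}_{11} + \mu\mathbf{I} & \mathbf{K}_{12} \\
    \mathbf{K}_{12}^{\top} & \mathbf{K}_{22} + \mu\mathbf{I}
  \end{bmatrix}
= \begin{bmatrix}
    \mathbf{I} & \mathbf{0} \\
    \mathbf{K}_{12}^{\top}(\mathbf{K}_{11} + \mu\mathbf{I})^{-1} & \mathbf{I}
  \end{bmatrix}
  \begin{bmatrix}
    \mathbf{K}_{11} + \mu\mathbf{I} & \mathbf{0} \\
    \mathbf{0} & \mathbf{S}
  \end{bmatrix}
  \begin{bmatrix}
    \mathbf{I} & (\mathbf{K}_{11} + \mu\mathbf{I})^{-1}\mathbf{K}_{12} \\
    \mathbf{0} & \mathbf{I}
  \end{bmatrix},
\qquad
\mathbf{S} = \mathbf{K}_{22} + \mu\mathbf{I}
  - \mathbf{K}_{12}^{\top}(\mathbf{K}_{11} + \mu\mathbf{I})^{-1}\mathbf{K}_{12}.
```

Multiplying the three factors recovers $\mathbf{K}_{22} + \mu\mathbf{I}$ in the lower-right block, so solving with $\mathbf{K} + \mu\mathbf{I}$ reduces to solves with the landmark block and with $\mathbf{S}$.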

Abstract

The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nyström (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nyström approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nyström approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.
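
The dependence on the kernel parameters described above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy is available, a Gaussian kernel of the form $\exp(-\|x-y\|^2/(2l^2))$ (the exact scaling convention is an assumption), and randomly chosen landmarks. The helper names `gaussian_kernel` and `nystrom_relative_error` are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Y, length_scale):
    """Gaussian kernel matrix; the 1/(2 l^2) scaling is an assumed convention."""
    # Pairwise squared distances via broadcasting: shape (len(X), len(Y)).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale**2))


def nystrom_relative_error(X, landmarks, length_scale):
    """Relative Frobenius error of the Nystrom approximation built on `landmarks`."""
    K = gaussian_kernel(X, X, length_scale)
    K_nk = K[:, landmarks]        # n x k block of landmark columns
    K_kk = K_nk[landmarks, :]     # k x k landmark block
    # Least-squares solve instead of an explicit inverse: K_kk can be
    # severely ill-conditioned when the length-scale is large.
    coeffs = np.linalg.lstsq(K_kk, K_nk.T, rcond=None)[0]
    K_nys = K_nk @ coeffs
    return np.linalg.norm(K - K_nys) / np.linalg.norm(K)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(300, 3))            # points in a cube, edge length 10
landmarks = rng.choice(len(X), size=30, replace=False)
for l in (1.0, 10.0):
    err = nystrom_relative_error(X, landmarks, l)
    print(f"l = {l:5.1f}: relative Nystrom error = {err:.2e}")
```

For a fixed rank, one would expect the error to be much larger at the short length-scale, where the eigenvalues decay slowly; this is the regime in which a large Nyström rank is needed.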

Paper Structure

This paper contains 17 sections, 5 theorems, 59 equations, 5 figures, 3 tables, 6 algorithms.

Key Result

Theorem 4.1

Suppose all the data points are inside a unit ball $\Omega$ in $\mathbb{R}^{d}$. Then for an arbitrary subset $X_{k} = \{\mathbf{x}_{i_1},\dots,\mathbf{x}_{i_k}\}$ of $X$, the following bounds hold for $h_{X_{k}}$ and $q_{X_{k}}$:

Figures (5)

  • Figure 1: Left: Spectrum of $61$ regularized Gaussian kernel matrices associated with the same $1000$ points sampled randomly over a cube with edge length $10$ and a fixed regularization parameter $\mu=0.0001$ but different length-scales $l$. Right: Iteration counts of unpreconditioned CG to solve Equation \ref{eq:Problem} for the $61$ regularized kernel matrices to reach the relative residual tolerance $10^{-4}$.
  • Figure 1: Comparison of the relative Nyström approximation error curves for an original dataset and a subsampled dataset with $100$ points, associated with two different length-scales. The original dataset contains 1000 uniformly sampled points from a cube with edge length $10$. The indices of the subsampled dataset are matched with those of the original dataset by computing the relative Nyström approximation errors on the original dataset only for ranks that are multiples of 10. The plot shows how the approximation error changes as the rank of the approximation increases.
  • Figure 1: An illustration of FPS for selecting one, ten, twenty and thirty points from a two-dimensional dataset with $400$ points where the big circles represent the selected points and the dots denote the other data points.
  • Figure 2: Comparison of fill distance and the Nyström approximation error for $1000$ points uniformly sampled from a cube with edge length $10$, when the Gaussian kernel function with length-scale $l = 10$ is used. FPS and random sampling are used to sample $k$ points from $X$ to form $X_k$. Nyström error is computed only for the ranks which are multiples of $10$.
  • Figure 3: Histograms of the magnitude of the entries in $\mathbf{K}_{22}+\mu\mathbf{I}$, $\mathbf{K}_{22} + \mu\,\mathbf{I} -\mathbf{K}_{12}^{\top}(\mathbf{K}_{11}+\mu\,\mathbf{I})^{-1}\mathbf{K}_{12}$, and $(\mathbf{K}_{22} + \mu\,\mathbf{I} -\mathbf{K}_{12}^{\top}(\mathbf{K}_{11}+\mu\,\mathbf{I})^{-1}\mathbf{K}_{12})^{-1}$ associated with a Gaussian kernel matrix defined using $1000$ points sampled uniformly from a cube with edge length $10$, regularization parameter $\mu=0.0001$, and length-scale $l=5$. The maximum entries in these three matrices are all scaled to $1$. $\mathbf{K}$ has $243$ eigenvalues greater than $1.1\times\mu$.
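
The FPS procedure and the fill distance that appear in the captions above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: it assumes Euclidean distance and an arbitrary starting point (index 0 by default), and the function names `farthest_point_sampling` and `fill_distance` are hypothetical.

```python
import math
import random


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly add the point farthest from the selected set."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # dist_to_set[i] is the distance from points[i] to its nearest selected point.
    dist_to_set = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=dist_to_set.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            dist_to_set[i] = min(dist_to_set[i], math.dist(p, points[nxt]))
    return selected


def fill_distance(points, selected):
    """Largest distance from any data point to its nearest selected point."""
    return max(min(math.dist(p, points[j]) for j in selected) for p in points)


gen = random.Random(0)
data = [(gen.random(), gen.random()) for _ in range(400)]  # 400 points in 2D
fps_idx = farthest_point_sampling(data, 30)
rand_idx = gen.sample(range(len(data)), 30)
print("fill distance, FPS:   ", fill_distance(data, fps_idx))
print("fill distance, random:", fill_distance(data, rand_idx))
```

Each greedy step costs one pass over the data, so selecting $k$ of $n$ points takes $O(nk)$ distance evaluations in this simple form.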

Theorems & Definitions (11)

  • Theorem 4.1
  • Proof 1
  • Remark 4.2
  • Theorem 4.3
  • Proof 2
  • Theorem 4.4
  • Proof 3
  • Theorem 4.5
  • Remark 4.6
  • Theorem A.1: belkin_approximation_2018
  • ...and 1 more