An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices

Shifan Zhao; Tianshi Xu; Hua Huang; Edmond Chow; Yuanzhe Xi

TL;DR

The paper addresses the challenge of preconditioning large regularized kernel systems whose spectra vary with the kernel parameters. It introduces the Adaptive Factorized Nyström (AFN) preconditioner, which combines a Nyström-based landmark block with a factorized sparse approximate inverse of the Schur complement to remain robust across parameter regimes, and which adaptively selects the number of landmark points, i.e., the approximation rank. Farthest point sampling (FPS) of the landmarks and a subsampling-based rank estimator drive the adaptive mechanism, enabling scalable construction even when the Nyström rank is large. Numerical experiments on synthetic 3D data and machine learning datasets show AFN delivering near-constant iteration counts and reduced setup time relative to competing preconditioners, which makes it practical for kernel methods under parameter variation.

Abstract

The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nyström (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nyström approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nyström approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.
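The block structure described in the abstract can be illustrated with a small dense sketch. It is not the authors' implementation: the Schur complement is formed explicitly here, whereas AFN approximates its inverse with a factorized sparse approximate inverse (FSAI). The kernel scaling $\exp(-\|\mathbf{x}-\mathbf{y}\|^2/(2l^2))$, the random landmark choice, the parameter values, and the names `gaussian_kernel` and `apply_block_inverse` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, l):
    # Assumed convention: exp(-||x - y||^2 / (2 l^2)); the paper's scaling may differ.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * l**2))

rng = np.random.default_rng(0)
n, k, mu, l = 300, 60, 1e-2, 2.0           # illustrative values only
X = rng.uniform(0.0, 10.0, size=(n, 3))

# Random landmarks as a stand-in; the paper selects them with FPS.
landmarks = rng.choice(n, size=k, replace=False)
rest = np.setdiff1d(np.arange(n), landmarks)
Xp = X[np.concatenate([landmarks, rest])]  # landmarks ordered first

A = gaussian_kernel(Xp, Xp, l) + mu * np.eye(n)
A11, K12, A22 = A[:k, :k], A[:k, k:], A[k:, k:]

# Schur complement S = K22 + mu*I - K12^T (K11 + mu*I)^{-1} K12 (dense here).
S = A22 - K12.T @ np.linalg.solve(A11, K12)

def apply_block_inverse(r, solve_S):
    """Apply the 2x2 block inverse; solve_S applies (an approximation of) S^{-1}."""
    r1, r2 = r[:k], r[k:]
    y1 = np.linalg.solve(A11, r1)
    z2 = solve_S(r2 - K12.T @ y1)
    z1 = y1 - np.linalg.solve(A11, K12 @ z2)
    return np.concatenate([z1, z2])

r = rng.standard_normal(n)

# With an exact Schur solve the block formula reproduces A^{-1} r.
z_exact = apply_block_inverse(r, lambda b: np.linalg.solve(S, b))
rel_res = np.linalg.norm(A @ z_exact - r) / np.linalg.norm(r)

# A crude diagonal stand-in shows where an approximate inverse plugs in (this is NOT FSAI).
z_approx = apply_block_inverse(r, lambda b: b / np.diag(S))
```

Only the `solve_S` callback changes between the exact and the approximate variant, which is where a sparse approximate inverse of the Schur complement would enter.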

Paper Structure (17 sections, 5 theorems, 59 equations, 5 figures, 3 tables, 6 algorithms)

Introduction
Background: Nyström approximation
Adaptive Factorized Nyström (AFN) preconditioner
Screening effect and FSAI
AFN preconditioner construction and application
Adaptive choice of approximation rank
Selecting the landmark points
Interplay between fill and separation distance
Optimal properties of FPS
Nyström approximation error analysis based on fill distance
FPS and Screening Effect
Implementation of FPS
Numerical experiments
Experiments with synthetic 3D datasets
Experiments with machine learning datasets
...and 2 more sections

Key Result

Theorem 4.1

Suppose all the data points are inside a unit ball $\Omega$ in $\mathbb{R}^{d}$. Then for an arbitrary subset $X_{k} = \{\mathbf{x}_{i_1},\dots,\mathbf{x}_{i_k}\}$ of $X$, the following bounds hold for $h_{X_{k}}$ and $q_{X_{k}}$:

Figures (5)

Figure 1: Left: Spectra of $61$ regularized Gaussian kernel matrices associated with the same $1000$ points sampled randomly over a cube with edge length $10$ and a fixed regularization parameter $\mu=0.0001$ but different length-scales $l$. Right: Iteration counts of unpreconditioned CG to solve Equation \ref{eq:Problem} for the $61$ regularized kernel matrices to reach the relative residual tolerance $10^{-4}$.
Figure 1: Comparison of the relative Nyström approximation error curves for an original dataset and a subsampled dataset with $100$ points, associated with two different length-scales. The original dataset contains 1000 uniformly sampled points from a cube with edge length $10$. The indices of the subsampled dataset are matched with those of the original dataset by computing the relative Nyström approximation errors on the original dataset only for ranks that are multiples of 10. The plot shows how the approximation error changes as the rank of the approximation increases.
Figure 1: An illustration of FPS for selecting one, ten, twenty and thirty points from a two-dimensional dataset with $400$ points where the big circles represent the selected points and the dots denote the other data points.
Figure 2: Comparison of fill distance and the Nyström approximation error for $1000$ points uniformly sampled from a cube with edge length $10$, when the Gaussian kernel function with length-scale $l = 10$ is used. FPS and random sampling are used to sample $k$ points from $X$ to form $X_k$. Nyström error is computed only for the ranks which are multiples of $10$.
Figure 3: Histograms of the magnitude of the entries in $\mathbf{K}_{22}+\mu\mathbf{I}$, $\mathbf{K}_{22} + \mu\,\mathbf{I} -\mathbf{K}_{12}^{\top}(\mathbf{K}_{11}+\mu\,\mathbf{I})^{-1}\mathbf{K}_{12}$, and $(\mathbf{K}_{22} + \mu\,\mathbf{I} -\mathbf{K}_{12}^{\top}(\mathbf{K}_{11}+\mu\,\mathbf{I})^{-1}\mathbf{K}_{12})^{-1}$ associated with a Gaussian kernel matrix defined using $1000$ points sampled uniformly from a cube with edge length $10$, regularization parameter $\mu=0.0001$, and length-scale $l=5$. The maximum entries in these three matrices are all scaled to $1$. $\mathbf{K}$ has $243$ eigenvalues greater than $1.1\times\mu$.
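The FPS selection illustrated in the captions above can be sketched as a greedy loop that repeatedly picks the point farthest from the set already chosen. This is a minimal sketch rather than the paper's implementation: the starting index, the dataset, and the name `farthest_point_sampling` are assumptions. The second return value is the fill distance of the selected set with respect to the full dataset, i.e., the largest distance from any data point to its nearest selected point.

```python
import numpy as np

def farthest_point_sampling(X, k, start=0):
    """Greedy FPS: return k selected indices and the resulting fill distance."""
    selected = [start]                      # assumed starting point
    # dist[i] = distance from point i to its nearest selected point
    dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))          # farthest point from the current set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected), float(dist.max())

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(400, 2))   # illustrative 2D dataset with 400 points

idx10, fill10 = farthest_point_sampling(X, 10)
idx30, fill30 = farthest_point_sampling(X, 30)
```

Because the selections are nested for a fixed starting point, the fill distance cannot increase as more points are added, so `fill30` is no larger than `fill10`.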

Theorems & Definitions (11)

Theorem 4.1
Proof 1
Remark 4.2
Theorem 4.3
Proof 2
Theorem 4.4
Proof 3
Theorem 4.5
Remark 4.6
Theorem A.1: belkin_approximation_2018
...and 1 more
