Numerical Stability of the Nyström Method

Alberto Bucci, Yuji Nakatsukasa, Taejun Park

TL;DR

This work addresses the numerical instability of the Nyström method when forming low-rank kernel approximations by introducing an ε-truncated pseudoinverse within a stabilized Nyström (SN) framework that uses locally max-vol index sets for column selection. The authors establish stability guarantees in exact arithmetic and prove backward stability in floating-point arithmetic, showing that the error depends on the spectral decay σ_{r+1}(A) and the truncation tolerance ε rather than on the condition number of A. Their experiments indicate that SN is robust, structure-preserving, and scalable, that it often outperforms shifting-based stabilization, and that it avoids QR computations. The results offer practical guidance for stable large-scale kernel computations and suggest that lower-precision arithmetic may be usable without sacrificing accuracy in many applications.

Abstract

The Nyström method is a widely used technique for improving the scalability of kernel-based algorithms, including kernel ridge regression, spectral clustering, and Gaussian processes. Despite its popularity, the numerical stability of the method has remained largely an unresolved problem. In particular, the pseudo-inversion of the submatrix involved in the Nyström method may pose stability issues as the submatrix is likely to be ill-conditioned, resulting in numerically poor approximation. In this work, we establish conditions under which the Nyström method is numerically stable. We show that stability can be achieved through an appropriate choice of column subsets and a careful implementation of the pseudoinverse. Our results and experiments provide theoretical justification and practical guidance for the stable application of the Nyström method in large-scale kernel computations.
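To make the idea of a "careful implementation of the pseudoinverse" concrete, here is a minimal NumPy sketch of a Nyström approximation with a truncated pseudoinverse of the core submatrix. It is an illustration under stated assumptions, not the authors' algorithm: the function name `truncated_nystrom_factor` is hypothetical, the truncation is done with a symmetric eigendecomposition and a threshold taken relative to the largest core eigenvalue, and the columns are chosen uniformly at random rather than by the locally max-vol selection the paper analyzes.

```python
import numpy as np

def truncated_nystrom_factor(A, idx, eps):
    """Return F such that A is approximated by F @ F.T.

    Hypothetical helper: Nystrom approximation C @ pinv_eps(W) @ C.T, where
    C = A[:, idx], W = A[idx, idx], and pinv_eps discards eigenvalues of W
    that are not larger than eps * lambda_max(W). The relative threshold is
    an assumption of this sketch.
    """
    C = A[:, idx]                      # n x r block of sampled columns
    W = A[np.ix_(idx, idx)]            # r x r core submatrix
    W = 0.5 * (W + W.T)                # symmetrize against rounding noise
    lam, V = np.linalg.eigh(W)         # eigenvalues in ascending order
    keep = lam > eps * lam.max()       # drop tiny or negative eigenvalues
    # C @ V_k @ diag(lam_k)^(-1/2): the low-rank factor of the approximation
    return C @ (V[:, keep] / np.sqrt(lam[keep]))

# Small demo on a Gaussian kernel matrix with rapidly decaying spectrum.
rng = np.random.default_rng(0)
n, r = 200, 40
x = np.sort(rng.uniform(0.0, 1.0, size=n))
A = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# Uniform column sampling, used here only for simplicity.
idx = rng.choice(n, size=r, replace=False)

F = truncated_nystrom_factor(A, idx, eps=1e-10)
rel_err = np.linalg.norm(A - F @ F.T, "fro") / np.linalg.norm(A, "fro")
print(f"retained rank: {F.shape[1]}, relative Frobenius error: {rel_err:.2e}")
```

Returning the factor `F` rather than the full `n × n` product keeps the approximation in low-rank, symmetric positive semidefinite form. Discarding the small eigenvalues of the core is what prevents division by near-zero quantities when the sampled submatrix is ill-conditioned.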

Paper Structure

This paper contains 14 sections, 10 theorems, 74 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Lemma 3.1

Let $S\in \mathbb{R}^{n\times r}$ be a locally max-vol subsampling matrix for a SPSD matrix $A \in \mathbb{R}^{n\times n}$ corresponding to the indices $J$. Then where $Q$ is the orthonormal factor in the thin QR decomposition of $AS$.

Figures (4)

  • Figure 1: All the different implementations appear to be numerically stable except for the $\texttt{pinv}$ implementation.
  • Figure 2: Relative error in Frobenius norm versus target rank $r$ for kernels with $\sigma = 3$. Left (ijcnn1): all four Nyström variants achieve virtually identical accuracy. Right (skin_nonskin): the kernel becomes ill‑conditioned as $r$ grows; the plain method fails beyond $r=240$, and the stabilized method outperforms the shifted variant by roughly two orders of magnitude, while the backslash implementation lies in between the two.
  • Figure 3: Relative approximation error versus rank $r$ for kernels with $\sigma = 30\sqrt{d}$. Top left (ijcnn1): all four methods again coincide. The remaining panels show the behavior for skin_nonskin, cod-rna, and cadata: the plain method fails at moderate ranks due to breakdown in the Cholesky factorization, while the truncated method consistently achieves lower error than the shifted variant for larger target ranks. The behavior of the backslash implementation is more irregular.
  • Figure 4: Relative error in Frobenius norm versus target rank $r$ for kernels with $\sigma = 30\sqrt{d}$ using uniform sampling. In both datasets, a9a and phishing, the plain and backslash implementations fail or become unstable, while the shift and truncated versions yield similar accuracy throughout.

Theorems & Definitions (21)

  • Definition 2.1
  • Lemma 3.1
  • Proof 1
  • Lemma 3.2
  • Proof 2
  • Lemma 3.3
  • Proof 3
  • Lemma 3.4
  • Proof 4
  • Theorem 3.5: Accuracy of Stabilized Nyström
  • ...and 11 more