Even Faster Kernel Matrix Linear Algebra via Density Estimation

Rikhav Shah, Sandeep Silwal, Haike Xu

TL;DR

This work develops subquadratic, KDE-based algorithms for core kernel-matrix tasks that access the kernel matrix only through fast KDE data structures. The authors introduce a fast matrix-vector product (MVP) method for non-negative vectors with per-coordinate guarantees, and extend it to faster approximate matrix multiplication, top-eigenvalue estimation via a refined noisy power method, and efficient computation of the kernel sum $s(K)$. They prove upper bounds that improve on prior work and establish conditional lower bounds (via SETH and Orthogonal Vectors reductions) that illuminate the limits of KDE-based approaches, especially for vectors with negative entries or for asymmetric kernel matrices. Together, the results make kernel methods provably more efficient on large, high-dimensional datasets, with implications for kernel alignment, spectral methods, and KDE-assisted linear algebra. The work also clarifies the trade-offs among accuracy, dimensionality, and sample complexity, and points to open questions about the ultimate limits of subquadratic kernel computations.
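To make the eigenvalue task concrete, here is a minimal Python sketch of plain power iteration that touches $K$ only through a matrix-vector product callable. It is not the paper's refined noisy power method. The Gaussian kernel, the unit bandwidth, and the helper names `gaussian_kernel_matrix`, `power_method`, and `make_noisy_matvec` are assumptions for illustration; the noisy product perturbs each coordinate of $Kv$ by a relative error of at most `eps`, as a hypothetical stand-in for an approximate (for example KDE-based) product.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / bandwidth^2); built explicitly in O(n^2 d) time.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / bandwidth**2)

def power_method(matvec, n, iters=50, seed=0):
    # Estimate the top eigenvalue of K, accessing K only through `matvec`.
    rng = np.random.default_rng(seed)
    v = rng.random(n) + 0.1          # strictly positive start vector
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        lam = float(v @ w)           # Rayleigh-quotient estimate of lambda_max
        v = w / np.linalg.norm(w)
    return lam

def make_noisy_matvec(K, eps, seed=1):
    # Hypothetical approximate product: every coordinate of K @ v is multiplied
    # by an independent factor drawn uniformly from [1 - eps, 1 + eps].
    rng = np.random.default_rng(seed)
    def matvec(v):
        y = K @ v
        return y * (1.0 + rng.uniform(-eps, eps, size=y.shape))
    return matvec

if __name__ == "__main__":
    X = np.random.default_rng(0).random((300, 2))
    K = gaussian_kernel_matrix(X)
    exact = np.linalg.eigvalsh(K)[-1]
    approx = power_method(make_noisy_matvec(K, eps=1e-3), n=K.shape[0])
    print(exact, approx, abs(approx - exact) / exact)
```

In this toy setting the iterates stay entrywise positive, so a per-coordinate $(1\pm\varepsilon)$ error in the product shifts each Rayleigh quotient by at most a $(1\pm\varepsilon)$ factor.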

Abstract

This paper studies the use of kernel density estimation (KDE) for linear algebraic tasks involving the kernel matrix of a collection of $n$ data points in $\mathbb R^d$. In particular, we improve upon existing algorithms for computing the following up to $(1+\varepsilon)$ relative error: matrix-vector products, matrix-matrix products, the spectral norm, and sum of all entries. The runtimes of our algorithms depend on the dimension $d$, the number of points $n$, and the target error $\varepsilon$. Importantly, the dependence on $n$ in each case is far lower when accessing the kernel matrix through KDE queries as opposed to reading individual entries. Our improvements over the best existing algorithms (particularly those of Backurs, Indyk, Musco, and Wagner '21) for these tasks reduce the polynomial dependence on $\varepsilon$, and additionally decrease the dependence on $n$ in the case of computing the sum of all entries of the kernel matrix. We complement our upper bounds with several lower bounds for related problems, which provide (conditional) quadratic time hardness results and additionally hint at the limits of KDE-based approaches for the problems we study.
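The idea of accessing $K$ through KDE queries rather than individual entries can be illustrated with a small Python sketch. It assumes a Gaussian kernel and models a KDE query at $y$ as returning $\frac{1}{n}\sum_j w_j k(x_j, y)$ with optional non-negative weights $w_j$; `BruteForceKDE` is a hypothetical, exact, $O(nd)$-per-query stand-in for a fast KDE data structure, not the paper's. With this interface, $s(K) = n\sum_i \mathrm{KDE}(x_i)$ and, for $v \ge 0$, $(Kv)_i = n\cdot\mathrm{KDE}_v(x_i)$, so each needs only $n$ queries and never forms $K$.

```python
import numpy as np

def gaussian_kernel(x, Y, bandwidth=1.0):
    # k(x, y) = exp(-||x - y||^2 / bandwidth^2) for every row y of Y.
    return np.exp(-np.sum((Y - x) ** 2, axis=1) / bandwidth**2)

class BruteForceKDE:
    """Hypothetical stand-in for a fast KDE data structure.

    query(y) returns (1/n) * sum_j w_j * k(x_j, y) exactly in O(n d) time;
    a real fast-KDE structure would answer approximately and faster.
    """
    def __init__(self, X, weights=None, bandwidth=1.0):
        self.X = X
        self.w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
        self.bandwidth = bandwidth

    def query(self, y):
        return float(self.w @ gaussian_kernel(y, self.X, self.bandwidth)) / len(self.X)

def kernel_sum_via_kde(X, bandwidth=1.0):
    # s(K) = sum_i sum_j k(x_i, x_j) = n * sum_i KDE(x_i): n queries, K never formed.
    kde = BruteForceKDE(X, bandwidth=bandwidth)
    return len(X) * sum(kde.query(x) for x in X)

def nonneg_matvec_via_kde(X, v, bandwidth=1.0):
    # For v >= 0, (K v)_i = n * (KDE with weights v, evaluated at x_i).
    v = np.asarray(v, dtype=float)
    if np.any(v < 0):
        raise ValueError("this reduction assumes a non-negative vector")
    kde = BruteForceKDE(X, weights=v, bandwidth=bandwidth)
    return np.array([len(X) * kde.query(x) for x in X])
```

If the weighted queries were $(1\pm\varepsilon)$-approximate, each coordinate of the second routine would inherit a $(1\pm\varepsilon)$ relative guarantee; the brute-force class here still costs $O(n^2 d)$ overall.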

Paper Structure

This paper contains 24 sections, 31 theorems, 111 equations, 3 figures, 3 tables, 3 algorithms.

Key Result

Theorem 2.0

Let $k$ be a kernel admitting a KDE data structure of a0. Let $K \in \mathbb{R}^{n\times n}$ be the associated kernel matrix for $n$ points in $d$ dimensions. There is an $\varepsilon$-non-negative approximate matrix-vector product algorithm (a25) satisfying a24 for $K$ running in time $\widetilde{O}(\ldots)$.
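For comparison with the theorem above, the simplest entry-sampling baseline for $Kv$ with $v \ge 0$ is sketched below in Python. It is not the paper's KDE-based algorithm: it samples column indices $j$ with probability $v_j/\|v\|_1$ and reads kernel entries directly, so each coordinate estimate $\|v\|_1 \cdot \frac{1}{m}\sum_t k(x_i, x_{J_t})$ is unbiased for $(Kv)_i$. The Gaussian kernel, the function name `sampled_nonneg_matvec`, and the sample count `m` are assumptions.

```python
import numpy as np

def sampled_nonneg_matvec(X, v, m, bandwidth=1.0, seed=0):
    # Estimate K @ v for v >= 0 by sampling m column indices j with prob v_j / ||v||_1.
    # Each coordinate ||v||_1 * mean_t k(x_i, x_{J_t}) is an unbiased estimate of (K v)_i.
    v = np.asarray(v, dtype=float)
    if np.any(v < 0):
        raise ValueError("v must be non-negative")
    rng = np.random.default_rng(seed)
    total = v.sum()
    J = rng.choice(len(v), size=m, p=v / total)   # one shared sample set for all rows
    S = X[J]                                       # sampled points, shape (m, d)
    sq_x = np.sum(X**2, axis=1)
    sq_s = np.sum(S**2, axis=1)
    d2 = np.maximum(sq_x[:, None] + sq_s[None, :] - 2.0 * X @ S.T, 0.0)
    # O(n m d) kernel evaluations instead of the O(n^2 d) needed for the exact product.
    return total * np.exp(-d2 / bandwidth**2).mean(axis=1)
```

Its per-coordinate relative error is governed by the ratio $\|v\|_1 / (Kv)_i$, so it behaves well when kernel values are bounded away from zero (for example, points in a unit square with unit bandwidth) and degrades when some $(Kv)_i$ is small.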

Figures (3)

  • Figure 1: Exponent of $n$ (left plot) and exponent of $\varepsilon$ (right plot) for the problem of computing $s(K)$ up to a $1+\varepsilon$ factor. For the left plot, note that there is never a reason to spend $\omega(n)$ time: a simple Chernoff bound calculation shows that sampling $\tilde{O}(n/\varepsilon^2)$ uniformly random entries already suffices to $(1+\varepsilon)$-approximate $s(K)$. Thus, we may always take the minimum of the exponent of b0 and $1$.
  • Figure 2: In a32, we need to compute the sum $s_o(K_A)$, which decomposes as $s_o(K_B) + s_o(K) + 2x$, where $x$ is the contribution of one of the two pink rectangles.
  • Figure 3: Representation of $v^\top Kw$.
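The sampling argument in the Figure 1 caption can be sketched in Python as follows, assuming a Gaussian kernel so that every entry lies in $[0, 1]$ and the diagonal is all ones. Then $s(K) \ge n$, the mean entry is at least $1/n$, and a Chernoff bound for $[0,1]$-valued samples gives a $(1+\varepsilon)$ approximation from $O(n/\varepsilon^2)$ uniform samples (times a logarithmic factor for high-probability success). The function name `kernel_sum_by_sampling` and the constant `c` are illustrative assumptions.

```python
import numpy as np

def kernel_sum_by_sampling(X, eps, c=1.0, bandwidth=1.0, seed=0):
    # Estimate s(K) = sum_{i,j} k(x_i, x_j) from m = ceil(c * n / eps^2) uniformly
    # random entries (i, j), rescaled by n^2. `c` is an illustrative constant.
    n = len(X)
    m = int(np.ceil(c * n / eps**2))
    rng = np.random.default_rng(seed)
    i = rng.integers(0, n, size=m)
    j = rng.integers(0, n, size=m)
    # Gaussian kernel entries k(x_i, x_j) in [0, 1]; k(x, x) = 1 on the diagonal.
    vals = np.exp(-np.sum((X[i] - X[j]) ** 2, axis=1) / bandwidth**2)
    return n * n * vals.mean()
```

The estimator reads $O(n/\varepsilon^2)$ entries and never touches a KDE structure, which is why exponents of $n$ above $1$ are never worth paying for this problem.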

Theorems & Definitions (66)

  • Definition 1.1: Fast KDE
  • Definition 2.1: $\varepsilon$-Non-negative Matrix-Vector Product, b0
  • Theorem 2.0
  • Theorem 2.0
  • Theorem 2.0
  • Theorem 2.0
  • Theorem 2.0
  • Theorem 2.0
  • Theorem 2.0
  • Remark 3.1: Remark on KDE guarantees
  • ...and 56 more