Parametric Hierarchical Matrix Approximations to Kernel Matrices

Abraham Khan, Chao Chen, Vishwas Rao, Arvind K. Saibaba

TL;DR

This work addresses the high cost of repeatedly forming kernel matrix approximations across varying hyperparameters. It introduces parametric $\mathcal{H}$- and $\mathcal{H}^{2}$-matrices, built on Chebyshev interpolation and tensor-train compression (TT, TT-cross). An offline stage precomputes a parametric representation over the parameter space, and an online stage efficiently instantiates a kernel matrix approximation for a fixed parameter without any new kernel evaluations. The key contributions are the TT-based parametric far-field and near-field approximations (PTTK for the far field and a TT-augmented scheme for the near field), detailed cost analyses showing $O(n \log n)$ or $O(n)$ offline behavior and linear-time online MVMs, and extensive numerical experiments demonstrating speedups of over $100\times$ across multiple kernels and parameter ranges. The results indicate practical, scalable, and accurate kernel-matrix approximations suitable for GP-based learning and inverse problems, with strong potential for translation-invariant kernels and large-scale datasets. Overall, the proposed parametric hierarchical matrices offer a framework for efficient parameter-dependent kernel computations in scientific computing and machine learning.

Abstract

Kernel matrices are ubiquitous in computational mathematics, often arising from applications in machine learning and scientific computing. In two or three spatial or feature dimensions, such problems can be approximated efficiently by a class of matrices known as hierarchical matrices. A hierarchical matrix consists of a hierarchy of small near-field blocks (or sub-matrices) stored in a dense format and large far-field blocks approximated by low-rank matrices. Standard methods for forming hierarchical matrices do not account for the fact that kernel matrices depend on specific hyperparameters; for example, in the context of Gaussian processes, hyperparameters must be optimized over a fixed parameter space. We introduce a new class of hierarchical matrices, namely, parametric (parameter-dependent) hierarchical matrices. Members of this new class are parametric $\mathcal{H}$-matrices and parametric $\mathcal{H}^{2}$-matrices. The construction of a parametric hierarchical matrix follows an offline-online paradigm. In the offline stage, the near-field and far-field blocks are approximated by using polynomial approximation and tensor compression. In the online stage, for a particular hyperparameter, the parametric hierarchical matrix is instantiated efficiently as a standard hierarchical matrix. The asymptotic costs for storage and computation in the offline stage are comparable to the corresponding standard approaches of forming a hierarchical matrix. However, the online stage of our approach requires no new kernel evaluations, and the far-field blocks can be computed more efficiently than standard approaches. Numerical experiments show over $100\times$ speedups compared with existing techniques.
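
The offline-online idea in the abstract can be illustrated on a single far-field block. The following is a minimal sketch, not the paper's PTTK algorithm: it assumes a 1D Gaussian kernel with a length-scale hyperparameter `ell`, stores dense snapshots of the block at Chebyshev nodes of the parameter interval (the paper additionally compresses with tensor trains, which is omitted here), and then instantiates the block for a new `ell` by Lagrange interpolation. All names (`cheb_nodes`, `lagrange_weights`, `kernel`, `instantiate`) and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def cheb_nodes(a, b, p):
    """p Chebyshev nodes of the first kind, mapped to the interval [a, b]."""
    k = np.arange(p)
    x = np.cos((2 * k + 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * x

def lagrange_weights(nodes, t):
    """Values at t of the Lagrange basis polynomials defined on `nodes`."""
    w = np.ones(len(nodes))
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        w[j] = np.prod((t - others) / (xj - others))
    return w

def kernel(X, Y, ell):
    """Assumed example kernel: exp(-|x - y|^2 / (2 ell^2)) on 1D point sets."""
    r = X[:, None] - Y[None, :]
    return np.exp(-r**2 / (2 * ell**2))

# Two well-separated 1D clusters, i.e. one far-field block.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200)
Y = rng.uniform(2.0, 3.0, 200)

# Offline stage: evaluate the block at Chebyshev nodes of the parameter range.
ell_min, ell_max, p = 1.0, 2.0, 20
nodes = cheb_nodes(ell_min, ell_max, p)
snapshots = np.stack([kernel(X, Y, ell) for ell in nodes])  # shape (p, 200, 200)

# Online stage: instantiate the block for a new ell with no kernel evaluations.
def instantiate(ell):
    return np.tensordot(lagrange_weights(nodes, ell), snapshots, axes=1)

ell_test = 1.37
exact = kernel(X, Y, ell_test)  # computed only to check the sketch
rel_err = np.linalg.norm(instantiate(ell_test) - exact) / np.linalg.norm(exact)
```

In this sketch the online cost depends only on the number of parameter nodes and the block size, not on the cost of evaluating the kernel. Storing dense snapshots would not scale to large blocks, which is where the low-rank and tensor-train compression described in the abstract comes in.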

Paper Structure

This paper contains 91 sections, 3 theorems, 106 equations, 6 figures, 14 tables, 10 algorithms.

Key Result

Lemma 1

Let $\sigma \in \mathcal{T}_{I}$ be such that $\sigma$ is not a leaf node, and let $\sigma' \in \text{children}(\sigma)$. For the factor matrices $\{\boldsymbol{E}_{\sigma', i} \}_{i=1}^{d}$ defined in the section on transfer matrices (sssec:Transfer_Matrices), the following statement holds,

Figures (6)

  • Figure 1: Partitioning of the domain $B$ by recursively dividing it into $4^l$ uniform hypercubes (squares) at levels $l = 0,1,2$.
  • Figure 2: For $d = 2$ and $\eta = \sqrt{d}$, boxes that are admissible with box $B_{\sigma}$, where $\sigma \in \mathcal{T}_{I}$, are colored green, while inadmissible boxes are colored red.
  • Figure 3: Here $\boldsymbol{\theta} \in \Theta$, $d = 1$, and $l_{\max} = 3$. The diagram illustrates a parametric $\mathcal{H}$-matrix approximation of $\boldsymbol{K}(X, X; \boldsymbol{\theta})$. The yellow blocks are the parametric sub-matrices associated with the near-field block clusters, and the green blocks are the parametric sub-matrices associated with the far-field block clusters. The red blocks and dark blue blocks represent the sub-matrices of the parametric kernel matrix itself for the near-field and far-field block clusters, respectively.
  • Figure 4: Online time (NF Time + FF Time) of the parametric $\mathcal{H}$-matrix method versus $n$ for various kernels from Table \ref{tab:kern-table}, on a log-log scale.
  • Figure 5: Online time comparison between the parametric $\mathcal{H}$-matrix method and the $\mathcal{H}$-ACA method. The speedup factor is the ratio of the online time of the $\mathcal{H}$-ACA method to the online time of the parametric $\mathcal{H}$-matrix method. The far-field speedup is defined analogously. Both plots use a log-log scale.
  • ...and 1 more figure
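
The captions of Figures 1 and 2 describe a uniform partition into boxes and a split into admissible and inadmissible boxes for $\eta = \sqrt{d}$. The sketch below shows how such a labeling could be computed. The admissibility criterion used here, $\text{diam}(B) \le \eta \, \text{dist}(B_{\sigma}, B_{\tau})$ for equal-sized boxes, is a common convention and an assumption on our part; this summary does not state the paper's exact definition. The unit square as domain and the helper names (`boxes_at_level`, `box_distance`, `is_admissible`) are likewise hypothetical.

```python
import itertools
import numpy as np

def boxes_at_level(level, d=2):
    """Lower corners and side length of the 2^(d*level) uniform boxes in [0, 1]^d."""
    m = 2 ** level
    h = 1.0 / m
    corners = np.array(list(itertools.product(range(m), repeat=d))) * h
    return corners, h

def box_distance(c1, c2, h):
    """Euclidean distance between two axis-aligned boxes of side h."""
    gap = np.maximum(np.abs(c1 - c2) - h, 0.0)
    return np.linalg.norm(gap)

def is_admissible(c1, c2, h, eta):
    """Assumed criterion: diam(box) <= eta * dist(box1, box2), with a small tolerance."""
    diam = h * np.sqrt(len(c1))
    return bool(diam <= eta * box_distance(c1, c2, h) + 1e-12)

d, level = 2, 2
corners, h = boxes_at_level(level, d)   # 4^2 = 16 boxes at level 2
eta = np.sqrt(d)
ref = corners[5]                        # interior box with lower corner (0.25, 0.25)
labels = [is_admissible(ref, c, h, eta) for c in corners]
num_inadmissible = labels.count(False)
num_admissible = labels.count(True)
```

Under this assumed criterion with $\eta = \sqrt{d}$, a box is inadmissible with the reference box exactly when the two touch or coincide, so the interior reference box above has 9 inadmissible boxes (itself and its 8 neighbors) and 7 admissible ones.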

Theorems & Definitions (9)

  • Definition 1: Cluster Tree
  • Definition 2: Parametric $\mathcal{H}$-matrix
  • Definition 3: Parametric $\mathcal{H}^{2}$-matrix
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof