Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices

Frida Viset, Anton Kullberg, Frederiek Wesel, Arno Solin

TL;DR

The paper tackles the kernel-precision bottleneck in hyperparameter-optimized Gaussian Processes by exploiting Hankel–Toeplitz structures within basis-function expansions. It shows that the precision matrix, when assembled from per-dimension basis products, decomposes into multi-level Hankel/Toeplitz forms, yielding only $\prod_{d=1}^D (2m_d-1)$ or $\prod_{d=1}^D 3m_d$ unique entries and reducing the cost to $O(NM)$ with memory $O(M)$, all without additional approximations. Two theorems establish sufficient conditions for these reductions to hold across a wide class of approximations (including Variational Fourier Features), independent of the data. Empirical results on synthetic data, magnetic-field mapping, and US precipitation data demonstrate substantial speedups and memory savings, enabling much larger basis sets and higher-frequency kernel content in practice, with code available for reproduction.

Abstract

The Hilbert-space Gaussian Process (HGP) approach offers a hyperparameter-independent basis function approximation for speeding up Gaussian Process (GP) inference by projecting the GP onto M basis functions. These properties result in a favorable data-independent $\mathcal{O}(M^3)$ computational complexity during hyperparameter optimization but require a dominating one-time precomputation of the precision matrix costing $\mathcal{O}(NM^2)$ operations. In this paper, we lower this dominating computational complexity to $\mathcal{O}(NM)$ with no additional approximations. We can do this because we realize that the precision matrix can be split into a sum of Hankel-Toeplitz matrices, each having $\mathcal{O}(M)$ unique entries. Based on this realization we propose computing only these unique entries at $\mathcal{O}(NM)$ costs. Further, we develop two theorems that prescribe sufficient conditions for the complexity reduction to hold generally for a wide range of other approximate GP models, such as the Variational Fourier Feature (VFF) approach. The two theorems do this with no assumptions on the data and no additional approximations of the GP models themselves. Thus, our contribution provides a pure speed-up of several existing, widely used, GP approximations, without further approximations.
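As a rough illustration of the idea in the abstract, the following NumPy sketch computes $\bm{\Phi}^\top\bm{\Phi}$ for a one-dimensional sinusoidal basis in two ways: by forming the full $N \times m$ feature matrix, and by accumulating only the $2m+1$ cosine sums that determine the Toeplitz and Hankel parts. The basis form $\phi_j(x) = L^{-1/2}\sin(\pi j (x+L)/(2L))$ on $[-L, L]$ and all function names are assumptions made for this example, not details taken from the paper's code.

```python
import numpy as np


def sin_basis(x, m, L):
    """Assumed basis: phi_j(x) = sin(pi * j * (x + L) / (2L)) / sqrt(L), j = 1..m."""
    j = np.arange(1, m + 1)
    return np.sin(np.pi * j * (x[:, None] + L) / (2.0 * L)) / np.sqrt(L)


def gram_dense(x, m, L):
    """Reference computation: O(N m^2) operations via the full feature matrix."""
    Phi = sin_basis(x, m, L)
    return Phi.T @ Phi


def gram_from_unique(x, m, L):
    """Structured computation: O(N m) operations for the unique entries.

    Uses sin(a) sin(b) = (cos(a - b) - cos(a + b)) / 2, so entry (i, j) is
    (c[|i - j|] - c[i + j]) / (2L), where c[k] = sum_n cos(k * theta_n).
    The first term depends only on i - j (Toeplitz), the second only on
    i + j (Hankel).
    """
    theta = np.pi * (x + L) / (2.0 * L)
    k = np.arange(0, 2 * m + 1)
    # This forms an N x (2m + 1) temporary; chunk over x to keep memory at O(m).
    c = np.cos(theta[:, None] * k).sum(axis=0)
    idx = np.arange(1, m + 1)
    i, j = np.meshgrid(idx, idx, indexing="ij")
    return (c[np.abs(i - j)] - c[i + j]) / (2.0 * L)
```

Both functions should agree up to floating-point error, which can be checked with `np.allclose(gram_dense(x, m, L), gram_from_unique(x, m, L))` on any sample `x`. Only the vector `c` needs to be stored between hyperparameter updates; the dense matrix is assembled here purely for comparison.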

Paper Structure

This paper contains 23 sections, 6 theorems, 49 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

If the outer product of the per-dimension basis-function vector with itself is a Hankel or Toeplitz matrix for all $x^{(d)}\in\mathbb{R}$ along each dimension $d$, the information matrix $\bm{\Phi}^\top\bm{\Phi}$ will be a multi-level block Hankel or Toeplitz matrix, and therefore have ${\prod_{d=1}^D(2m_d-1)}$ unique entries.
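To make the entry count in Theorem 3.1 concrete, here is a small NumPy sketch for a two-dimensional monomial basis. The monomial basis is an assumption chosen for this example because $x^i x^j = x^{i+j}$ depends only on $i+j$, which is exactly the Hankel property; the function names are hypothetical. The sketch stores only the $(2m_1-1)(2m_2-1)$ power sums and rebuilds the full $M \times M$ matrix from them by index lookup.

```python
import numpy as np


def monomials(x, m):
    """Return the (N, m) matrix with columns x^0, x^1, ..., x^(m-1)."""
    return x[:, None] ** np.arange(m)


def gram_dense_2d(X, m1, m2):
    """Reference: build the product basis explicitly, then Phi^T Phi in O(N M^2)."""
    P1 = monomials(X[:, 0], m1)
    P2 = monomials(X[:, 1], m2)
    # Column index of the product basis is i1 * m2 + i2.
    Phi = np.einsum("ni,nj->nij", P1, P2).reshape(len(X), m1 * m2)
    return Phi.T @ Phi


def gram_hankel_2d(X, m1, m2):
    """Structured: compute only the unique entries, then gather them.

    S[k1, k2] = sum_n x1^k1 * x2^k2 for k1 < 2*m1 - 1 and k2 < 2*m2 - 1.
    Entry ((i1, i2), (j1, j2)) of the Gram matrix equals S[i1 + j1, i2 + j2].
    """
    Q1 = monomials(X[:, 0], 2 * m1 - 1)
    Q2 = monomials(X[:, 1], 2 * m2 - 1)
    S = Q1.T @ Q2  # shape (2*m1 - 1, 2*m2 - 1): all unique entries
    i1, i2 = np.meshgrid(np.arange(m1), np.arange(m2), indexing="ij")
    i1, i2 = i1.ravel(), i2.ravel()  # same ordering as the dense version
    G = S[i1[:, None] + i1[None, :], i2[:, None] + i2[None, :]]
    return G, S
```

For $m_1 = 3$ and $m_2 = 4$, `S` holds $5 \times 7 = 35$ numbers, while the full matrix has $12 \times 12 = 144$ entries. The gather step is included only to verify the result against the dense computation.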

Figures (8)

  • Figure 1: An order of magnitude speed-up without any additional approximations: Wall-clock time to compute the precision matrix for an increasing number $M$ of basis functions.
  • Figure 2: The precision matrix for polynomial basis functions has a nested Hankel structure. The visualization of the matrix is proportionally darker as the logarithm of each entry increases. The matrices are computed as the sum of all entries $\bm{H}_n$ for $n=\{1,\hdots, N\}$, where the expression for $\bm{H}_n$ is given below each matrix.
  • Figure 3: The precision matrix for sinusoidal basis functions in one dimension has neither Hankel nor Toeplitz structure. However, it can be decomposed into a sum of two matrices, one with Hankel structure and one with Toeplitz structure. Here, 49 basis functions are placed along one dimension.
  • Figure 4: The precision matrix for sinusoidal basis functions in two dimensions has neither Hankel nor Toeplitz structure. However, it can be decomposed into $2^D=4$ matrices, each of which has block Hankel--Toeplitz structure. Here, 7 basis functions are placed along each of the two dimensions, giving a total of 49 basis functions.
  • Figure 5: Our proposed computational scheme reduces the computation time for datasets with high-frequency variations, as these require many basis functions to achieve accurate reconstruction. This underwater magnetic field is reconstructed with lower error using a large number ($6400$) of basis functions than using a smaller number ($400$). For $6400$ basis functions, our computational scheme reduced the time required to compute the precision matrix from 2.7 hours to 1.7 minutes.
  • ...and 3 more figures
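
The $2^D$ count in the Figure 4 caption follows from a product-to-sum identity. As a sketch, assume the sinusoidal basis $\phi_j^{(d)}(x) = L_d^{-1/2}\sin(j\theta_d)$ with $\theta_d = \pi(x^{(d)}+L_d)/(2L_d)$; this form and the symbols $\bm{T}^{(d)}$ and $\bm{H}^{(d)}$ are introduced here for illustration only. For a single input and a single dimension,

$$
\phi_i^{(d)}\phi_j^{(d)}
= \underbrace{\tfrac{1}{2L_d}\cos\big((i-j)\theta_d\big)}_{[\bm{T}^{(d)}]_{ij}}
- \underbrace{\tfrac{1}{2L_d}\cos\big((i+j)\theta_d\big)}_{[\bm{H}^{(d)}]_{ij}},
$$

where $\bm{T}^{(d)}$ is Toeplitz (it depends only on $i-j$) and $\bm{H}^{(d)}$ is Hankel (it depends only on $i+j$). For $D=2$ with a product basis $\bm{\phi} = \bm{\phi}^{(1)} \otimes \bm{\phi}^{(2)}$, the outer product factorizes and expands into four terms:

$$
\bm{\phi}\bm{\phi}^\top
= \big(\bm{T}^{(1)} - \bm{H}^{(1)}\big) \otimes \big(\bm{T}^{(2)} - \bm{H}^{(2)}\big)
= \bm{T}^{(1)} \otimes \bm{T}^{(2)}
- \bm{T}^{(1)} \otimes \bm{H}^{(2)}
- \bm{H}^{(1)} \otimes \bm{T}^{(2)}
+ \bm{H}^{(1)} \otimes \bm{H}^{(2)}.
$$

Each Kronecker product is Hankel or Toeplitz at both block levels, and summing over the $N$ data points preserves that structure, so the precision matrix is a sum of $2^2 = 4$ block Hankel--Toeplitz matrices.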

Theorems & Definitions (12)

  • Theorem 3.1
  • proof
  • Corollary 3.2
  • proof
  • Corollary 3.3
  • proof
  • Theorem 3.4
  • proof
  • Corollary 3.5
  • proof
  • ...and 2 more