Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices
Frida Viset, Anton Kullberg, Frederiek Wesel, Arno Solin
TL;DR
The paper tackles the kernel-precision bottleneck in hyperparameter-optimized Gaussian Processes by exploiting Hankel–Toeplitz structures within basis-function expansions. It shows that the precision matrix, when assembled from per-dimension basis products, decomposes into multi-level Hankel/Toeplitz forms, yielding only $\prod_{d=1}^D (2m_d-1)$ or $\prod_{d=1}^D 3m_d$ unique entries and reducing the cost to $O(NM)$ with memory $O(M)$, all without additional approximations. Two theorems establish sufficient conditions for these reductions to hold across a wide class of approximations (including Variational Fourier Features), independent of the data. Empirical results on synthetic data, magnetic-field mapping, and US precipitation data demonstrate substantial speedups and memory savings, enabling much larger basis sets and higher-frequency kernel content in practice, with code available for reproduction.
Abstract
The Hilbert-space Gaussian Process (HGP) approach offers a hyperparameter-independent basis function approximation for speeding up Gaussian Process (GP) inference by projecting the GP onto M basis functions. These properties result in a favorable data-independent $\mathcal{O}(M^3)$ computational complexity during hyperparameter optimization but require a dominating one-time precomputation of the precision matrix costing $\mathcal{O}(NM^2)$ operations. In this paper, we lower this dominating computational complexity to $\mathcal{O}(NM)$ with no additional approximations. We can do this because we realize that the precision matrix can be split into a sum of Hankel-Toeplitz matrices, each having $\mathcal{O}(M)$ unique entries. Based on this realization we propose computing only these unique entries at $\mathcal{O}(NM)$ costs. Further, we develop two theorems that prescribe sufficient conditions for the complexity reduction to hold generally for a wide range of other approximate GP models, such as the Variational Fourier Feature (VFF) approach. The two theorems do this with no assumptions on the data and no additional approximations of the GP models themselves. Thus, our contribution provides a pure speed-up of several existing, widely used, GP approximations, without further approximations.
