Stochastic trace estimation for parameter-dependent matrices applied to spectral density approximation
Fabio Matti, Haoze He, Daniel Kressner, Hei Yin Lam
TL;DR
This work extends stochastic trace estimation to parameter-dependent matrices by using constant randomness across the parameter, that is, the same random vectors for every value of t, which enables the reuse of matrix-vector products when evaluating Tr(B(t)) for many t. It analyzes three estimators (Girard-Hutchinson, Nyström, and Nyström++) and couples them with Chebyshev approximation in t to form the Chebyshev-Nyström++ method for efficient spectral density estimation. The authors prove L1-error bounds that parallel those for the constant-matrix case, show that Nyström++ needs only O(1/ε) matrix-vector products to reach an error of ε, independent of any low-rank structure, and incorporate nonnegativity-preserving strategies for kernel approximations. Numerical experiments from electronic structure, statistical thermodynamics, and neural network optimization show accurate spectral densities, favorable computational scaling, and practical robustness across these applications.
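A minimal NumPy sketch can make the Nyström++ estimator and constant randomization concrete. It assumes Gaussian random vectors, dense symmetric positive semidefinite matrices, and a Nyström++ variant that splits the random vectors into a sketch `Omega` for the low-rank Nyström approximation and probes `Psi` for a Girard-Hutchinson estimate of the remaining trace. The function name `nystrom_pp` and the test family B(t) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def nystrom_pp(B, Omega, Psi):
    """Nyström++ trace estimate of a symmetric PSD matrix B (illustrative sketch).

    Omega (n x p) sketches B for the Nyström low-rank part; Psi (n x q) holds
    Gaussian probes for a Girard-Hutchinson estimate of the residual trace.
    """
    BOmega = B @ Omega
    K_pinv = np.linalg.pinv(Omega.T @ BOmega, hermitian=True)
    # Tr(B_hat) for the Nyström approximation B_hat = (B Omega) K^+ (B Omega)^T
    tr_nys = np.trace(K_pinv @ (BOmega.T @ BOmega))
    # Girard-Hutchinson estimate of Tr(B - B_hat) using the probes Psi
    PsiBOmega = Psi.T @ BOmega
    quad_B = np.einsum("ij,ij->", Psi, B @ Psi)        # Tr(Psi^T B Psi)
    quad_Bhat = np.trace(PsiBOmega @ K_pinv @ PsiBOmega.T)  # Tr(Psi^T B_hat Psi)
    return tr_nys + (quad_B - quad_Bhat) / Psi.shape[1]

# Constant randomization: Omega and Psi are drawn once and reused for every t.
rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Omega = rng.standard_normal((n, 20))
Psi = rng.standard_normal((n, 20))
estimates = {}
for t in (0.5, 1.0, 2.0):
    # Hypothetical test family: B(t) = Q diag(1 / (k + t)^2) Q^T
    lam_t = 1.0 / (np.arange(1, n + 1) + t) ** 2
    B_t = (Q * lam_t) @ Q.T
    estimates[t] = (nystrom_pp(B_t, Omega, Psi), lam_t.sum())
```

This sketch only shows the shared randomness; each B(t) is still multiplied by `Omega` and `Psi` separately. The savings in matrix-vector products come from combining the fixed random vectors with Chebyshev approximation in t.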
Abstract
Stochastic trace estimation is a well-established tool for approximating the trace of a large symmetric matrix $\mathbf{B}$. Several applications involve a matrix that depends continuously on a parameter $t \in [a,b]$, and require trace estimates of $\mathbf{B}(t)$ for many values of $t$. This is, for example, the case when approximating the spectral density of a matrix. Approximating the trace separately for each matrix $\mathbf{B}(t_1), \dots, \mathbf{B}(t_m)$ clearly incurs redundancies and a cost that scales linearly with $m$. To address this issue, we propose and analyze modifications of three stochastic trace estimators: the Girard-Hutchinson, Nyström, and Nyström++ estimators. Our modification uses \emph{constant} randomization across different values of $t$, that is, every matrix $\mathbf{B}(t_1), \dots, \mathbf{B}(t_m)$ is multiplied with the \emph{same} set of random vectors. When combined with Chebyshev approximation in $t$, the use of such constant random matrices allows one to reuse matrix-vector products across different values of $t$, leading to significant cost reduction. Our analysis shows that the loss of stochastic independence across different $t$ does not lead to a deterioration of the error bounds compared with the case of a constant matrix. In particular, we show that $\mathcal{O}(\varepsilon^{-1})$ random matrix-vector products suffice to ensure an error of $\varepsilon > 0$ for Nyström++, independent of low-rank properties of $\mathbf{B}(t)$. We discuss in detail how the combination of Nyström++ with Chebyshev approximation applies to spectral density estimation and provide an analysis of the resulting method. This improves various aspects of an existing stochastic estimator for spectral density estimation. Several numerical experiments from electronic structure interaction, statistical thermodynamics, and neural network optimization validate our findings.
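The reuse of matrix-vector products across values of $t$ can be illustrated with a minimal NumPy sketch. For simplicity it uses the Girard-Hutchinson estimator rather than Nyström++, so it is not the authors' Chebyshev-Nyström++ method. It assumes a symmetric matrix $\mathbf{A}$ whose spectrum has already been scaled to $[-1,1]$ and a Gaussian smoothing kernel $g_\sigma$, so that $\mathbf{B}(t) = g_\sigma(t\mathbf{I} - \mathbf{A})$ and the smoothed spectral density is $\phi_\sigma(t) = \frac{1}{n}\operatorname{Tr}(\mathbf{B}(t))$. Expanding $g_\sigma(t - s) \approx \sum_{k=0}^{d} \mu_k(t) T_k(s)$ in Chebyshev polynomials gives $\operatorname{Tr}(\mathbf{B}(t)) \approx \sum_{k} \mu_k(t) \operatorname{Tr}(T_k(\mathbf{A}))$, and with one fixed random matrix $\mathbf{\Omega}$ the moments $\operatorname{Tr}(\mathbf{\Omega}^T T_k(\mathbf{A}) \mathbf{\Omega})/q$ are computed once. The helper names (`gaussian_kernel`, `chebyshev_moments`, `spectral_density`) and all parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def gaussian_kernel(s, sigma):
    # Assumed smoothing kernel g_sigma (normalized Gaussian)
    return np.exp(-s**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def chebyshev_moments(A, Omega, deg):
    """nu_k = Tr(Omega^T T_k(A) Omega) / q for k = 0..deg (deg >= 1), one fixed Omega."""
    q = Omega.shape[1]
    nu = np.empty(deg + 1)
    V_prev, V = Omega, A @ Omega              # T_0(A) Omega and T_1(A) Omega
    nu[0] = np.sum(Omega * V_prev) / q
    nu[1] = np.sum(Omega * V) / q
    for k in range(2, deg + 1):
        V_prev, V = V, 2 * (A @ V) - V_prev   # T_k = 2 A T_{k-1} - T_{k-2}
        nu[k] = np.sum(Omega * V) / q
    return nu

def spectral_density(A, ts, sigma, deg, n_vec, rng):
    n = A.shape[0]
    Omega = rng.standard_normal((n, n_vec))   # the same random vectors for every t
    nu = chebyshev_moments(A, Omega, deg)      # all products with A happen here, once
    phi = np.empty(len(ts))
    for j, t in enumerate(ts):
        # Chebyshev coefficients mu_k(t) of s -> g_sigma(t - s) on [-1, 1]
        mu = C.chebinterpolate(lambda s: gaussian_kernel(t - s, sigma), deg)
        phi[j] = mu @ nu / n
    return phi

# Usage on a hypothetical test matrix with known eigenvalues in [-0.9, 0.9]
rng = np.random.default_rng(1)
n = 300
lam = np.linspace(-0.9, 0.9, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * lam) @ Q.T
ts = np.linspace(-0.5, 0.5, 5)
phi = spectral_density(A, ts, sigma=0.1, deg=100, n_vec=50, rng=rng)
exact = np.array([gaussian_kernel(t - lam, 0.1).mean() for t in ts])
```

Each additional evaluation point $t$ costs one Chebyshev interpolation and a dot product of length `deg + 1`, with no further products with $\mathbf{A}$; this is the kind of saving that constant randomization makes possible.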
