Table of Contents
Fetching ...

Stochastic trace estimation for parameter-dependent matrices applied to spectral density approximation

Fabio Matti, Haoze He, Daniel Kressner, Hei Yin Lam

TL;DR

This work extends stochastic trace estimation to parameter-dependent matrices by enforcing constant randomness across the parameter, enabling reuse of matrix-vector products when evaluating Tr(B(t)) for many t. It analyzes three estimators—Girard-Hutchinson, Nyström, and Nyström++—and couples them with Chebyshev approximation to form the Chebyshev-Nyström++ method for efficient spectral density estimation. The authors prove L1-error bounds parallel to the constant-matrix case, show that Nyström++ achieves O(1/epsilon) work independent of low-rank structure, and integrate nonnegativity-preserving strategies for kernel approximations. Through numerical experiments in electronic structure, neural networks, and quantum physics, the approach demonstrates accurate spectral densities with favorable computational scaling and practical robustness across diverse applications.

Abstract

Stochastic trace estimation is a well-established tool for approximating the trace of a large symmetric matrix $\mathbf{B}$. Several applications involve a matrix that depends continuously on a parameter $t \in [a,b]$, and require trace estimates of $\mathbf{B}(t)$ for many values of $t$. This is, for example, the case when approximating the spectral density of a matrix. Approximating the trace separately for each matrix $\mathbf{B}(t_1), \dots, \mathbf{B}(t_m)$ clearly incurs redundancies and a cost that scales linearly with $m$. To address this issue, we propose and analyze modifications for three stochastic trace estimators, the Girard-Hutchinson, Nyström, and Nyström++ estimators. Our modification uses \emph{constant} randomization across different values of $t$, that is, every matrix $\mathbf{B}(t_1), \dots, \mathbf{B}(t_m)$ is multiplied with the \emph{same} set of random vectors. When combined with Chebyshev approximation in $t$, the use of such constant random matrices allows one to reuse matrix-vector products across different values of $t$, leading to significant cost reduction. Our analysis shows that the loss of stochastic independence across different $t$ does not lead to deterioration. In particular, we show that $\mathcal{O}(\varepsilon^{-1})$ random matrix-vector products suffice to ensure an error of $\varepsilon > 0$ for Nyström++, independent of low-rank properties of $\mathbf{B}(t)$. We discuss in detail how the combination of Nyström++ with Chebyshev approximation applies to spectral density estimation and provide an analysis of the resulting method. This improves various aspects of an existing stochastic estimator for spectral density estimation. Several numerical experiments from electronic structure interaction, statistical thermodynamics, and neural network optimization validate our findings.

Stochastic trace estimation for parameter-dependent matrices applied to spectral density approximation

TL;DR

This work extends stochastic trace estimation to parameter-dependent matrices by enforcing constant randomness across the parameter, enabling reuse of matrix-vector products when evaluating Tr(B(t)) for many t. It analyzes three estimators—Girard-Hutchinson, Nyström, and Nyström++—and couples them with Chebyshev approximation to form the Chebyshev-Nyström++ method for efficient spectral density estimation. The authors prove L1-error bounds parallel to the constant-matrix case, show that Nyström++ achieves O(1/epsilon) work independent of low-rank structure, and integrate nonnegativity-preserving strategies for kernel approximations. Through numerical experiments in electronic structure, neural networks, and quantum physics, the approach demonstrates accurate spectral densities with favorable computational scaling and practical robustness across diverse applications.

Abstract

Stochastic trace estimation is a well-established tool for approximating the trace of a large symmetric matrix . Several applications involve a matrix that depends continuously on a parameter , and require trace estimates of for many values of . This is, for example, the case when approximating the spectral density of a matrix. Approximating the trace separately for each matrix clearly incurs redundancies and a cost that scales linearly with . To address this issue, we propose and analyze modifications for three stochastic trace estimators, the Girard-Hutchinson, Nyström, and Nyström++ estimators. Our modification uses \emph{constant} randomization across different values of , that is, every matrix is multiplied with the \emph{same} set of random vectors. When combined with Chebyshev approximation in , the use of such constant random matrices allows one to reuse matrix-vector products across different values of , leading to significant cost reduction. Our analysis shows that the loss of stochastic independence across different does not lead to deterioration. In particular, we show that random matrix-vector products suffice to ensure an error of for Nyström++, independent of low-rank properties of . We discuss in detail how the combination of Nyström++ with Chebyshev approximation applies to spectral density estimation and provide an analysis of the resulting method. This improves various aspects of an existing stochastic estimator for spectral density estimation. Several numerical experiments from electronic structure interaction, statistical thermodynamics, and neural network optimization validate our findings.

Paper Structure

This paper contains 33 sections, 11 theorems, 64 equations, 9 figures, 1 table.

Key Result

theorem 1

Girard-Hutchinson estimator for parameter-dependent matriceshutchinson For a symmetric matrix $\boldsymbol{B}(t) \in \mathbb{R}^{n \times n}$ that depends continuously on $t \in [a, b]$, consider the Girard-Hutchinson estimator equ:hutchinson-trace-estimator with an $n\times n_{\boldsymbol{\Psi}}$ G In particular, given $\varepsilon \in (0, 1)$ and $\delta \in (0, e^{-1}]$, the bound $\int_{a}^{b}

Figures (9)

  • Figure 3.1: Accuracy when computing the estimate \ref{['equ:cyclic-property']} in three different ways. Inconsistent: Applying an auxiliary truncated Chebyshev series to $g_{\sigma}^2$, as proposed in lin-2017-randomized-estimation. Consistent: Squaring the Chebyshev approximation $g_{\sigma}^{(m)}$ using \ref{['alg:chebyshev-squaring']}. Non-negative consistent: Squaring the non-negative Chebyshev approximation $\underline{g}_{\sigma}^{(m)}$ with \ref{['alg:chebyshev-squaring']}. As input matrix we use the $100 \times 100$ matrix $\boldsymbol{A}$ coming from a two-dimensional finite difference discretization of the Hamiltonian defined in \ref{['equ:electronic-hamiltonian']} on a $10 \times 10$ grid. We fix $n_{\boldsymbol{\Omega}} = 80$ to make sure the low-rank approximation is accurate, set $\sigma = 0.005$, and compute the $L^1$ error from the approximation of the spectral density at $n_t = 100$ uniformly spaced values of $t$ in the interval $[-1, 1]$.
  • Figure 3.2: The difference the non-zero check can make when approximating a spectral density using \ref{['alg:nystrom-chebyshev-pp']}. As input matrix we use the $1000 \times 1000$ matrix $\boldsymbol{A}$ coming from a three-dimensional finite difference discretization of the Hamiltonian defined in \ref{['equ:electronic-hamiltonian']} on a $10 \times 10$ grid. We ran the Chebyshev-Nyström++ method with and without estimating if the matrix function is zero before taking its pseudo-inverse. We let $m=2000$, $n_{\boldsymbol{\Omega}}=80$, $n_{\boldsymbol{\Psi}}=0$, and $\sigma = 0.004$. We compute the $L^1$ error from the approximation of the spectral density at $n_t = 100$ uniformly spaced values of $t$ in the interval $[-1, 1]$.
  • Figure 3.3: Singular values $\sigma_i$ of $g_{\sigma}(t\boldsymbol{I}_n - \boldsymbol{A})$ corresponding to eigenvalues $\lambda_{(i)}$ of $\boldsymbol{A}$ which lie further away from $t$ than a certain distance $d$ are negligibly small.
  • Figure 4.1: Cross-sections of the periodic Gaussian wells potential $V$ for different $n_c$.
  • Figure 4.2: Error vs. number of random vectors $n_{\boldsymbol{\Psi}} + n_{\boldsymbol{\Omega}}$ (with fixed $\sigma=0.005$ and $m=2000$) when applying Chebyshev-Nyström++ to the Hamiltonian matrix from \ref{['subsec:hamiltonian']}.
  • ...and 4 more figures

Theorems & Definitions (11)

  • theorem 1
  • theorem 2
  • lemma 1
  • theorem 3
  • lemma 2
  • lemma 3
  • theorem 4
  • lemma 4
  • lemma 5
  • lemma 6
  • ...and 1 more