Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices

Chanwoo Chun, SueYeon Chung, Daniel D. Lee

TL;DR

The paper tackles the challenge of inferring the spectrum of the kernel integral operator from finitely sampled measurement matrices, where both inputs and features are limited, by introducing an unbiased, dynamic-programming–based estimator for spectral moments $m(n)=\mathrm{tr}T_k^n$. It shows how to construct unbiased moment estimates from cyclic products of matrix entries, derives a scalable DP algorithm with a provable variance bound, and demonstrates robustness to noise. The authors validate the method on RBF kernels, derive analytic spectra, and successfully reconstruct eigenvalues from moments; they also illustrate the approach's utility in analyzing neural feature representations during training. This unbiased moment-based framework enables more accurate geometric and learning-dynamics insights for kernel methods and wide neural networks, with practical implications for kernel approximation and spectral analysis in large-scale models.
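The TL;DR notes that eigenvalues are reconstructed from moments (Figure 3), but this summary does not describe the reconstruction procedure. As a hedged illustration only, the classical Prony (Hankel) method can recover a spectrum with a known number $K$ of distinct nonzero eigenvalues, together with their multiplicities, from its first $2K$ moments $m(1),\dots,m(2K)$. The helper name `prony_spectrum` and the choice of Prony's method are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def prony_spectrum(moments, K):
    """Recover K distinct eigenvalues and their multiplicities from m(1), ..., m(N).

    moments[0] holds m(1), moments[1] holds m(2), and so on; requires N >= 2K.
    Hypothetical helper using the classical Prony/Hankel method.
    """
    s = np.asarray(moments, dtype=float)
    N = len(s)
    if N < 2 * K:
        raise ValueError("need at least 2K moments")
    # m(j) = sum_k c_k lam_k^j obeys a linear recurrence whose characteristic
    # polynomial z^K + a_{K-1} z^{K-1} + ... + a_0 has the lam_k as its roots.
    A = np.array([s[j:j + K] for j in range(N - K)])   # Hankel rows
    b = -s[K:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    lam = np.sort(np.roots(np.concatenate(([1.0], a[::-1]))).real)
    # Multiplicities from the Vandermonde system m(j) = sum_k c_k lam_k^j.
    V = lam[None, :] ** np.arange(1, N + 1)[:, None]
    c, *_ = np.linalg.lstsq(V, s, rcond=None)
    return lam, c

# Usage: eigenvalue 0.5 with multiplicity 3 and eigenvalue 0.2 with multiplicity 5.
moments = [3 * 0.5**n + 5 * 0.2**n for n in range(1, 7)]
lam, c = prony_spectrum(moments, K=2)
print(lam, c)  # close to [0.2, 0.5] and [5, 3]
```

Prony's method is known to be sensitive to noise in the moments, so with estimated rather than exact moments this sketch is a starting point, not a robust reconstruction.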

Abstract

Analyzing the structure of sampled features from an input data distribution is challenging when constrained by limited measurements in both the number of inputs and features. Traditional approaches often rely on the eigenvalue spectrum of the sample covariance matrix derived from finite measurement matrices; however, these spectra are sensitive to the size of the measurement matrix, leading to biased insights. In this paper, we introduce a novel algorithm that provides unbiased estimates of the spectral moments of the kernel integral operator in the limit of infinite inputs and features from finitely sampled measurement matrices. Our method, based on dynamic programming, is efficient and capable of estimating the moments of the operator spectrum. We demonstrate the accuracy of our estimator on radial basis function (RBF) kernels, highlighting its consistency with the theoretical spectra. Furthermore, we showcase the practical utility and robustness of our method in understanding the geometry of learned representations in neural networks.
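The bias described in the abstract already shows up for the second moment. The sketch below makes several assumptions that are not stated in this summary: the kernel convention $k(x,x')=\mathbb{E}_w[\phi(x,w)\phi(x',w)]$; a naive estimator taken as the sum of $n$th powers of the eigenvalues of $\Phi\Phi^\top/(PQ)$ (our reading of $\hat{m}_0$); and a test process $\phi(x,w)=x\cdot w$ with $x\sim\mathcal{N}(0,0.3\,I_{20})$ and $w\sim\mathcal{N}(0,I_{20})$, whose operator has eigenvalue 0.3 with multiplicity 20, so $m(2)=1.8$. The comparison estimator is a U-statistic that averages the cyclic product $\Phi_{i_1j_1}\Phi_{i_2j_1}\Phi_{i_2j_2}\Phi_{i_1j_2}$ over distinct rows and distinct columns. The function names are hypothetical.

```python
import numpy as np

def naive_moment(Phi, n):
    """Hypothetical naive estimator: sum of n-th powers of eigenvalues of Phi Phi^T / (P Q)."""
    P, Q = Phi.shape
    eig = np.linalg.eigvalsh(Phi @ Phi.T / (P * Q))
    return float(np.sum(eig ** n))

def unbiased_moment2(Phi):
    """U-statistic for m(2): mean of Phi[i1,j1] Phi[i2,j1] Phi[i2,j2] Phi[i1,j2]
    over all i1 != i2 and j1 != j2."""
    P, Q = Phi.shape
    A = Phi @ Phi.T                   # A[i1, i2] = sum_j Phi[i1, j] Phi[i2, j]
    Phi_sq = Phi ** 2
    C = Phi_sq @ Phi_sq.T             # the j1 == j2 part of A[i1, i2]**2
    B = A ** 2 - C                    # sum over j1 != j2, for every (i1, i2)
    total = B.sum() - np.trace(B)     # remove the i1 == i2 diagonal
    return float(total / (P * (P - 1) * Q * (Q - 1)))

# Assumed generative process: phi(x, w) = x . w, x ~ N(0, 0.3 I_20), w ~ N(0, I_20).
# Then k(x, x') = x . x', the operator has eigenvalue 0.3 (multiplicity 20), and m(2) = 20 * 0.3**2.
rng = np.random.default_rng(0)
P, Q, r = 100, 100, 20
true_m2 = r * 0.3 ** 2
naive, ours = [], []
for _ in range(50):
    X = rng.normal(scale=np.sqrt(0.3), size=(P, r))
    W = rng.normal(size=(Q, r))
    Phi = X @ W.T                     # Phi[i, j] = phi(x_i, w_j)
    naive.append(naive_moment(Phi, 2))
    ours.append(unbiased_moment2(Phi))
print(f"true {true_m2:.3f}  naive {np.mean(naive):.3f}  unbiased {np.mean(ours):.3f}")
```

Averaged over trials, the naive estimate sits well above 1.8. This happens because products that reuse a row or a column (the $i_1=i_2$ or $j_1=j_2$ terms) carry extra fourth-moment mass, while the U-statistic averages close to 1.8.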

Paper Structure

This paper contains 35 sections, 4 theorems, 109 equations, 8 figures, 3 algorithms.

Key Result

Theorem 1

Suppose $\phi\in\mathcal{L}^4(\rho_\mathcal{X}\otimes \rho_\mathcal{W})$. Then the variance of $\hat{m}(n)$ satisfies where
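Theorem 1 assumes $\phi\in\mathcal{L}^4(\rho_\mathcal{X}\otimes \rho_\mathcal{W})$. The identity that makes cyclic products unbiased can be sketched as follows. It relies on an assumed convention that this summary does not spell out: the kernel is $k(x,x')=\mathbb{E}_{w\sim\rho_\mathcal{W}}[\phi(x,w)\phi(x',w)]$ and $T_k$ acts on $L^2(\rho_\mathcal{X})$. For $n\ge 2$, with $x_1,\dots,x_n$ drawn i.i.d. from $\rho_\mathcal{X}$ and, independently, $w_1,\dots,w_n$ drawn i.i.d. from $\rho_\mathcal{W}$:

$$(T_k f)(x)=\int k(x,x')\,f(x')\,d\rho_\mathcal{X}(x')$$

$$m(n)=\mathrm{tr}\,T_k^n=\int k(x_1,x_2)\,k(x_2,x_3)\cdots k(x_n,x_1)\,d\rho_\mathcal{X}(x_1)\cdots d\rho_\mathcal{X}(x_n)=\mathbb{E}\Big[\prod_{a=1}^{n}\phi(x_a,w_a)\,\phi(x_{a+1},w_a)\Big],\qquad x_{n+1}\equiv x_1.$$

Write $\Phi_{ij}=\phi(x_i,w_j)$. Any product $\prod_{a=1}^{n}\Phi_{i_a j_a}\Phi_{i_{a+1} j_a}$ (with $i_{n+1}\equiv i_1$) taken over distinct rows $i_1,\dots,i_n$ and distinct columns $j_1,\dots,j_n$ therefore has expectation $m(n)$. Products that reuse a row or column instead pick up higher moments of $\phi$, which is one source of bias in the naive estimator.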

Figures (8)

  • Figure 1: Visual illustration of the calculation of the unbiased estimator. a. For computing $\hat{m}^*(3)$, one can select matrix entries such that the entries create a cyclic path of 6 turns without revisiting rows and columns more than twice, and average over all possible such paths. b. Our method limits the cyclic paths to only increasing indices. c. Example paths for $\hat{m}(2),\ldots,\hat{m}(6)$.
  • Figure 2: Estimated RBF moments for $d=5$, $\Sigma_x=I_{d\times d}$, $\Sigma=0.25I_{d\times d}$. Our estimator $\hat{m}$ is labeled "Ours"; the two versions of the Kong and Valiant estimator, $\hat{m}_{\text{KV-row}}$ and $\hat{m}_{\text{KV-col}}$, are labeled "KV-row" and "KV-col", respectively; the naive estimator $\hat{m}_0$ is labeled "naive"; and the analytic ground-truth moments $m$ are labeled "GT". a. $P=300$ and $Q=600$. Left: the $\hat{m}(n)$ values for the various estimators, with $n$ ranging from 2 to 7. Right: the MSE between $\hat{m}(n)$ and $m(n)$. b. Same as a., but for $\hat{m}^{1/n}(n)$. The black arrow indicates the value of the operator norm. c. $P$ is fixed at 300, and $Q$ is varied. d. $Q$ is fixed at 600, and $P$ is varied. Bars indicate a $50\%$ confidence interval.
  • Figure 3: The reconstructed eigenvalues of two generative processes. "GT" refers to the ground-truth eigenvalues. "SVD" denotes the singular values of the empirical Gram matrix. "KV" denotes the eigenvalues reconstructed from $\left\{\hat{m}_\text{KV}(n)\right\}_{n=1}^{10}$. "Ours" denotes the eigenvalues reconstructed from $\left\{\hat{m}(n)\right\}_{n=1}^{10}$. Left: a finite-rank linear generative process whose true eigenvalue is 0.3 with multiplicity 20; $P=Q=100$. Right: a random Fourier feature generative process whose true $i$th eigenvalue is $\left(\eta \varphi_\eta\right)^{-i}$ with $\eta=400$; $P=Q=20$.
  • Figure 4: The estimated spectral moments during training of single-hidden-layer ReLU neural networks. Top row: networks of different widths have dramatically different naive estimates $\hat{m}_0(n)$ of the operator moments. Bottom row: estimates from the unbiased estimator $\hat{m}(n)$ are similar across all widths. Results were obtained from networks trained from 29 random initializations. Shaded regions indicate a $50\%$ confidence interval.
  • Figure A1: Performance of the estimators with the RBF kernel. Columns from left to right: the estimated moments; the mean-square error between the estimated and ground-truth moments, averaged over multiple samples of $\Phi$ ($\left<(\hat{m}(n)-m(n))^2\right>_\Phi$); the bias error ($\left<\hat{m}(n)\right> - m(n)$); the variance error ($\left<\hat{m}(n)^2\right> - \left<\hat{m}(n)\right>^2$). a. $P=300$, $Q=600$, $d=5$, $\Sigma_x=I_{d\times d}$, $\Sigma=0.25 I_{d\times d}$. b. $P=300$, $Q=600$, $d=10$, $\Sigma_x=I_{d\times d}$, $\Sigma=I_{d\times d}$. c. $P=30$, $Q=60$, $d=10$, $\Sigma_x=I_{d\times d}$, $\Sigma=4 I_{d\times d}$. d. $P=30$, $Q=60$, $d=4$, $\Sigma_x=I_{d\times d}$, $\Sigma=0.25 I_{d\times d}$.
  • ...and 3 more figures
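Figure 1b says the estimator restricts cyclic paths to increasing indices, which is what makes a dynamic program possible. This summary does not give the recursion, so the code below is a hedged sketch under its own assumptions. A path visits rows $i_1<\dots<i_n$ and columns $j_1<\dots<j_n$ in the order row, column, row, and so on, then closes back at $i_1$. The products are summed with prefix sums in $O(nP^2Q)$ time and checked against brute-force enumeration. Every path uses $n$ distinct rows and $n$ distinct columns, so under the assumed kernel convention $k(x,x')=\mathbb{E}_w[\phi(x,w)\phi(x',w)]$ with i.i.d. rows and columns, each product has expectation $m(n)$ and the average over the $\binom{P}{n}\binom{Q}{n}$ paths is unbiased. The names `increasing_path_moment` and `brute_force_increasing` are hypothetical, and this is not claimed to be the authors' algorithm.

```python
import itertools
from math import comb

import numpy as np

def _exclusive_cumsum(M, axis):
    """Entry k along `axis` holds the sum of the entries strictly before k."""
    return np.cumsum(M, axis=axis) - M

def increasing_path_moment(Phi, n):
    """Mean cyclic product over rows i_1<...<i_n and columns j_1<...<j_n for the path
    x_{i_1} -> w_{j_1} -> x_{i_2} -> ... -> w_{j_n} -> x_{i_1}.
    Hypothetical DP sketch, O(n P^2 Q) time."""
    P, Q = Phi.shape
    if not (1 <= n <= min(P, Q)):
        raise ValueError("need 1 <= n <= min(P, Q)")
    total = 0.0
    for s in range(P):                     # s = i_1, the smallest row on the path
        # H[i, j]: summed products of partial paths now at row i for which
        # column j is still allowed as the next column.
        H = np.zeros((P, Q))
        H[s, :] = 1.0                      # start at row s; any first column allowed
        for _ in range(n - 1):
            # take column j from row i, then move to a strictly larger row i'
            G = _exclusive_cumsum(H * Phi, axis=0) * Phi
            # the following column must be strictly larger than j
            H = _exclusive_cumsum(G, axis=1)
        # the last column j_n closes the cycle back at row s
        total += np.sum(H * Phi * Phi[s][None, :])
    return float(total / (comb(P, n) * comb(Q, n)))

def brute_force_increasing(Phi, n):
    """Direct enumeration of the same increasing cyclic paths, for checking."""
    P, Q = Phi.shape
    vals = []
    for rows in itertools.combinations(range(P), n):
        for cols in itertools.combinations(range(Q), n):
            prod = 1.0
            for a in range(n):
                prod *= Phi[rows[a], cols[a]] * Phi[rows[(a + 1) % n], cols[a]]
            vals.append(prod)
    return float(np.mean(vals))

rng = np.random.default_rng(1)
Phi = rng.normal(size=(7, 8))
for n in range(1, 5):
    print(n, increasing_path_moment(Phi, n), brute_force_increasing(Phi, n))
```

The exclusive prefix sums enforce the strict ordering of rows and columns, so each DP step costs $O(PQ)$. The brute-force check is only practical for tiny matrices.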

Theorems & Definitions (5)

  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • proof