Anisotropic local law for non-separable sample covariance matrices
Zhou Fan, Renyuan Ma, Elliot Paquette, Zhichao Wang
TL;DR
This work extends local spectral laws for sample covariance matrices to non-separable data, proving an optimal averaged local law under a quadratic-concentration condition and a full anisotropic local law under a structured cumulant-tensor assumption. The authors introduce a tensor-network framework to manage high-order cumulants and fluctuation averaging, enabling entrywise, directional, and out-of-spectrum control of the resolvents. They verify the framework across non-separable models including conditionally mean-zero distributions and random features settings, and discuss limitations via negative examples. Consequences include eigenvalue rigidity and eigenvector delocalization at the optimal scale, with implications for covariance estimation and kernel-like learning architectures where nonlinearity and dependence are present.
Abstract
We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^N \mathbf{g}_i\mathbf{g}_i^*$ where the random vectors $\mathbf{g}_1, \ldots, \mathbf{g}_N \in \mathbb{R}^n$ are independent with common covariance $\Sigma$. Previous work has largely focused on the separable model $\mathbf{g} = \Sigma^{1/2}\mathbf{w}$ with $\mathbf{w}$ having independent entries, but this structure is rarely present in statistical applications involving dependent or nonlinearly transformed data. Under a concentration assumption for quadratic forms $\mathbf{g}^*A\mathbf{g}$, we prove an optimal averaged local law showing that the Stieltjes transform of $K$ converges to its deterministic limit uniformly down to the optimal scale $\eta \geq N^{-1+\varepsilon}$. Under an additional structural assumption on the cumulant tensors of $\mathbf{g}$, which interpolates between the highly structured case of independent entries and generic dependence, we establish the full anisotropic local law, providing entrywise control of the resolvent $(K-zI)^{-1}$ in arbitrary directions. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $\mathbf{g} = \sigma(X\mathbf{w})$ arising in machine learning, and Gaussian measures with nonlinear tilting. The proofs introduce a tensor network framework for analyzing fluctuation averaging in the presence of higher-order cumulant structure.
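As a purely illustrative numerical sketch (not taken from the paper), the objects named in the abstract can be formed for a toy random features model. The function names, the choice of $\tanh$ as the nonlinearity, the standard Gaussian law for the random vector, the row-normalized matrix $X$, and all dimensions are assumptions made here for concreteness. The code builds $K$ from $N$ independent samples and evaluates the empirical Stieltjes transform $n^{-1}\operatorname{Tr}(K - zI)^{-1}$ at a spectral parameter $z = E + i\eta$.

```python
import numpy as np


def random_features(N, n, d, rng, activation=np.tanh):
    """Draw N independent samples g = activation(X w) in R^n.

    Assumptions for this sketch only: X is a fixed n-by-d matrix with
    unit-norm rows, and each w is a standard Gaussian vector in R^d.
    """
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalize each row
    W = rng.standard_normal((N, d))                # rows are w_1, ..., w_N
    return activation(W @ X.T)                     # rows are g_1, ..., g_N


def sample_covariance(G):
    """K = N^{-1} sum_i g_i g_i^T, where the rows of G are the samples g_i."""
    N = G.shape[0]
    return G.T @ G / N


def stieltjes_transform(K, z):
    """Normalized resolvent trace n^{-1} Tr (K - zI)^{-1} via eigenvalues."""
    eigvals = np.linalg.eigvalsh(K)
    return np.mean(1.0 / (eigvals - z))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = random_features(N=400, n=200, d=150, rng=rng)
    K = sample_covariance(G)
    # Spectral parameter z = E + i*eta with a small imaginary part eta.
    for eta in (1.0, 0.1, 0.01):
        print(eta, stieltjes_transform(K, 0.5 + 1j * eta))
```

Decreasing `eta` toward the order of $1/N$ probes the spectrum at the local scale that the averaged local law addresses. The sketch only computes the empirical quantity and does not compute the deterministic limit it is compared against.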
