Nearly Optimal Robust Covariance and Scatter Matrix Estimation Beyond Gaussians
Gleb Novikov
TL;DR
This work presents the first polynomial-time, nearly optimal robust scatter/covariance estimator for elliptical distributions beyond Gaussians under strong contamination in high dimensions. It reduces covariance estimation to robustly learning the covariance of the spatial sign of the distribution and introduces a novel spectral covariance filtering step that combines covariance filtering with degree-4 sum-of-squares relaxations. The main result is the Frobenius-norm bound $\|\Sigma^{-1/2}\hat\Sigma\Sigma^{-1/2}-Id\|_{\mathrm{F}} \le O(\varepsilon\log(1/\varepsilon))$ with $n = \tilde{O}(d^2/\varepsilon^2)$ samples, under mild conditions like $\mathrm{erk}(\Sigma) \ge C\log d$. For elliptical distributions satisfying the Hanson–Wright inequality, the covariance estimator matches the Gaussian-case guarantee; for distributions with sub-exponential tails, the same rate holds in spectral norm. The results have implications for robust PCA and robust covariance estimation, giving dimension-independent error guarantees, and the spectral filtering framework may be of independent interest in high-dimensional robust statistics.
Abstract
We study the problem of computationally efficient robust estimation of the covariance/scatter matrix of elliptical distributions -- that is, affine transformations of spherically symmetric distributions -- under the strong contamination model in the high-dimensional regime $d \gtrsim 1/\varepsilon^2$, where $d$ is the dimension and $\varepsilon$ is the fraction of adversarial corruptions. We propose an algorithm that, under a very mild assumption on the scatter matrix $\Sigma$, and given a nearly optimal number of samples $n = \tilde{O}(d^2/\varepsilon^2)$, computes in polynomial time an estimator $\hat{\Sigma}$ such that, with high probability, \[ \left\| \Sigma^{-1/2} \hat{\Sigma} \Sigma^{-1/2} - Id \right\|_{\mathrm{F}} \le O(\varepsilon \log(1/\varepsilon))\,. \] As an application of our result, we obtain the first efficiently computable, nearly optimal robust covariance estimators that extend beyond the Gaussian case. Specifically, for elliptical distributions satisfying the Hanson--Wright inequality (such as Gaussians and uniform distributions over ellipsoids), our estimator $\hat{\Sigma}$ of the covariance $\Sigma$ achieves the same error guarantee as in the Gaussian case. Moreover, for elliptical distributions with sub-exponential tails (such as the multivariate Laplace distribution), we construct an estimator $\hat{\Sigma}$ satisfying the spectral norm bound \[ \left\| \Sigma^{-1/2} \hat{\Sigma} \Sigma^{-1/2} - Id \right\| \le O(\varepsilon \log(1/\varepsilon))\,. \] Our approach is based on estimating the covariance of the spatial sign of elliptical distributions. The estimation proceeds in several stages, one of which involves a novel spectral covariance filtering algorithm. This algorithm combines covariance filtering techniques with degree-4 sum-of-squares relaxations, and we believe it may be of independent interest for future applications.
