A fast and slightly robust covariance estimator

John Duchi, Saminul Haque, Rohith Kuditipudi

Abstract

Let $\mathcal{Z} = \{Z_1, \dots, Z_n\} \subset \mathbb{R}^d$ be drawn i.i.d. from a distribution $P$ with mean zero and covariance $\Sigma$. Given a dataset $\mathcal{X}$ such that $d_{\mathrm{ham}}(\mathcal{X}, \mathcal{Z}) \leq \varepsilon n$, we are interested in finding an efficient estimator $\widehat{\Sigma}$ that achieves $\mathrm{err}(\widehat{\Sigma}, \Sigma) := \|\Sigma^{-\frac{1}{2}}\widehat{\Sigma}\Sigma^{-\frac{1}{2}} - I\|_{\mathrm{op}} \leq 1/2$. We focus on the low contamination regime $\varepsilon = o(1/\sqrt{d})$. In this regime, prior work required either $\Omega(d^{3/2})$ samples or runtime that is exponential in $d$. We present an algorithm that, for subgaussian data, has near-linear sample complexity $n = \widetilde{\Omega}(d)$ and runtime $O((n+d)^{\omega + \frac{1}{2}})$, where $\omega$ is the matrix multiplication exponent. We also show that this algorithm works for heavy-tailed data with near-linear sample complexity, but in a smaller regime of $\varepsilon$. Concurrent to our work, Diakonikolas et al. [2024] give Sum-of-Squares estimators that achieve similar sample complexity but with large polynomial runtime.
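To make the error criterion concrete, the following is a minimal NumPy sketch of how $\mathrm{err}(\widehat{\Sigma}, \Sigma)$ from the abstract could be evaluated numerically. It is not the paper's estimator. The function names `relative_operator_error` and `hamming_distance` are hypothetical, and the row-by-row reading of $d_{\mathrm{ham}}$ is an assumption made for illustration (the abstract does not spell out the definition). The demo plugs in the plain empirical covariance only as a placeholder for $\widehat{\Sigma}$.

```python
import numpy as np


def relative_operator_error(sigma_hat, sigma):
    """Evaluate || sigma^{-1/2} sigma_hat sigma^{-1/2} - I ||_op.

    Assumes sigma is symmetric positive definite.
    """
    # Inverse square root of sigma from its eigendecomposition:
    # sigma = V diag(lam) V^T  =>  sigma^{-1/2} = V diag(lam^{-1/2}) V^T.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    whitened = inv_sqrt @ sigma_hat @ inv_sqrt
    # ord=2 gives the spectral (operator) norm of a matrix.
    return float(np.linalg.norm(whitened - np.eye(sigma.shape[0]), ord=2))


def hamming_distance(X, Z):
    """Count the rows at which two equally shaped datasets differ.

    Assumption: d_ham is read as the number of changed sample points.
    """
    return int(np.sum(np.any(X != Z, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 5
    sigma = np.diag(np.arange(1.0, d + 1.0))

    # Clean mean-zero samples with covariance sigma.
    Z = rng.standard_normal((n, d)) @ np.sqrt(sigma)

    # Corrupt a small number of points (illustrative choice of corruption).
    X = Z.copy()
    num_corrupted = 10
    X[:num_corrupted] = 20.0

    # Placeholder estimator: the plain empirical second-moment matrix.
    sigma_hat = X.T @ X / n

    print("fraction of points changed:", hamming_distance(X, Z) / n)
    print("err of empirical covariance:", relative_operator_error(sigma_hat, sigma))
```

Because the error is measured after whitening by $\Sigma^{-1/2}$, it is a relative error: rescaling both matrices by the same factor leaves it unchanged, and $\widehat{\Sigma} = 1.5\,\Sigma$ gives an error of exactly $1/2$.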

Paper Structure

This paper contains 16 sections, 26 theorems, 139 equations.

Key Result

Lemma A.1

Let $A, B$ be p.s.d. matrices such that $d_{\textup{psd}}(A, B) \leq c < 1$. Then,

Theorems & Definitions (29)

  • Lemma A.1: Properties of $d_{\textup{psd}}$
  • Definition A.1: Subgaussian random variable
  • Definition A.2: Sub-exponential random variable
  • Lemma A.2: Tail bounds of subgaussians [Vershynin12]
  • Lemma A.3: Concentration of empirical covariance [Vershynin12]
  • Lemma A.4: Bernstein-type inequality [Vershynin12]
  • Lemma A.5: Square of subgaussian is sub-exponential [Vershynin12]
  • Lemma A.6: Subgaussian $\ell_2$-norm boundedness
  • Definition A.3: Moment bounded random variable
  • Lemma A.7
  • ...and 19 more