
A Parameter-Free Two-Bit Covariance Estimator with Improved Operator Norm Error Rate

Junren Chen, Michael K. Ng

TL;DR

This work proposes a new 2-bit covariance matrix estimator that simultaneously addresses both theoretical and practical issues and eliminates the need for any tuning parameter, as the dithering scales are entirely determined by the data.

Abstract

A covariance matrix estimator using two bits per entry was recently developed by Dirksen, Maly and Rauhut [Annals of Statistics, 50(6), pp. 3538-3562]. The estimator achieves a near-minimax rate for general sub-Gaussian distributions, but it also suffers from two downsides: theoretically, there is an essential gap in operator norm error between their estimator and the sample covariance when the diagonal of the covariance matrix is dominated by only a few entries; practically, its performance heavily relies on the dithering scale, which needs to be tuned according to some unknown parameters. In this work, we propose a new 2-bit covariance matrix estimator that simultaneously addresses both issues. Unlike the sign quantizer associated with uniform dither in Dirksen et al., we adopt a triangular dither prior to a 2-bit quantizer inspired by the multi-bit uniform quantizer. By employing dithering scales that vary across entries, our estimator enjoys an improved operator norm error rate that depends on the effective rank of the underlying covariance matrix rather than the ambient dimension, thus closing the theoretical gap. Moreover, our proposed method eliminates the need for any tuning parameter, as the dithering scales are entirely determined by the data. Experimental results under Gaussian samples are provided to showcase the impressive numerical performance of our estimator. Remarkably, by halving the dithering scales, our estimator oftentimes achieves operator norm errors less than twice those of the sample covariance.
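The abstract names three ingredients: a triangular dither, a 2-bit quantizer derived from the multi-bit uniform quantizer, and entry-wise scales computed from the data. The sketch below combines them in one hypothetical way to make the mechanics concrete; it is not the estimator defined in the paper. Every detail is an assumption of ours: the quantizer is the mid-rise uniform quantizer $\lambda(\lfloor x/\lambda\rfloor + 1/2)$ clipped to four levels (our reading of Figure 2), the dither for entry $j$ is triangular on $[-\lambda_j,\lambda_j]$ (the sum of two independent $\mathscr{U}[-\lambda_j/2,\lambda_j/2]$ draws), the scale rule $\lambda_j = c\,[(\frac{1}{n}\sum_i X_{ij}^2)\log n]^{1/2}$ borrows the form of the online rule in the Figure 4 caption, and the names `q_2bit`, `data_driven_scales` and `two_bit_covariance` are hypothetical.

```python
import math
import random


def q_2bit(x: float, lam: float) -> float:
    """Assumed 2-bit quantizer: mid-rise levels lam*(floor(x/lam) + 1/2),
    clipped to the four values {-3lam/2, -lam/2, lam/2, 3lam/2}."""
    level = lam * (math.floor(x / lam) + 0.5)
    return max(-1.5 * lam, min(1.5 * lam, level))


def triangular(lam: float, rng: random.Random) -> float:
    """Triangular dither on [-lam, lam]: sum of two independent U[-lam/2, lam/2] draws."""
    return rng.uniform(-lam / 2, lam / 2) + rng.uniform(-lam / 2, lam / 2)


def data_driven_scales(samples, c: float = 1.0):
    """Hypothetical per-entry scales lam_j = c * sqrt(mean_i X_ij^2 * log n)."""
    n, d = len(samples), len(samples[0])
    return [c * math.sqrt(sum(x[j] ** 2 for x in samples) / n * math.log(n))
            for j in range(d)]


def two_bit_covariance(samples, lams, seed: int = 0):
    """Average of q q^T over the samples, minus lam_j^2 / 4 on the diagonal.

    For |x| <= lam and triangular dither tau, E[q_2bit(x + tau)] = x and
    E[q_2bit(x + tau)^2] = x^2 + lam^2 / 4, which motivates the diagonal correction.
    """
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    est = [[0.0] * d for _ in range(d)]
    for x in samples:
        # Each entry gets its own independent dither, so off-diagonal products are unbiased.
        q = [q_2bit(x[j] + triangular(lams[j], rng), lams[j]) for j in range(d)]
        for j in range(d):
            for k in range(d):
                est[j][k] += q[j] * q[k]
    for j in range(d):
        for k in range(d):
            est[j][k] /= n
        est[j][j] -= lams[j] ** 2 / 4  # remove the second moment added by the dither
    return est


def correlated_gaussian_pairs(n: int, rho: float, seed: int = 1):
    """n draws of a zero-mean 2-D Gaussian with unit variances and correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append([z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2])
    return out


if __name__ == "__main__":
    data = correlated_gaussian_pairs(20_000, 0.5)
    lams = data_driven_scales(data)
    print(lams)                            # roughly sqrt(log 20000), about 3.15, per entry
    print(two_bit_covariance(data, lams))  # roughly [[1, 0.5], [0.5, 1]]
```

The diagonal correction relies on a classical property of non-subtractive triangular dither: as long as the 2-bit quantizer does not saturate ($|x| \le \lambda$), the total error $Q(x+\tau) - x$ has mean zero and variance $\lambda^2/4$ whatever the input. A larger scale makes saturation rarer but adds more dither noise, which is the trade-off that any scale rule has to balance.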

Paper Structure (19 sections, 10 theorems, 101 equations, 4 figures)

Key Result

Theorem 1

Suppose that $\bm{X}_1,\ldots,\bm{X}_n$ are i.i.d. copies of the zero-mean $K$-sub-Gaussian random vector $\bm{X}\in \mathbb{R}^d$, and consider the 2-bit estimator $\bm{\widehat{\Sigma}}_{na}$ in (aosesti) with uniform dithers $\bm{\tau}_{i1},\bm{\tau}_{i2}\sim \mathscr{U}[-1,1]^d$. If $\lambda^2= C(K)\ldots$ … Setting $t = 10\log(nd)$ yields that, with probability at least $1-2(nd)^{-10}$, … Moreover, …
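For comparison with Theorem 1, here is a minimal sketch of a dithered sign-product estimator of the type studied by Dirksen, Maly and Rauhut, which spends two sign bits per entry. The definition (aosesti) is cut off in this summary, so the scaling is an assumption: entry $X_{ij}$ is quantized as $\mathrm{sign}(X_{ij} + \lambda\tau)$ with $\tau\sim\mathscr{U}[-1,1]$, which gives $\mathbb{E}[\lambda\,\mathrm{sign}(x+\lambda\tau)] = x$ whenever $|x|\le\lambda$. One scale $\lambda$ is shared by all entries and must be chosen by the user, which is the tuning issue the abstract describes. The function names are hypothetical.

```python
import math
import random


def sign(v: float) -> float:
    """sign with the convention sign(0) = +1."""
    return 1.0 if v >= 0 else -1.0


def dithered_sign_covariance(samples, lam: float, seed: int = 0):
    """lam^2 / (2n) * sum_i (s1_i s2_i^T + s2_i s1_i^T), where s1_i, s2_i are
    sign(X_i + lam * tau) for two independent dither vectors tau ~ U[-1, 1]^d."""
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    est = [[0.0] * d for _ in range(d)]
    for x in samples:
        s1 = [sign(v + lam * rng.uniform(-1.0, 1.0)) for v in x]
        s2 = [sign(v + lam * rng.uniform(-1.0, 1.0)) for v in x]
        for j in range(d):
            for k in range(d):
                est[j][k] += 0.5 * (s1[j] * s2[k] + s2[j] * s1[k])
    scale = lam * lam / n
    return [[scale * e for e in row] for row in est]


def correlated_gaussian_pairs(n: int, rho: float, seed: int = 1):
    """n draws of a zero-mean 2-D Gaussian with unit variances and correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append([z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2])
    return out


if __name__ == "__main__":
    data = correlated_gaussian_pairs(50_000, 0.5)
    print(dithered_sign_covariance(data, lam=4.0))  # roughly [[1, 0.5], [0.5, 1]]
```

With $\lambda = 4$ the truncation event $|X_{ij}| > \lambda$ is rare for unit-variance Gaussian entries, but every summand has magnitude $\lambda^2$, so the estimate is noisy; shrinking $\lambda$ lowers this variance and raises the truncation bias, which is why the scale has to be tuned.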

Figures (4)

  • Figure 1: The curves of "operator norm error vs. $C$", with the optimal $C$ and the corresponding minimum error reported in the labels. We simulate $\bm{\Sigma} = \bm{\Sigma}(1,0.2,1)$ in (a), $\bm{\Sigma}=\bm{\Sigma}(1,0.9,1)$ in (b), and $\bm{\Sigma}=\bm{\Sigma}(1,0.2,10)$ in (c).
  • Figure 2: Graphical illustration of the multi-bit quantizer $\mathcal{Q}_{\lambda}(\cdot)$, the 2-bit quantizer $\mathcal{Q}_{\lambda,\mathop{\mathrm{2b}}\nolimits}(\cdot)$ and the 1-bit quantizer $\frac{\lambda}{2}\mathop{\mathrm{sign}}\nolimits(\cdot)$, under resolution $\lambda=1$.
  • Figure 3: The curves of "operator norm error vs. $n$" and "operator norm error vs. $d$". We simulate $\bm{\Sigma} = \bm{\Sigma}(1,0.2,1)$ in (a) and (d), $\bm{\Sigma}=\bm{\Sigma}(1,0.2,10)$ in (b) and (e), and $\bm{\Sigma}=\bm{\Sigma}(1,0.2,25)$ in (c) and (f).
  • Figure 4: Comparing the "online" estimator $\widetilde{\bm{\Sigma}}_{on}$ and estimators $\{\widetilde{\bm{\Sigma}}(s):s=0.5,0.7,0.9\}$ under Gaussian samples with covariance matrix $\bm{\Sigma}=\bm{\Sigma}(1,0.2,25)$. For $\widetilde{\bm{\Sigma}}_{on}$ we use $n_0 =\lceil C_1(K)\log (n)\rceil$ and $\tilde{\lambda}_j =C(K)[(\frac{1}{n_0}\sum_{i=1}^{n_0}X_{ij}^2)\log (n)]^{1/2}$. We simply fix $C_1(K)=3$; note that only very few samples are collected for determining $\tilde{\lambda}_j$, e.g., $n_0 = \lceil 3\log(1600)\rceil=23$ when $n=1600$. Regarding $C(K)$, we find that $C(K)=0.7$ is empirically near-optimal under $n=500$. In the simulation, we test this near-optimal choice $C(K)=0.7$ as well as the sub-optimal choices $C(K)=0.5$ and $C(K)=1$, with the corresponding estimators denoted by $\widetilde{\bm{\Sigma}}_{on}(0.7)$, $\widetilde{\bm{\Sigma}}_{on}(0.5)$ and $\widetilde{\bm{\Sigma}}_{on}(1)$, respectively. We find that $\widetilde{\bm{\Sigma}}_{on}$ with near-optimal $C(K)$ performs comparably to $\widetilde{\bm{\Sigma}}(0.5)$.
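The data-driven scales of the online estimator in the Figure 4 caption take only a few lines to compute. This sketch follows the caption's formulas for $n_0$ and $\tilde{\lambda}_j$ with natural logarithms (which reproduce $n_0 = 23$ for $n = 1600$) and uses the caption's $C_1(K)=3$ and $C(K)=0.7$ as defaults; the function name `online_scales` and the guard that keeps $n_0$ between 1 and $n$ are our own additions.

```python
import math


def online_scales(samples, c: float = 0.7, c1: float = 3.0):
    """Return (n0, [lam_1, ..., lam_d]) following the Figure 4 caption:
    n0 = ceil(c1 * log n) and lam_j = c * sqrt(mean_{i <= n0} X_ij^2 * log n)."""
    n = len(samples)
    # Guard (our addition): use at least one and at most n warm-up samples.
    n0 = min(n, max(1, math.ceil(c1 * math.log(n))))
    head = samples[:n0]
    d = len(samples[0])
    return n0, [
        c * math.sqrt(sum(x[j] ** 2 for x in head) / n0 * math.log(n))
        for j in range(d)
    ]


if __name__ == "__main__":
    samples = [[1.0, 2.0]] * 1600  # toy data with constant entries
    n0, lams = online_scales(samples)
    print(n0)    # 23
    print(lams)  # the second scale is twice the first
```

Only the first $n_0$ samples enter the scales, so in a streaming setting the remaining samples can be quantized as they arrive.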

Theorems & Definitions (21)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • proof
  • Lemma 1
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • ...and 11 more