Tuning-free one-bit covariance estimation using data-driven dithering

Sjoerd Dirksen, Johannes Maly

TL;DR

This paper introduces a tuning-free variant of a dithered one-bit covariance estimator for subgaussian distributions, which uses finitely many i.i.d. samples quantized to one bit of information per entry. The new estimator satisfies the same non-asymptotic error estimates as the tuned one, up to small (logarithmic) losses and a slightly worse probability estimate.

Abstract

We consider covariance estimation of any subgaussian distribution from finitely many i.i.d. samples that are quantized to one bit of information per entry. Recent work has shown that a reliable estimator can be constructed if uniformly distributed dithers on $[-\lambda,\lambda]$ are used in the one-bit quantizer. This estimator enjoys near-minimax optimal, non-asymptotic error estimates in the operator and Frobenius norms if $\lambda$ is chosen proportional to the largest variance of the distribution. However, this quantity is not known a priori, and in practice $\lambda$ needs to be carefully tuned to achieve good performance. In this work we resolve this problem by introducing a tuning-free variant of this estimator, which replaces $\lambda$ by a data-driven quantity. We prove that this estimator satisfies the same non-asymptotic error estimates, up to small (logarithmic) losses and a slightly worse probability estimate. We also show that by using refined data-driven dithers that vary per entry of each sample, one can construct an estimator satisfying the same estimation error bound as the sample covariance of the samples before quantization, again up to logarithmic losses. Our proofs rely on a new version of the Burkholder-Rosenthal inequalities for matrix martingales, which is expected to be of independent interest.
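The abstract does not spell out the dithered estimator, so the following is only a minimal sketch under stated assumptions. It assumes a two-dither construction: every entry is quantized twice with independent dithers that are uniform on $[-\lambda,\lambda]$, and the estimator is the symmetrized, $\lambda^2$-scaled average of the outer products of the two bit vectors. The underlying fact is that for $|x| \le \lambda$ and $\tau$ uniform on $[-\lambda,\lambda]$, $\mathbb{E}[\lambda \operatorname{sign}(x+\tau)] = x$. The function name, the fixed value of `lam`, and the test covariance are hypothetical choices for illustration; the data-driven choice of $\lambda$ proposed in the paper is not implemented here.

```python
import math
import random


def sign(v):
    # One-bit quantizer; ties (a probability-zero event here) map to +1.
    return 1.0 if v >= 0 else -1.0


def dithered_one_bit_covariance(samples, lam, rng):
    """Estimate a covariance matrix from sign bits only (illustrative sketch).

    Assumption: each sample is quantized twice, entry by entry, with
    independent dithers drawn uniformly from [-lam, lam].
    """
    n = len(samples)
    p = len(samples[0])
    acc = [[0.0] * p for _ in range(p)]
    for x in samples:
        bits1 = [sign(xi + rng.uniform(-lam, lam)) for xi in x]
        bits2 = [sign(xi + rng.uniform(-lam, lam)) for xi in x]
        for i in range(p):
            for j in range(p):
                acc[i][j] += bits1[i] * bits2[j]
    scale = lam * lam / n
    # Symmetrize the rescaled average of the outer products.
    return [[0.5 * scale * (acc[i][j] + acc[j][i]) for j in range(p)]
            for i in range(p)]


# Usage: mean-zero Gaussian samples with covariance [[1, 0.5], [0.5, 1]].
rng = random.Random(0)
samples = []
for _ in range(40000):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    samples.append([g1, 0.5 * g1 + math.sqrt(0.75) * g2])

lam = 3.0  # hand-picked here; the paper's point is to avoid this tuning
est = dithered_one_bit_covariance(samples, lam, rng)
```

In this sketch a larger `lam` reduces the bias caused by samples falling outside $[-\lambda,\lambda]$ but inflates the variance of every entry, which is the tuning trade-off the abstract describes.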

Paper Structure (16 sections, 19 theorems, 152 equations, 3 figures)

Key Result

Theorem 1

Let $\mathbf{X}$ be a mean-zero, $K$-subgaussian vector with covariance matrix $\boldsymbol{\Sigma}$ (the formal definition of a subgaussian random vector is provided in the Notation section below). Let $\mathbf{X}_1, \dots, \mathbf{X}_n \overset{\mathrm{i.i.d.}}{\sim} \mathbf{X}$. Let $\mathbf{M} \in$ [...] where $\odot$ denotes the Hadamard (i.e., entry-wise) product. In particular, if $\lambda^2 \simeq$ [...]

Figures (3)

  • Figure 1: Comparison of $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ and $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$ (with optimized $\lambda$). The performance of the sample covariance matrix $\hat{\boldsymbol{\Sigma}}_n = \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k\mathbf{X}_k^T$, which uses unquantized samples, is plotted for reference. It is remarkable that $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ performs almost as well as the oracle-based estimator $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$ with an optimized $\lambda$.
  • Figure 2: Performance of the two estimators while varying $\lambda$ in $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$. The constant $C_1$ is chosen as in Figure 1. Figure 2(a) considers the same $\boldsymbol{\Sigma}$ as in Figure 1 for $p=5$, whereas Figures 2(b)-2(c) consider two randomly drawn covariance matrices. In all cases, the performance of $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ is close to the optimal performance of $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$, even though $C_1$ is fixed to the same value. Note that in contrast to Figure 1, the error is measured relative to $\| \boldsymbol{\Sigma} \|$ to allow easier comparison. The performance of the sample covariance matrix $\hat{\boldsymbol{\Sigma}}_n$ is plotted for reference.
  • Figure 3: Comparison of $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ and $\hat{\boldsymbol{\Sigma}}^{\operatorname{adap}}_n$. The performance of the sample covariance matrix $\hat{\boldsymbol{\Sigma}}_n = \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k\mathbf{X}_k^T$, which uses unquantized samples, is plotted for reference.

Theorems & Definitions (35)

  • Theorem 1: dirksen2021covariance
  • Remark 2
  • Theorem 3
  • Theorem 4
  • Remark 5
  • Theorem 6
  • Theorem 7
  • Remark 8
  • Lemma 9: dirksen2021covariance
  • Lemma 10
  • ...and 25 more