Tuning-free one-bit covariance estimation using data-driven dithering

Sjoerd Dirksen; Johannes Maly

TL;DR

This work introduces a tuning-free estimator for the covariance of any subgaussian distribution from finitely many i.i.d. samples that are quantized to one bit of information per entry. It satisfies the same non-asymptotic error estimates as the tuned estimator, up to small losses and a slightly worse probability estimate.

Abstract

We consider covariance estimation of any subgaussian distribution from finitely many i.i.d. samples that are quantized to one bit of information per entry. Recent work has shown that a reliable estimator can be constructed if uniformly distributed dithers on $[-λ,λ]$ are used in the one-bit quantizer. This estimator enjoys near-minimax optimal, non-asymptotic error estimates in the operator and Frobenius norms if $λ$ is chosen proportional to the largest variance of the distribution. However, this quantity is not known a priori, and in practice $λ$ needs to be carefully tuned to achieve good performance. In this work we resolve this problem by introducing a tuning-free variant of this estimator, which replaces $λ$ by a data-driven quantity. We prove that this estimator satisfies the same non-asymptotic error estimates, up to small (logarithmic) losses and a slightly worse probability estimate. We also show that by using refined data-driven dithers that vary per entry of each sample, one can construct an estimator satisfying the same estimation error bound as the sample covariance of the samples before quantization, again up to logarithmic losses. Our proofs rely on a new version of the Burkholder-Rosenthal inequalities for matrix martingales, which is expected to be of independent interest.
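To make the dithered one-bit setting concrete, the following is a minimal sketch of a fixed-$λ$ dithered estimator of the kind the abstract attributes to recent work. It rests on the identity $\mathbb{E}_τ[λ\,\mathrm{sign}(x+τ)] = x$ for $|x| \le λ$ when $τ$ is uniform on $[-λ,λ]$. The use of two independent dithers per sample and the final symmetrization are assumptions about that construction, since this summary does not spell it out. The function names, the test covariance `sigma`, and the value of `lam` are purely illustrative, and `lam` is set by hand here. The data-driven choice that the paper proposes is not reproduced.

```python
import numpy as np


def one_bit_dithered_covariance(X, lam, rng):
    """Sketch of a dithered one-bit covariance estimate (assumed construction).

    X   : (n, p) array of mean-zero samples.
    lam : dithering scale, chosen by hand here (the paper replaces it
          by a data-driven quantity, which this sketch does not do).
    rng : numpy random Generator used to draw the dithers.
    """
    n, p = X.shape
    # Two independent dithers per sample, uniform on [-lam, lam]^p.
    tau = rng.uniform(-lam, lam, size=(n, p))
    tau_bar = rng.uniform(-lam, lam, size=(n, p))
    # One-bit quantization: only the signs of the dithered entries are kept.
    q = np.where(X + tau >= 0, 1.0, -1.0)
    q_bar = np.where(X + tau_bar >= 0, 1.0, -1.0)
    # Rescaled cross-products of the two bit streams, then symmetrization.
    asym = (lam ** 2 / n) * (q.T @ q_bar)
    return 0.5 * (asym + asym.T)


def sample_covariance(X):
    """Sample covariance of mean-zero, unquantized samples (reference)."""
    return X.T @ X / X.shape[0]


rng = np.random.default_rng(0)
# Illustrative covariance matrix with unit variances (an assumption).
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
n = 100_000
X = rng.multivariate_normal(np.zeros(3), sigma, size=n)

est = one_bit_dithered_covariance(X, lam=3.0, rng=rng)
ref = sample_covariance(X)

# Operator-norm errors of the one-bit and the unquantized estimates.
print("one-bit error:    ", np.linalg.norm(est - sigma, 2))
print("unquantized error:", np.linalg.norm(ref - sigma, 2))
```

In this sketch, taking `lam` too small biases the estimate because entries with $|x| > λ$ are clipped, while taking it too large inflates the variance through the factor `lam ** 2`. This is the tuning trade-off that the data-driven dithering described in the abstract is meant to remove.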

Paper Structure (16 sections, 19 theorems, 152 equations, 3 figures)

This paper contains 16 sections, 19 theorems, 152 equations, 3 figures.

Introduction
Notation
Organization
Main results
Data adaptive dithering
Entry-wise dithering scales
Numerical simulation
Related work
Quantized covariance estimation
Matrix martingale inequalities
Proofs
Control of the bias
Proof of Theorem \ref{thm:FrobeniusDitheredMask}
Proof of Theorem \ref{thm:OperatorDitheredMask}
Proof of Theorem \ref{thm:matrixBRintro}
...and 1 more sections

Key Result

Theorem 1

Let $\mathbf{X}$ be a mean-zero, $K$-subgaussian vector with covariance matrix $\boldsymbol{\Sigma}$ (the formal definition of a subgaussian random vector is provided in the Notation section). Let $\mathbf{X}_1,\dots,\mathbf{X}_n \overset{\mathrm{i.i.d.}}{\sim} \mathbf{X}$. Let $\mathbf{M} \in$ [...] where $\odot$ denotes the Hadamard (i.e., entry-wise) product. In particular, if $\lambda^2 \simeq$ [...]

Figures (3)

Figure 1: Comparison of $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ and $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$ (with optimized $\lambda$). The performance of the sample covariance matrix $\hat{\boldsymbol{\Sigma}}_n = \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k\mathbf{X}_k^T$, which uses unquantized samples, is plotted for reference. It is remarkable that $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ performs almost as well as the oracle-based estimator $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$ with an optimized $\lambda$.
Figure 2: Performance of the two estimators while varying $\lambda$ in $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$. The constant $C_1$ is chosen as in Figure \ref{fig:Comparison}. Figure \ref{fig:VaryingLambda1} considers the same $\boldsymbol{\Sigma}$ as in Figure \ref{fig:Comparison} for $p=5$, whereas Figures \ref{fig:VaryingLambda2}-\ref{fig:VaryingLambda3} consider two randomly drawn covariance matrices. In all cases, the performance of $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ is close to the optimal performance of $\boldsymbol{\Sigma}^{\operatorname{dith}}_n$, even though $C_1$ is fixed to the same value. Note that, in contrast to Figure \ref{fig:Comparison}, the error is measured relative to $\| \boldsymbol{\Sigma} \|$ to allow easier comparison. The performance of the sample covariance matrix $\hat{\boldsymbol{\Sigma}}_n$ is plotted for reference.
Figure 3: Comparison of $\boldsymbol{\Sigma}^{\operatorname{adap}}_n$ and $\hat{\boldsymbol{\Sigma}}^{\operatorname{adap}}_n$. The performance of the sample covariance matrix $\hat{\boldsymbol{\Sigma}}_n = \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k\mathbf{X}_k^T$, which uses unquantized samples, is plotted for reference.

Theorems & Definitions (35)

Theorem 1: dirksen2021covariance
Remark 2
Theorem 3
Theorem 4
Remark 5
Theorem 6
Theorem 7
Remark 8
Lemma 9: dirksen2021covariance
Lemma 10
...and 25 more
