Linear Regression from 1-bit Quantized Data

Daniel Hill, Martin Slawski

Abstract

Motivated by the prevalence of environments in which data is abundant while resources for storage and/or transmission might be scarce, we study linear regression when predictors, their squares, and responses are subject to single-bit dithered quantization. An estimator relying on plug-in estimation of the quadratic and linear terms in the quadratic program formulation of the least squares problem is proposed. We provide a non-asymptotic bound on the $\ell_2$-estimation error of this estimator and obtain its asymptotic distribution when the number of predictors is fixed, which can be used for inference and an investigation of the mean-square error efficiency relative to the ordinary least squares estimator. It is shown that for the quantization protocol under consideration, substantial improvements over the proposed estimator cannot be expected. A compression pipeline in which the underlying data is first subject to sketching and subsequently quantization can be studied within our framework as well. We also present an extension to address high-dimensional predictors. Numerical experiments with synthetic data complement our theoretical findings.
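The plug-in construction described above can be illustrated with a minimal sketch. Everything below is an assumption for illustration (the uniform dither to interval endpoints, the ranges, and the helper `q` are not the paper's exact protocol): 1-bit quantized predictors supply the off-diagonal entries of the Gram matrix, separately quantized squares supply the diagonal, and quantized responses supply the linear term of the least squares quadratic program.

```python
import numpy as np

rng = np.random.default_rng(0)

def q(v, lo, hi, rng):
    """Entrywise 1-bit dithered quantizer on [lo, hi]: each entry is mapped
    to hi with probability (v - lo)/(hi - lo) and to lo otherwise, so it is
    conditionally unbiased for entries lying in [lo, hi]."""
    p = np.clip((v - lo) / (hi - lo), 0.0, 1.0)
    return np.where(rng.random(v.shape) < p, hi, lo)

# Synthetic data (illustrative; ranges chosen so values fall in the intervals)
n, d = 100_000, 3
beta = np.array([1.0, -0.5, 0.25])
X = rng.uniform(-1, 1, (n, d))
y = X @ beta + 0.1 * rng.standard_normal(n)

lo, hi = -2.0, 2.0
Qx = q(X, lo, hi, rng)            # quantized predictors
Qx2 = q(X**2, 0.0, hi**2, rng)    # quantized squares (for the diagonal)
Qy = q(y, lo, hi, rng)            # quantized responses

# Plug-in Gram matrix: off-diagonals from products of independently dithered
# predictors (unbiased since the dithers are independent across columns);
# the diagonal comes from the quantized squares.
G = (Qx.T @ Qx) / n
np.fill_diagonal(G, Qx2.mean(axis=0))
g = (Qx.T @ Qy) / n               # plug-in linear term

beta_hat = np.linalg.solve(G, g)  # solve the plug-in normal equations
```

The key point the sketch makes is that products of independently dithered 1-bit values are unbiased for the corresponding cross-moments, but not for the squares on the diagonal, which is why the squares are quantized separately.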

Paper Structure

This paper contains 23 sections, 14 theorems, 123 equations, 5 figures, and 2 tables.

Key Result

Proposition 1

Let (SG-$X$) and (SG-$\epsilon$) be satisfied. Let $K = \max_{1 \leq j \leq d} \lVert X_{1j} \rVert_{\psi_2}$, $\overline{K} = \lVert\Sigma^{-1/2} X_1 \rVert_{\psi_2}$, and $K_{\epsilon} = \lVert\epsilon_1 \rVert_{\psi_2}$. Consider the events $\mathcal{R}$ and $\mathcal{L}$. Then $\mathop{\mathrm{\mathbf{P}}}\nolimits(\mathcal{R}) \geq 1 - 2/(n \cdot d)$ and $\mathop{\mathrm{\mathbf{P}}}\nolimits(\mathcal{L}) \geq 1 - 4/n$.

Figures (5)

  • Figure 1: Illustration of the quantization method for a random variable $Z$ supported on an interval $[\ell, u]$. Conditional on $\{Z = z = \frac{1}{4} \ell + \frac{3}{4} u \}$, the quantizer outputs $u$ with probability $3/4$ and $\ell$ with probability $1/4$ so that (conditional) unbiasedness holds, i.e., $\mathop{\mathrm{\mathbf{E}}}\nolimits[Q_Z(z)| Z = z] = z$.
  • Figure 2: Schematic summary of the sketching followed by quantization pipeline described in the text. In the first step, the $X$'s and $Y$'s are sketched via a rescaled sketching matrix $\mathbf S$. In the second step, quantization is performed on the sketched data.
  • Figure 3: Top: MSEs for estimating $\beta_*$ based on quantized and unquantized ("plain") data, including an adjustment ("plain $\times$ 32") to produce a comparison at the "error per bit" level. The plots show means and $\pm 2\times$ standard error bars over 1,000 independent replications for each value of $\sigma$ (horizontal axis) and two scenarios -- (L) Gaussian $X$ and (R) uniformly distributed $X$. Bottom: MSE for estimating $\beta_*$ based on sketching + quantization under the first scenario (Gaussian $X$) and $\sigma = 1$ for varying sketch size $m$ (horizontal axis); the dashed straight line represents the least squares regression fit of $\log_{10} \text{MSE}$ on $\log m$ with the slope fixed to $-1$, while the dotted line indicates the MSE with quantization only (not to scale).
  • Figure 4: Example of low-bandwidth regression using compressed sufficient statistics.
  • Figure 5: Normal Q-Q plots of the centered and scaled regression coefficient estimates in the low-dimensional setting (left) and in the sparse, moderate dimensional setting when using $\ell_1$-penalization.
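The conditional unbiasedness of the quantizer illustrated in Figure 1 can be checked empirically; below is a minimal sketch (the function name and interval are illustrative assumptions):

```python
import random

def dither_quantize(z, lo, hi):
    """1-bit dithered quantizer on [lo, hi]: returns hi with probability
    (z - lo)/(hi - lo) and lo otherwise, so that E[Q(z) | Z = z] = z."""
    p = (z - lo) / (hi - lo)
    return hi if random.random() < p else lo

# As in Figure 1: at z = (1/4)*lo + (3/4)*hi the quantizer outputs hi with
# probability 3/4 and lo with probability 1/4, so the conditional mean is z.
random.seed(0)
lo, hi = 0.0, 1.0
z = 0.25 * lo + 0.75 * hi
draws = [dither_quantize(z, lo, hi) for _ in range(100_000)]
empirical_mean = sum(draws) / len(draws)  # close to z = 0.75
```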

Theorems & Definitions (15)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proposition 3
  • Lemma 1
  • Proof
  • Lemma A.1
  • ...and 5 more