
Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements

Arya Mazumdar, Neha Sangwan

TL;DR

This paper addresses the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (GLMs), including one-bit compressed sensing and logistic regression. It analyzes the Plan–Vershynin–Yudovina linear-estimation algorithm within GLMs and establishes information-theoretic lower bounds on the required number of measurements. For $\mathsf{1bCSbinary}$, the authors prove that a sample complexity of $m = O((k+\sigma^2)\log n)$ is both achievable and necessary. They also obtain a tight sample complexity characterization for logistic regression, and closely matching maximum-likelihood upper and lower bounds for $\mathsf{SparseLinearReg}$. A key outcome is that there is no statistical-computational gap for $\mathsf{1bCSbinary}$ and logistic regression, in contrast with the gap conjectured in the binary linear regression literature. The results give a unified view of sample complexity across GLMs for sparse binary signals, with computationally efficient recovery guarantees and tight thresholds in both the noisy and noiseless regimes.
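The following is a minimal sketch of the linear-estimation idea. It computes $\frac{1}{m}\mathbf{A}^T\mathbf{y}$ and then projects the result onto the set of $k$-sparse binary vectors. For that set, the Euclidean projection is simply the indicator of the $k$ largest entries, because $\lVert\mathbf{z}-\mathbf{v}\rVert_2^2 = \lVert\mathbf{z}\rVert_2^2 - 2\langle\mathbf{z},\mathbf{v}\rangle + k$ for every such $\mathbf{v}$.

The sketch rests on several assumptions:
- The rows $\mathbf{A}_i$ are i.i.d. standard Gaussian.
- The labels are $y_i \in \{-1,+1\}$ under a logistic model with an assumed inverse temperature `beta`.
- The helper names `linear_estimate_binary` and `sample_logistic` are hypothetical.
- The problem sizes are arbitrary.

This is not claimed to reproduce the paper's algorithm line by line.

```python
import numpy as np

def linear_estimate_binary(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Linear estimate (1/m) A^T y, then projection onto k-sparse binary vectors."""
    m, n = A.shape
    z = A.T @ y / m                          # correlation of each column with y
    support = np.argpartition(z, -k)[-k:]    # indices of the k largest entries of z
    x_hat = np.zeros(n, dtype=int)
    x_hat[support] = 1                       # projection = indicator of the top-k entries
    return x_hat

def sample_logistic(A: np.ndarray, x: np.ndarray, beta: float, rng) -> np.ndarray:
    """Hypothetical sampler: y_i = +1 w.p. sigmoid(beta * A_i^T x), else -1."""
    t = A @ x
    p = 1.0 / (1.0 + np.exp(-beta * t))
    return np.where(rng.random(len(t)) < p, 1.0, -1.0)

rng = np.random.default_rng(0)
n, k, m, beta = 200, 5, 3000, 1.0            # illustrative sizes, not from the paper
x = np.zeros(n, dtype=int)
x[rng.choice(n, size=k, replace=False)] = 1  # random k-sparse binary signal
A = rng.standard_normal((m, n))              # Gaussian measurement matrix (assumed)
y = sample_logistic(A, x, beta, rng)

x_hat = linear_estimate_binary(A, y, k)
print("exact recovery:", np.array_equal(x_hat, x))
```

The estimator is one matrix-vector product plus a partial sort, so it runs in $O(mn)$ time. This is the kind of efficiency advantage over lasso that the paper emphasizes.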

Abstract

We consider the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (such as logistic regression). We analyze the linear estimation algorithm of Plan, Vershynin, and Yudovina (2017), and also show information-theoretic lower bounds on the number of required measurements. As a consequence of our results, for noisy one-bit quantized linear measurements ($\mathsf{1bCSbinary}$), we obtain a sample complexity of $O((k+\sigma^2)\log{n})$, where $\sigma^2$ is the noise variance. This is shown to be optimal by the information-theoretic lower bound. We also obtain a tight sample complexity characterization for logistic regression. Since $\mathsf{1bCSbinary}$ is a strictly harder problem than noisy linear measurements ($\mathsf{SparseLinearReg}$) because of the added quantization, the same sample complexity is achievable for $\mathsf{SparseLinearReg}$. While this sample complexity can also be obtained via the popular lasso algorithm, linear estimation is computationally more efficient. Our lower bound for $\mathsf{SparseLinearReg}$ holds for any set of measurements (a similar bound was known for Gaussian measurement matrices) and is closely matched by the maximum-likelihood upper bound. For $\mathsf{SparseLinearReg}$, Gamarnik and Zadik (2017) conjectured that there is a statistical-computational gap and that the number of measurements should be at least $(2k+\sigma^2)\log{n}$ for efficient algorithms to exist. Notably, our results imply that there is no such statistical-computational gap for $\mathsf{1bCSbinary}$ and logistic regression.
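The reduction from $\mathsf{SparseLinearReg}$ to $\mathsf{1bCSbinary}$ can be illustrated numerically. Taking signs of linear data $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{z}$ yields one-bit data, so any $\mathsf{1bCSbinary}$ recovery procedure can also be run on linear measurements.

This sketch assumes a Gaussian $\mathbf{A}$ and noise $\mathbf{z}\sim\mathcal{N}(0,\sigma^2 I)$. It uses a hypothetical helper, `top_k_support`, as a stand-in recovery procedure: correlate via $\frac{1}{m}\mathbf{A}^T\mathbf{y}$, then keep the $k$ largest entries. The sizes are arbitrary.

```python
import numpy as np

def top_k_support(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: linear estimate (1/m) A^T y, keep its k largest entries."""
    z = A.T @ y / A.shape[0]
    x_hat = np.zeros(A.shape[1], dtype=int)
    x_hat[np.argpartition(z, -k)[-k:]] = 1
    return x_hat

rng = np.random.default_rng(1)
n, k, m, sigma = 200, 5, 2000, 1.0           # illustrative sizes and noise level
x = np.zeros(n, dtype=int)
x[rng.choice(n, size=k, replace=False)] = 1  # k-sparse binary signal
A = rng.standard_normal((m, n))

# SparseLinearReg: real-valued noisy measurements y = Ax + z.
y_lin = A @ x + sigma * rng.standard_normal(m)

# 1bCSbinary: keep only the sign of each linear measurement.
# Anyone holding y_lin can compute y_1bit, so 1-bit algorithms apply to linear data too.
y_1bit = np.sign(y_lin)

x_hat_1bit = top_k_support(A, y_1bit, k)
x_hat_lin = top_k_support(A, y_lin, k)
print("recovered from 1-bit data:", np.array_equal(x_hat_1bit, x))
print("recovered from linear data:", np.array_equal(x_hat_lin, x))
```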

Paper Structure

This paper contains 21 sections, 10 theorems, 95 equations, 1 figure, and 1 algorithm.

Key Result

Theorem 1

Suppose the GLM is such that for each $i\in [m]$, $y_i$ is a subgaussian random variable with subgaussian norm $\left\lVert y_i\right\rVert_{\psi_2}$. For any $\mathbf{x}$, suppose that for some $L$, $\mathbb{E}\left[g'(\mathbf{A}_i^T\mathbf{x})\right]\geq L\cdot \left\lVert y_i\right\rVert_{\psi_2}$. … where $C$ is some constant.
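Assume the usual GLM convention $\mathbb{E}[y_i \mid \mathbf{A}_i] = g(\mathbf{A}_i^T\mathbf{x})$ with $\mathbf{A}_i \sim \mathcal{N}(0, I_n)$. Under that assumption, Stein's lemma gives $\mathbb{E}[A_{ij} y_i] = x_j\,\mathbb{E}[g'(\mathbf{A}_i^T\mathbf{x})]$. This offers one reading of the theorem's condition: $\mathbb{E}[g']$, measured relative to $\lVert y_i\rVert_{\psi_2}$, sets the gap between support and off-support correlations.

The Monte Carlo sketch below checks this identity for a logistic model with an assumed inverse temperature `beta`. It then computes the largest admissible $L$, using two facts:
- By symmetry, $y$ is marginally uniform on $\{-1,+1\}$.
- Under the definition $\inf\{s>0 : \mathbb{E}\exp(y^2/s^2)\le 2\}$, that variable has $\psi_2$ norm $1/\sqrt{\ln 2}$.

All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, N, beta = 10, 3, 200_000, 1.0   # illustrative sizes; beta is an assumed inverse temperature
x = np.zeros(n)
x[:k] = 1.0                            # k-sparse binary signal (support = first k coordinates)

A = rng.standard_normal((N, n))        # N i.i.d. rows A_i ~ N(0, I_n)
t = A @ x                              # A_i^T x, distributed as N(0, k)

def g(t):
    """Assumed logistic link: E[y | t] = 2*sigmoid(beta*t) - 1 = tanh(beta*t/2)."""
    return np.tanh(beta * t / 2)

def g_prime(t):
    """Derivative of g: (beta/2) * sech^2(beta*t/2)."""
    return (beta / 2) / np.cosh(beta * t / 2) ** 2

# Draw y in {-1, +1} with P(y = +1 | t) = sigmoid(beta*t), so that E[y | t] = g(t).
p = 1.0 / (1.0 + np.exp(-beta * t))
y = np.where(rng.random(N) < p, 1.0, -1.0)

corr = A.T @ y / N                     # empirical E[A_ij y_i] for every coordinate j
Eg_prime = g_prime(t).mean()           # Monte Carlo estimate of E[g'(A_i^T x)]
resid = np.mean(y - g(t))              # sanity check: should be close to 0

# y is marginally uniform on {-1, +1}; its psi_2 norm
# (definition: inf{s > 0 : E exp(y^2/s^2) <= 2}) equals 1/sqrt(ln 2).
psi2_norm = 1.0 / np.sqrt(np.log(2))
L = Eg_prime / psi2_norm               # largest L with E[g'] >= L * ||y||_psi2

print("support correlations:", np.round(corr[:k], 3))
print("E[g'] estimate:", round(Eg_prime, 3))
print("max |off-support correlation|:", round(np.abs(corr[k:]).max(), 3))
print("mean of y - g(t):", round(resid, 4))
print("L =", round(L, 3))
```

Because $g' \le \beta/2$ everywhere, this model always has $L < (\beta/2)\sqrt{\ln 2}$. A larger `beta` (less label noise) therefore allows a larger $L$.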

Figures (1)

  • Figure 1: The MLE upper bound for $\mathsf{SparseLinearReg}$ (m1), plotted against $k$ and shown in blue. Also shown, in orange (m2), is $\frac{2nN(l)}{\log\left(\frac{l}{2\sigma^2}+1\right)}$ for $l = k\left(1-\frac{k}{n}\right)$. Part of the plot is zoomed in to highlight how close the two curves are. In these plots, $\sigma^2 = 1$, $n = 50000$, and $k$ ranges from 1000 to 25000 ($=n/2$).

Theorems & Definitions (21)

  • Remark 1
  • Theorem 1: Sample Complexity of Algorithm 1 for GLMs
  • Corollary 2: Sample Complexity of Algorithm 1 for $\mathsf{1bCSbinary}$
  • Corollary 3: Sample Complexity of Algorithm 1 for $\mathsf{SparseLinearReg}$
  • Corollary 4: Sample Complexity of Algorithm 1 for $\mathsf{LogisticRegression}$
  • Theorem 5: Lower bound for GLMs
  • Corollary 6: $\mathsf{1bCSbinary}$ lower bound
  • Corollary 7: $\mathsf{LogisticRegression}$ lower bound
  • Corollary 8: $\mathsf{SparseLinearReg}$ lower bound
  • Theorem 9: MLE upper bound for $\mathsf{SparseLinearReg}$
  • ...and 11 more