Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps

Jonathan Kelner; Frederic Koehler; Raghu Meka; Dhruv Rohatgi

TL;DR

This work addresses the computational-statistical gap in sparse linear regression (SLR) when covariates are highly correlated because of latent structure or outliers. It introduces a data-driven adaptive rescaling procedure ("smart scaling"): when the covariance is (α,h)-rescalable, a rescaled Lasso estimator achieves near-information-theoretic performance in polynomial time. In latent-variable settings α=1 and h equals the latent dimension, while in outlier settings α scales with spectral gaps and h with the number of outliers. The paper also gives evidence, under a low-degree polynomial hypothesis, that a quadratic-in-sparsity sample complexity (on the order of k^2 log n) may be unavoidable for polynomial-time algorithms, via a reduction from near-critical negative-spike sparse PCA and a dual SDP analysis. Together, these results give a practical algorithm for structured SLR and evidence for a computational-statistical gap, with implications for related models such as Gaussian Graphical Models and planted sparse-vector problems.

Abstract

It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than computationally inefficient alternatives like Best Subset Selection. Due to a large conjectured computational-statistical tradeoff in the problem of sparse linear regression, it may be impossible to close this gap in general. In this work, we propose a natural sparse linear regression setting where strong correlations between covariates arise from unobserved latent variables. In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix. While Lasso with standard normalization of covariates fails, there exists a heterogeneous scaling of the covariates with which Lasso will suddenly obtain strong provable guarantees for estimation. Moreover, we design a simple, efficient procedure for computing such a "smart scaling." The sample complexity of the resulting "rescaled Lasso" algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal. While this dependence is not information-theoretically necessary, we give evidence that it is optimal among the class of polynomial-time algorithms, via the method of low-degree polynomials. This argument reveals a new connection between sparse linear regression and a special version of sparse PCA with a near-critical negative spike. The latter problem can be thought of as a real-valued analogue of learning a sparse parity. Using it, we also establish the first computational-statistical gap for the closely related problem of learning a Gaussian Graphical Model.
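To make the idea of a heterogeneous covariate scaling concrete, here is a minimal pure-Python sketch of Lasso run on rescaled covariates. It is not the paper's procedure: the scaling vector `scales` is taken as a given input (the "smart scaling" computation is not described in this summary), and the function names, the coordinate-descent solver, and the objective normalization (1/2m)||y - Xw||^2 + lam*||w||_1 are all illustrative assumptions.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2m)||y - Xw||^2 + lam * ||w||_1.

    X is a list of m rows (each a list of n floats); y is a list of m floats.
    """
    m, n = len(X), len(X[0])
    w = [0.0] * n
    resid = list(y)  # residual y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(X[i][j] * resid[i] for i in range(m)) / m + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(m):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def rescaled_lasso(X, y, scales, lam):
    """Run Lasso on covariates X * diag(scales), then map back.

    `scales` is a hypothetical per-covariate scaling supplied by the caller.
    If v solves the Lasso on the rescaled design, then w = diag(scales) v
    gives the same predictions on the original design, since (X D) v = X (D v).
    """
    X_scaled = [[x * s for x, s in zip(row, scales)] for row in X]
    v = lasso_cd(X_scaled, y, lam)
    return [s * vj for s, vj in zip(scales, v)]
```

Calling `rescaled_lasso(X, y, [1.0] * n, lam)` reduces to plain Lasso on the unscaled covariates. Increasing the scale of a coordinate lowers its effective L1 penalty, which is the mechanism a heterogeneous scaling relies on.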

Paper Structure (39 sections, 37 theorems, 116 equations, 2 figures)

Introduction
Upper bounds
Challenge: adapting to unknown structure.
An efficient algorithm via smart scaling
Lower bounds
Independent work.
Outline
Proof Overview
Upper bounds
Algorithm description.
Why does this work?
Lower bounds
Related work
Algorithms
Preconditioned Lasso.
...and 24 more sections

Key Result

Theorem 1.4

Let $n,m,k,h \in \mathbb{N}$ and $\alpha,\delta,\sigma,\lambda>0$. Suppose that $\Sigma \in \mathbb{R}^{n\times n}$ is $(\alpha,h)$-rescalable at sparsity $k$, and $w^\star \in \mathbb{R}^n$ is $k$-sparse. Let $(X^{(j)}, y^{(j)})_{j=1}^m \sim {\mathsf{SLR}}_{\Sigma,\sigma}(w^\star)$ be independent s

Figures (2)

Figure 1: (a) Example graphical model with $X_1,\dots,X_4$ observed and $H_1,H_2$ latent. (b) Example eigenspectrum that is well-conditioned aside from a few "outliers" (displayed in red).
Figure 2: Standardized Lasso vs RescaledLasso in a simple example with varying number of samples. For each datapoint, the covariates were drawn i.i.d. from the negatively spiked sparse PCA model with ambient dimension $n = 300$ and with $\beta = -0.99$. For covariate vector $X$, the ground truth response $Y$ is generated as $Y = \frac{1}{\sqrt{(1 + \beta) k}} \langle 1_S, X \rangle$ where $S$ is the set of coordinates of size $k = 5$ where the spike is supported. As we expect from the theory, RescaledLasso recovers the signal from fewer samples than Lasso applied with the usual standardization/normalization of covariates.
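The data model in the Figure 2 caption can be sketched as follows. This is a hedged illustration, not the authors' experiment code: it assumes the negatively spiked covariance has the form Σ = I + β v vᵀ with v = 1_S / √k, which is consistent with the caption's normalization because then Var(⟨1_S, X⟩) = (1 + β)k. The function name, the choice of S as the first k coordinates, and the seed are hypothetical.

```python
import math
import random


def sample_negative_spike(m, n=300, k=5, beta=-0.99, seed=0):
    """Draw m samples (X, Y) from an assumed negatively spiked model.

    Assumption: Sigma = I + beta * v v^T with v = 1_S / sqrt(k).
    Sampling uses X = (I + c v v^T) Z with Z ~ N(0, I) and
    c = sqrt(1 + beta) - 1, since (I + c v v^T)^2 = I + beta v v^T.
    """
    rng = random.Random(seed)
    support = list(range(k))  # assumption: S is the first k coordinates
    c = math.sqrt(1.0 + beta) - 1.0
    X, Y = [], []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        proj = sum(z[j] for j in support) / math.sqrt(k)  # <v, z>
        x = z[:]
        for j in support:
            x[j] += c * proj / math.sqrt(k)  # add c * <v, z> * v_j
        # Response from the caption: Y = <1_S, X> / sqrt((1 + beta) k)
        y = sum(x[j] for j in support) / math.sqrt((1.0 + beta) * k)
        X.append(x)
        Y.append(y)
    return X, Y, support
```

Under this assumed covariance the response has unit variance, while the direction 1_S carries variance of only (1 + β)k = 0.05 in the covariates. The signal therefore lies along a direction that is nearly degenerate in the covariates, which is the regime where the caption reports standardized Lasso needing more samples than RescaledLasso.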

Theorems & Definitions (79)

Definition 1.1
Definition 1.2
Definition 1.3
Theorem 1.4
Corollary 1.5
Corollary 1.6
Theorem 1.7
Example 2.1
Lemma 2.2
Proof sketch for the lemma labeled lemma:lp-analysis-overview
...and 69 more
