Thresholded Lasso for high dimensional variable selection

Shuheng Zhou

TL;DR

This work develops and analyzes the Thresholded Lasso for high-dimensional linear regression when $n \ll p$. The method combines an initial Lasso fit with a data-driven thresholding step and an OLS refit on the selected indices, achieving sparse oracle-type $\ell_2$ loss under Restricted Eigenvalue and related sparse-eigenvalue conditions, without requiring a strong $\beta_{\min}$ assumption. It also extends the Gauss-Dantzig approach under a Uniform Uncertainty Principle, and provides detailed proof sketches together with numerical validation showing near-optimal support recovery and favorable error rates. The results offer a practical framework for simultaneous variable selection and estimation in ultrahigh dimensions, with explicit error bounds that adapt to the underlying sparsity pattern. Overall, the Thresholded Lasso is a theoretically justified, implementable approach that closely matches oracle performance while keeping the selected model compact, even when many weak signals are present.
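The three steps named above (initial Lasso fit, thresholding, OLS refit) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the Lasso is solved with a plain proximal-gradient (ISTA) loop for the objective $\frac{1}{2n}\lVert Y - X\beta\rVert_2^2 + \lambda_n \lVert\beta\rVert_1$, the function names `lasso_ista` and `thresholded_lasso` are hypothetical, and the penalty and threshold in the demo (both set to $2\lambda\sigma$) are illustrative choices rather than the constants analyzed in the paper.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=2000):
    """Approximately minimize (1/(2n))*||y - X b||_2^2 + lam*||b||_1 via ISTA."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

def thresholded_lasso(X, y, lam, t0):
    """Step 1: Lasso fit. Step 2: keep |coef| >= t0. Step 3: OLS refit on the kept set."""
    beta_init = lasso_ista(X, y, lam)
    I = np.flatnonzero(np.abs(beta_init) >= t0)
    beta_hat = np.zeros(X.shape[1])
    if I.size > 0:
        coef, *_ = np.linalg.lstsq(X[:, I], y, rcond=None)
        beta_hat[I] = coef
    return beta_hat, I

# Synthetic demo (all values below are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n, p, s, sigma = 100, 200, 5, 0.5
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)   # columns normalized to l2-norm sqrt(n)
beta = np.zeros(p)
beta[:s] = 3.0                                # a few strong signals
y = X @ beta + sigma * rng.standard_normal(n)

lam_sigma = sigma * np.sqrt(2 * np.log(p) / n)
beta_hat, I = thresholded_lasso(X, y, lam=2 * lam_sigma, t0=2 * lam_sigma)
print(sorted(I.tolist()), np.linalg.norm(beta_hat - beta))
```

The refit removes the shrinkage bias that the $\ell_1$ penalty puts on the retained coordinates, which is why the final estimate is obtained by OLS rather than by keeping the thresholded Lasso coefficients.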

Abstract

Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that a multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector $\beta \in \mathbb{R}^p$ in a linear model $Y = X \beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$-norm $\sqrt{n}$, and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition, it is possible to achieve the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle while selecting a sufficiently sparse model, hence achieving sparse oracle inequalities; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. We also show that for the Gauss-Dantzig selector (Candès-Tao 07), if $X$ obeys a uniform uncertainty principle, one will achieve the sparse oracle inequalities as above, while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s_0 \leq s$ is the smallest integer such that for $\lambda = \sqrt{2 \log p/n}$, $\sum_{i=1}^p \min(\beta_i^2, \lambda^2 \sigma^2) \leq s_0 \lambda^2 \sigma^2$. Our simulation results on the Thresholded Lasso closely match our theoretical analysis.
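The definition of $s_0$ can be made concrete with a short computation. Dividing the defining inequality by $\lambda^2 \sigma^2$ gives $s_0 = \lceil \sum_{i=1}^p \min(\beta_i^2 / (\lambda \sigma)^2, 1) \rceil$, so each coordinate above the noise level $\lambda\sigma$ counts as one and each weaker coordinate counts fractionally. The sketch below is illustrative only; the function name `essential_sparsity` and the example vector are assumptions, not taken from the paper.

```python
import math

def essential_sparsity(beta, n, sigma):
    """Smallest integer s0 with sum_i min(beta_i^2, lam^2 sigma^2) <= s0 * lam^2 sigma^2,
    where lam = sqrt(2 log p / n). Assumes p >= 2 and sigma > 0."""
    p = len(beta)
    noise_sq = (2.0 * math.log(p) / n) * sigma ** 2      # (lam * sigma)^2
    # Work with ratios so that coordinates above the noise level contribute exactly 1.
    total = sum(min(b * b / noise_sq, 1.0) for b in beta)
    return math.ceil(total)

# Hypothetical example: 3 strong signals, 4 weak signals, the rest zero.
n, p, sigma = 100, 1000, 1.0
beta = [1.0] * 3 + [0.2] * 4 + [0.0] * (p - 7)
s = sum(1 for b in beta if b != 0)
s0 = essential_sparsity(beta, n, sigma)
print(s, s0)  # s = 7 non-zero coordinates, s0 = 5
```

Here $\lambda\sigma \approx 0.37$, so the three coordinates of magnitude 1 each contribute 1 and the four coordinates of magnitude 0.2 together contribute about 1.16, giving $s_0 = 5 \leq s = 7$.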

Paper Structure

This paper contains 41 sections, 20 theorems, 212 equations, 8 figures, and 4 tables.

Key Result

Theorem 2.1

(Ideal model selection for the Thresholded Lasso) Suppose $\beta \in \mathbb{R}^p$ is $s$-sparse. Let $s_0$ be as in \eqref{eq::define-s0}. Let $Y = X \beta + \epsilon$, where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ is a vector containing independent and identically distributed (i.i.d.) noise with ...

Figures (8)

  • Figure 1: In this model, the component $\beta^{(11)}$ has $a_0$ non-zero coordinates with the same magnitude $C_a \lambda \sigma =: \beta_{\min, A_0}$, where $C_a \in \{1.706, 8.528\}$ and $\beta_{\min, A_0} \in \{0.2, 1\}$; the component $\beta^{(12)}$ has $s_0 - a_0$ non-zero coordinates with the same magnitude $C_m \lambda \sigma$, where $C_m = 1/\sqrt{2}$ for $s > s_0$ and $C_m = 1$ in case $s_0 = s$; the component $\beta^{(2)}$ has $s - s_0$ non-zero coordinates with the same magnitude $C_t \lambda \sigma =: c_t \sigma/\sqrt{n}$. See \eqref{eq::tailcount}. The rest are all 0s. In the exact sparse case, namely, when $s = s_0$, all non-zero signals are concentrated on the component $\beta^{(1)}$ without spreading across components of $\beta^{(2)}$.
  • Figure 2: $p=2048, n=1600$. Left column: $\left\lVert h_{T_0^c}\right\rVert_1$, $\left\lVert h_{T_0}\right\rVert_1$, and $\left\lVert h\right\rVert_1$ as Lasso penalty ($f_p$) increases across different sparsity $s \in \{130, 370, 511\}$. Right column: plots of $\left\lVert h_{T_0}\right\rVert_2$ and $\left\lVert\delta\right\rVert_2$. In the top panel, we fix $\gamma = 0.3$, and compare two cases of $C_a \lambda \sigma \in \{0.2, 1 \}$. In the middle panel, we fix $C_a \lambda \sigma = 1$ and compare two cases of $\gamma \in \{0.3, 0.7\}$. In the bottom panel, we zoom in on one case with $\gamma=0.7, C_a \lambda \sigma = 0.2$, and we plot $\left\lVert\delta\right\rVert_1$ together with $\left\lVert\delta\right\rVert_2$ in the bottom right panel.
  • Figure 3: $p=2048, n=1600, \gamma=0.7$. Plots of model size ($|I|$), number of TPs and FPs, as threshold increases. Note $|I|=$ TPs + FPs. In (a) and (b), Lasso penalty factor $f_p=0.3$ is fixed, and in panel (a) $s \in \{130, 511, 710\}$, and in panel (b) $s \in \{50, 130\}$. In panels (c) and (d), we plot the same metrics across different $f_p \in \{0.1, 0.3, 0.7\}$ with fixed $s=130$. In all panels, the 3 dotted vertical lines from left to right represent $C_m \lambda \sigma / 2, C_m \lambda\sigma$ and $\lambda\sigma$. The model size remains invariant and hence the diagonal dashed lines all stay flat for $\lambda\sigma < t_0 \le 2 \lambda \sigma$ for $\beta_{\min, A_0}=1$.
  • Figure 4: $p=2048, n=1600$, $s=130$. Plots of $\left\lVert\hat{\beta}^{\mathrm{ols}}(I) - \beta\right\rVert_2$, for $C_a \lambda \sigma \in \{0.2, 1\}$ and $\gamma \in \{0.3, 0.7\}$. The horizontal lines correspond to the $\ell_2$-norm error of the Lasso estimate $\beta_{\mathrm{init}}$, namely, $\left\lVert\delta\right\rVert_2$.
  • Figure 5: Illustrative example: i.i.d. Gaussian ensemble; $p=256$, $n=72$, $s=8$, and $\sigma = \sqrt{s}/3$. (a) Compare with the Lasso estimator $\tilde{\beta}$ which minimizes $\ell_2$ loss. Here $\tilde{\beta}$ has only 3 FPs, but $\rho^2$ is large with a value of $64.73$. (b) Compare with the $\beta_{\mathrm{init}}$ obtained using $\lambda_n$. The dotted lines show the thresholding level $t_0$. The $\beta_{\mathrm{init}}$ has 15 FPs, all of which were cut after the 2nd step, resulting in $\rho^2 = 12.73$. After refitting with OLS in the 3rd step, $\rho^2$ for $\hat{\beta}$ is further reduced to $0.51$.
  • ...and 3 more figures
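
The quantities tracked in Figure 3 (model size $|I|$, true positives, and false positives as the threshold increases, with $|I| =$ TPs + FPs) can be computed as in the following sketch. The function name `selection_counts`, the selection rule $|\beta_{\mathrm{init}, j}| \geq t_0$, and the toy vectors are assumptions made for illustration; they are not the paper's simulation data.

```python
def selection_counts(beta_init, beta_true, t0):
    """Return (model size, true positives, false positives) after thresholding at t0."""
    selected = {j for j, b in enumerate(beta_init) if abs(b) >= t0}
    support = {j for j, b in enumerate(beta_true) if b != 0}
    tp = len(selected & support)   # selected coordinates that are truly non-zero
    fp = len(selected - support)   # selected coordinates that are truly zero
    return len(selected), tp, fp

# Hypothetical true vector and initial estimate.
beta_true = [1.0, 1.0, 0.2, 0.0, 0.0, 0.0]
beta_init = [0.9, 0.8, 0.15, 0.05, 0.0, -0.1]

for t0 in (0.04, 0.12, 0.5):
    size, tp, fp = selection_counts(beta_init, beta_true, t0)
    print(t0, size, tp, fp)
# 0.04 5 3 2
# 0.12 3 3 0
# 0.5 2 2 0
```

In this toy run, raising the threshold first removes the false positives and then starts cutting the weakest true signal, which is the trade-off the figure visualizes.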

Theorems & Definitions (31)

  • Theorem 2.1
  • Remark 2.2
  • Lemma 2.3
  • Theorem 2.4
  • Lemma 2.5
  • Remark 2.6
  • Lemma 2.7
  • Lemma 2.8
  • Remark 2.9
  • Remark 2.10
  • ...and 21 more