Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

Jingbo Liu

TL;DR

The paper tackles the stability of debiased Lasso inference when a single design column is locally updated, proposing an explicit approximate update for a debiased coefficient $\hat{\alpha}^U_j$. It establishes nonasymptotic error bounds and, under i.i.d. sub-Gaussian designs with bounded condition numbers, shows that the update is asymptotically correct for most coordinates in the proportional growth regime. This enables fast resampling-based variable selection methods, including a local knockoff filter and a fast CRT, reducing computational complexity from $O(p^4K)$ to $O(p^3)$ or $O(p^3+p^2K)$ while preserving asymptotic FDR control. The results are supported by theoretical proofs and empirical experiments on synthetic and real data, highlighting practical gains in high-dimensional inference with correlated designs.
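For context, the baseline that a fast CRT speeds up can be sketched as a naive model-X conditional randomization test that refits the Lasso once per resample. This is a minimal sketch under assumptions that are ours, not the paper's: rows of $X$ are Gaussian with a known covariance `Sigma`, the test statistic is the absolute Lasso coefficient $|\hat{\beta}_j|$ (the paper's fast CRT may use a different, e.g. debiased, statistic), and `lasso_cd` and `crt_pvalue` are hypothetical helper names, with a plain coordinate-descent solver standing in for a production Lasso.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_k||^2 per column
    for _ in range(n_iter):
        max_delta = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # correlation of column k with the partial residual (b_k removed)
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            delta = new - b[k]
            if delta != 0.0:
                r -= X[:, k] * delta
                b[k] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def crt_pvalue(X, y, j, Sigma, lam, K, rng):
    """Naive CRT p-value for H0: y independent of X_j given X_{-j}.

    Assumption (hypothetical setup): rows of X are N(0, Sigma), Sigma known.
    """
    n, p = X.shape
    rest = np.delete(np.arange(p), j)
    # Gaussian conditional law: X_j | X_{-j} ~ N(X_{-j} w, s2)
    w = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, j])
    s2 = Sigma[j, j] - Sigma[j, rest] @ w
    cond_mean = X[:, rest] @ w

    t_obs = abs(lasso_cd(X, y, lam)[j])
    X_tilde = X.copy()
    exceed = 0
    for _ in range(K):
        # resample column j from its conditional law, keep the other columns
        X_tilde[:, j] = cond_mean + np.sqrt(s2) * rng.standard_normal(n)
        # one full Lasso refit per resample: K + 1 fits for this coordinate
        if abs(lasso_cd(X_tilde, y, lam)[j]) >= t_obs:
            exceed += 1
    return (1 + exceed) / (K + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    y = 2.0 * X[:, 0] + rng.standard_normal(n)
    print(crt_pvalue(X, y, 0, np.eye(p), lam=0.1, K=20, rng=rng))
```

Run for every coordinate, this naive scheme needs $p(K+1)$ Lasso solves, which is the repeated work that an approximate column-update formula is meant to avoid.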

Abstract

Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g. Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in the universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
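To make "debiased Lasso coefficient" concrete, here is a minimal sketch of one standard construction: the low-dimensional projection estimator of Zhang and Zhang, which corrects the Lasso estimate $\hat{\beta}_j$ with a score vector $z_j$ obtained from a nodewise Lasso, $b_j = \hat{\beta}_j + z_j^\top(Y - X\hat{\beta}) / z_j^\top X_j$. This is not the paper's generalized debiased Lasso, whose exact form is not given in this summary; the helper names (`lasso_cd`, `debiased_coef`), penalty levels, and simulated data are illustrative assumptions. Replacing column $j$ of $X$ changes both Lasso fits below, so recomputing $b_j$ exactly means solving the Lasso again.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_k||^2 per column
    for _ in range(n_iter):
        max_delta = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # correlation of column k with the partial residual (b_k removed)
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            delta = new - b[k]
            if delta != 0.0:
                r -= X[:, k] * delta
                b[k] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def debiased_coef(X, y, j, lam, lam_node):
    """LDPE-style debiased Lasso coefficient for coordinate j (illustrative)."""
    beta = lasso_cd(X, y, lam)
    X_rest = np.delete(X, j, axis=1)
    # nodewise Lasso: regress column j on the remaining columns
    gamma = lasso_cd(X_rest, X[:, j], lam_node)
    z = X[:, j] - X_rest @ gamma  # score vector for coordinate j
    # one-step correction of the shrunken Lasso estimate
    return beta[j] + z @ (y - X @ beta) / (z @ X[:, j])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = 2.0
    y = X @ beta_true + rng.standard_normal(n)
    # Lasso estimate is shrunk toward 0; the debiased one is recentred
    print(lasso_cd(X, y, 0.1)[0], debiased_coef(X, y, 0, 0.1, 0.1))
```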

Paper Structure (28 sections, 26 theorems, 164 equations, 2 figures, 3 tables, 3 algorithms)

Key Result

Theorem 1

Let $Y\in\mathbb{R}^n$ be arbitrary, and let $\hat{\alpha}$ and $\hat{\beta}$ be as defined in eq_alpha-eq_beta. Set $\mathcal{J}:=\{l\colon \chi^{\alpha}_l\neq \chi^{\beta}_l\}$, and suppose that where $\mathcal{A}:=\{l\neq j\colon \chi^{\alpha}_l\neq 0\}$, and the max is over $\Delta\subseteq\{1,\dots,p\}\setminus\{j\}$ of size at most $n\varepsilon$, and $\Gamma,D>0$. Then Moreover, if $t(j,B
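The set $\mathcal{J}$ in Theorem 1 collects coordinates whose $\chi$ entries differ between the two fits, and the abstract notes that the proof controls the number of such sign changes. The sketch below computes $\mathcal{J}$ and $\mathcal{A}$ empirically, under the assumption (ours) that $\chi^{\alpha}$ and $\chi^{\beta}$ can be read as the sign vectors of the Lasso solutions before and after replacing column $j$. The paper's $\hat{\alpha}$ and $\hat{\beta}$ are defined in equations not reproduced here, so `sign_change_sets` and the solver are hypothetical stand-ins.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_k||^2 per column
    for _ in range(n_iter):
        max_delta = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # correlation of column k with the partial residual (b_k removed)
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            delta = new - b[k]
            if delta != 0.0:
                r -= X[:, k] * delta
                b[k] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def sign_change_sets(X, y, j, new_col, lam):
    """Return (J, A) for Lasso fits before/after replacing column j.

    Assumption: chi is taken to be the sign vector of each Lasso solution.
    """
    alpha = lasso_cd(X, y, lam)           # fit on the original design
    X_upd = X.copy()
    X_upd[:, j] = new_col                 # update a single column
    beta = lasso_cd(X_upd, y, lam)        # exact refit on the updated design
    chi_a, chi_b = np.sign(alpha), np.sign(beta)
    J = np.flatnonzero(chi_a != chi_b)    # coordinates whose sign changed
    A = np.flatnonzero(chi_a != 0)
    A = A[A != j]                         # support of alpha, excluding j
    return J, A


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, p, j = 150, 60, 0
    X = rng.standard_normal((n, p))
    y = X[:, :5] @ np.full(5, 1.5) + rng.standard_normal(n)
    J, A = sign_change_sets(X, y, j, rng.standard_normal(n), lam=0.1)
    print("sign changes:", J.tolist(), "|A| =", A.size)
```

Counting $|\mathcal{J}|$ over many random replacements of column $j$ gives a rough empirical feel for how few signs typically flip after a single-column update.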

Figures (2)

  • Figure 1: Comparison of $\hat{\beta}^{(j)U}_j$ (cross) and its approximation error $\hat{\beta}^{(j)U}_j-\tilde{\gamma}_j$ (circle) for $\rho=0,0.5,0.95$.
  • Figure 2: Comparison of $\hat{\beta}^{(j)}_j$ (cross) and its approximation error (circle) for $\rho=0,0.5,0.95$.

Theorems & Definitions (62)

  • Theorem 1
  • Theorem 2
  • Definition 1
  • Remark 1
  • Theorem 3
  • Theorem 4
  • Definition 2
  • Corollary 5
  • Remark 2
  • Proof of Theorem \ref{thm_error}
  • ...and 52 more