Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

Jingbo Liu

TL;DR

The paper tackles the stability of debiased Lasso inference when a single design column is locally updated, proposing an explicit approximate update for a debiased coefficient $\hat{\alpha}^U_j$. It establishes nonasymptotic error bounds and, under i.i.d. sub-Gaussian designs with bounded condition numbers, shows that the update is asymptotically correct for most coordinates in the proportional sparsity regime. This enables fast resampling-based variable selection methods, including a local knockoff filter and a fast CRT, reducing computational complexity from $O(p^4K)$ to $O(p^3)$ or $O(p^3+p^2K)$ while preserving asymptotic FDR control. The results are supported by theoretical proofs and empirical experiments on synthetic and real data, highlighting practical gains in high-dimensional inference with correlated designs.
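To give a rough sense of the scale of the stated complexity reduction, the following worked arithmetic plugs illustrative values into the three bounds. The values $p=10^3$ and $K=10^2$ are assumptions chosen only for this example, and constants hidden by the $O(\cdot)$ notation are ignored.

$$
p^4K = 10^{12}\cdot 10^{2} = 10^{14},\qquad
p^3 = 10^{9},\qquad
p^3+p^2K = 10^{9}+10^{6}\cdot 10^{2} = 1.1\times 10^{9}.
$$

Under these assumed values, both reduced bounds are about five orders of magnitude smaller than $p^4K$.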

Abstract

Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g., Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
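The setting in the abstract can be made concrete with a brute-force baseline. The sketch below is not the paper's approximate update formula: it simply refits the Lasso from scratch after one column is replaced and lists the coordinates whose signs differ between the two fits. The coordinate-descent solver, the penalty scaling $\frac{1}{2n}\|y-Xb\|_2^2+\lambda\|b\|_1$, the synthetic data, and all names (`lasso_cd`, `refit_after_column_update`, `coef_before`, `coef_after`) are assumptions made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b; equals y because b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b


def sign(v):
    """Return -1, 0, or 1."""
    return (v > 0) - (v < 0)


def refit_after_column_update(X, y, lam, j, new_col):
    """Replace column j of X by new_col and solve the Lasso again from scratch."""
    X_new = [row[:] for row in X]
    for i, v in enumerate(new_col):
        X_new[i][j] = v
    return lasso_cd(X_new, y, lam)


# Synthetic example (all values are illustrative assumptions).
rng = random.Random(0)
n, p, lam, j = 40, 15, 0.1, 0
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
beta_true = [1.0 if k < 3 else 0.0 for k in range(p)]
y = [sum(X[i][k] * beta_true[k] for k in range(p)) + 0.5 * rng.gauss(0, 1)
     for i in range(n)]

coef_before = lasso_cd(X, y, lam)
new_col = [rng.gauss(0, 1) for _ in range(n)]
coef_after = refit_after_column_update(X, y, lam, j, new_col)

# Coordinates whose sign differs between the two Lasso fits.
sign_changes = [l for l in range(p) if sign(coef_before[l]) != sign(coef_after[l])]
print("coordinates with a sign change:", sign_changes)
```

Repeating this full refit for every column and every resample is the kind of cost that the abstract says the approximate formula is meant to reduce.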

Paper Structure (28 sections, 26 theorems, 164 equations, 2 figures, 3 tables, 3 algorithms)

Introduction
Main Results
An approximate update formula
Asymptotic error control
Proof of the Approximate Formula
Application in False Discovery Rate Control
Review of the knockoff filter and its limitation
A "local" knockoff filter
Fast conditional randomization test
Experiments
Approximation errors in the update formula
FDR control with synthetic data
FDR control with Riboflavin data
Conclusion and Future Work
Errors in the Projection Matrices
...and 13 more sections

Key Result

Theorem 1

Let $Y\in\mathbb{R}^n$ be arbitrary, and let $\hat{\alpha}$ and $\hat{\beta}$ be as defined in eq_alpha-eq_beta. Set $\mathcal{J}:=\{l\colon \chi^{\alpha}_l\neq \chi^{\beta}_l\}$, and suppose that where $\mathcal{A}:=\{l\neq j\colon \chi^{\alpha}_l\neq 0\}$, and the max is over $\Delta\subseteq\{1,\dots,p\}\setminus\{j\}$ of size at most $n\varepsilon$, and $\Gamma,D>0$. Then Moreover, if $t(j,B

Figures (2)

Figure 1: Comparison of $\hat{\beta}^{(j)U}_j$ (cross) and its approximation error $\hat{\beta}^{(j)U}_j-\tilde{\gamma}_j$ (circle) for $\rho=0,0.5,0.95$.
Figure 2: Comparison of $\hat{\beta}^{(j)}_j$ (cross) and its approximation error (circle) for $\rho=0,0.5,0.95$.

Theorems & Definitions (62)

Theorem 1
Theorem 2
Definition 1
Remark 1
Theorem 3
Theorem 4
Definition 2
Corollary 5
Remark 2
Proof of Theorem \ref{thm_error}
...and 52 more
