Exact recovery in the double sparse model: sufficient and necessary signal conditions

Shixiang Liu, Zhifan Li, Yanhang Zhang, Jianxin Yin

TL;DR

This work analyzes a double sparse linear model with simultaneous group and element sparsity, establishing precise minimum-signal conditions that are both sufficient and necessary for exact support recovery. It introduces a two-stage Double Sparse Iterative Hard Thresholding (DSIHT) algorithm, proving exact recovery and oracle-normality under those conditions and showing minimax-rate optimality. Theoretical results are complemented by extensive numerical experiments, including simulations and real-data analysis, demonstrating practical performance advantages over convex methods. Overall, the paper fills a key gap in minimax theory for double sparse models and provides a computationally efficient path to nearly oracle-like inference.

Abstract

The double sparse linear model, in which the regression coefficients are sparse both group-wise and element-wise, has attracted considerable attention recently. This paper establishes that the optimal minimum signal conditions are both sufficient and necessary for exact support recovery in the double sparse model. Specifically, under the proposed sharp signal conditions, a two-stage double sparse iterative hard thresholding procedure achieves exact support recovery with a suitably chosen threshold parameter. Moreover, this procedure is asymptotically normal, matching the OLS estimator fitted on the true support, and hence enjoys the oracle properties. Conversely, we prove that no method can achieve exact support recovery if these signal conditions are violated. This fills a critical gap in the minimax optimality theory on support recovery of the double sparse model. Finally, numerical experiments are provided to support our theoretical findings.
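
To make the idea of thresholding at both the element and the group level concrete, here is a minimal Python sketch of one iterative hard thresholding step. It is an illustration under stated assumptions, not the paper's algorithm: the function names `double_sparse_threshold` and `iht_step`, the specific rule (zero entries below `lam` in magnitude, then zero groups whose remaining squared norm falls below `s0 * lam**2`), the equal group size `d`, and the unit step size are all hypothetical choices made for this example.

```python
import numpy as np

def double_sparse_threshold(beta, m, d, lam, s0):
    """Two-level hard thresholding on a vector made of m groups of size d.

    Assumed rule (illustrative, not taken from the paper):
      1. element level: zero every entry with |beta_j| < lam;
      2. group level: zero every group whose surviving squared l2 norm
         is below s0 * lam**2.
    """
    B = np.asarray(beta, dtype=float).reshape(m, d).copy()  # row g = group g
    B[np.abs(B) < lam] = 0.0                      # element-wise step
    weak = (B ** 2).sum(axis=1) < s0 * lam ** 2   # groups with too little energy
    B[weak, :] = 0.0                              # group-wise step
    return B.reshape(-1)

def iht_step(X, y, beta, m, d, lam, s0, step=1.0):
    """One gradient step on the least-squares loss, then thresholding."""
    n = X.shape[0]
    grad = X.T @ (X @ beta - y) / n               # gradient of ||y - X beta||^2 / (2n)
    return double_sparse_threshold(beta - step * grad, m, d, lam, s0)
```

The group step looks only at what survives the element step, so a group containing a single moderate entry can still be discarded as a whole, while a group with several surviving entries is kept.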

Paper Structure

This paper contains 51 sections, 17 theorems, 150 equations, 7 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Assume that the design matrix $X$ satisfies DSRIP$\left((1+2A)s,\frac{1+4A}{1+2A}s_0, \delta \right)$ (see Definition 1) with $\delta \in (0,1)$. Assume that $ss_0 \Delta(s,s_0)= O(n)$ and $\kappa \in (\delta,1)$. Then, by taking $\lambda_{(\infty)} = C_\lambda \sigma \cdot \sqrt{\Delta(s,s_0)/n}$

Figures (7)

  • Figure 1: To fully comprehend the subspaces ${\Theta}_{e,1}, {\Theta}_{e,2}$, we take two examples $\beta_1,\beta_2$ from them respectively and use the black solid regions to represent their support sets (take $m=8$, $d=6$ and $s=s_0=3$). We reshape the group structure as $6\times 8$ matrices with each column representing a group. In ${\Theta}_{e,1}$ (subfigure (a)), the support groups are the first three groups. In ${\Theta}_{e,2}$ (subfigure (b)), for each support group, only the first three entries are support entries.
  • Figure 2: Performance metrics with increasing signal strength. The x-axis represents the minimum signal strength. Each point is averaged over 300 Monte Carlo simulations. The Oracle method (fitted on the true support $S^*$) always has MCC equal to 1 and Hamming loss equal to 0, so it is omitted. The Debiased-SGLasso produces a desparsified estimator with relatively large $\ell_2$ errors, so its extreme values are omitted from the L2 Error Rate plot. Additionally, its element-wise Hamming loss is always $2000-25=1975$, so it is omitted from the Hamming Element plot.
  • Figure 3: The histograms of $\sqrt n \left(\hat{\beta}_{(1,1)} - \beta_{(1,1)}^* \right)$ with 300 Monte Carlo simulations conducted for each sample size. Blue vertical lines represent the sample means, and red vertical lines represent the population means.
  • Figure 4: The histograms of $\sqrt n \sum_{i=1}^5\left(\hat{\beta}_{(i,3)} - \beta_{(i,3)}^* \right)$ with 300 Monte Carlo simulations conducted for each sample size. Blue vertical lines represent the sample means, and red vertical lines represent the population means.
  • Figure 5: Refinement effect of the second-stage DSIHT with increasing signal strength. Each point is averaged from 300 Monte Carlo simulations. In any method that incorporates the second-stage DSIHT iteration, this specific part relies on the grid points in \ref{eq: grids} and employs 5-fold cross-validation for data-driven estimation.
  • ...and 2 more figures
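
The support-recovery metrics named in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes that MCC denotes the Matthews correlation coefficient, that Hamming loss counts positions where the estimated and true supports disagree, and that a group counts as selected when any of its entries is nonzero. The function names and the equal-group-size layout are hypothetical.

```python
import numpy as np

def hamming_loss(est_support, true_support):
    """Number of positions where two boolean support masks disagree."""
    est = np.asarray(est_support, dtype=bool).ravel()
    tru = np.asarray(true_support, dtype=bool).ravel()
    return int(np.sum(est != tru))

def mcc(est_support, true_support):
    """Matthews correlation coefficient between two boolean support masks."""
    est = np.asarray(est_support, dtype=bool).ravel()
    tru = np.asarray(true_support, dtype=bool).ravel()
    tp = float(np.sum(est & tru))
    tn = float(np.sum(~est & ~tru))
    fp = float(np.sum(est & ~tru))
    fn = float(np.sum(~est & tru))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def group_support(beta, m, d):
    """Group-level mask: a group is selected if any of its d entries is nonzero."""
    return (np.asarray(beta).reshape(m, d) != 0).any(axis=1)
```

Under these definitions, an estimator that selects every one of 2000 coefficients when only 25 are truly nonzero has element-wise Hamming loss $2000-25=1975$, which is consistent with the value quoted in the caption for the dense Debiased-SGLasso estimate. Reading 2000 as the total number of coefficients and 25 as the true support size is an inference from that caption.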

Theorems & Definitions (37)

  • Definition 1: DSRIP condition
  • Theorem 1: Estimation upper bound
  • Proposition 2.1: Convergence to $\tilde{\beta}^*$
  • Theorem 2: Exact support recovery
  • Remark 1: Initial estimator of Algorithm \ref{scaledIHT}
  • Remark 2: A delicate proof technique for support recovery
  • Remark 3: Interpretation of the fixed threshold $\mu$
  • Remark 4: Review the DSRIP constant
  • Theorem 3: Asymptotic Normality
  • Remark 5: The rate of $B_{S^*}$
  • ...and 27 more