Understanding Best Subset Selection: A Tale of Two C(omplex)ities

Saptarshi Roy, Ambuj Tewari, Ziwei Zhu

TL;DR

This work analyzes best subset selection in high-dimensional sparse regression by introducing an identifiability margin, $\tau_*(s)$, and two geometry-driven complexity measures that capture the spaces of residualized signals and spurious projections. The key result provides a sharp sufficient condition: if $\tau_*(s)$ scaled by noise dominates the maximum complexities (up to log factors), BSS recovers the true active set $\mathcal{S}$ with high probability; a complementary necessary condition shows that larger complexities bound the margin needed for consistency. The framework clarifies how correlation structure shapes model discrimination and explains why some correlated designs can be more favorable to BSS than orthogonal designs. The authors also extend the analysis to GLMs, offering a principled way to assess model selection under broader link functions, and provide simulations illustrating the theory. Overall, the paper reveals that geometric complexities of residualized signals and spurious projections fundamentally govern the margin conditions for exact model recovery in BSS, guiding design considerations and future method development.

Abstract

We consider the problem of best subset selection (BSS) under the high-dimensional sparse linear regression model. Recently, Guo et al. (2020) showed that the model selection performance of BSS depends on a certain identifiability margin, a measure that captures the model discriminative power of BSS under a general correlation structure. Unlike computational surrogates such as LASSO, SCAD, and MCP, BSS is robust to design dependence. Expanding on this, we broaden the theoretical understanding of best subset selection and show that two further complexities also play fundamental roles in characterizing the margin condition for model consistency of BSS: the complexity of the residualized signals (the portion of the signals orthogonal to the true active features) and the complexity of the spurious projections (the projection operators associated with the irrelevant features). In particular, we establish both necessary and sufficient margin conditions depending only on the identifiability margin and the two complexity measures. We also partially extend our sufficiency result to the case of high-dimensional sparse generalized linear models (GLMs).
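
To make the object of study concrete, here is a minimal sketch of exhaustive best subset selection for a fixed sparsity level: among all size-$s$ column subsets, pick the one whose least-squares fit has the smallest residual sum of squares. The function name `best_subset`, the simulated data, and all parameter values are illustrative assumptions, not taken from the paper, and the exhaustive search is only feasible for very small $p$.

```python
import itertools
import numpy as np

def best_subset(X, y, s):
    """Exhaustive BSS (illustrative): size-s subset with the smallest RSS."""
    best_rss, best_D = np.inf, None
    for D in itertools.combinations(range(X.shape[1]), s):
        XD = X[:, list(D)]
        # Least-squares fit of y on the candidate columns.
        coef, *_ = np.linalg.lstsq(XD, y, rcond=None)
        rss = float(np.sum((y - XD @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_D = rss, D
    return set(best_D), best_rss

# Hypothetical toy problem: true active set {0, 3}, strong signal, small noise.
rng = np.random.default_rng(0)
n, p, s = 100, 8, 2
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3]] = [2.0, -2.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

selected, rss = best_subset(X, y, s)
print(selected, rss)
```

The loop visits all $\binom{p}{s}$ candidate models, which is why the abstract refers to LASSO, SCAD, and MCP as computational surrogates.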

Paper Structure (41 sections, 12 theorems, 217 equations, 7 figures, 1 table)

Key Result

Lemma 1

For any given $\widehat{s}>0$, if there exists a $\mathcal{D} \in \mathscr{A}_{\widehat{s}}$ such that ${\boldsymbol\beta}_{\mathcal{S} \setminus \mathcal{D}}^\top \Gamma(\mathcal{D}) {\boldsymbol\beta}_{\mathcal{S} \setminus \mathcal{D}} = 0$, then there exists $\mathbf{b}\in \mathbb{R}^{\widehat{s
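
The quadratic form ${\boldsymbol\beta}_{\mathcal{S} \setminus \mathcal{D}}^\top \Gamma(\mathcal{D}) {\boldsymbol\beta}_{\mathcal{S} \setminus \mathcal{D}}$ in Lemma 1 can be explored numerically. The sketch below assumes that $\Gamma(\mathcal{D})$ is the Schur complement of the sample covariance $\widehat{\Sigma} = \mathbf{X}^\top \mathbf{X}/n$, following the identifiability-margin construction of Guo et al. (2020); this summary does not spell out the definition, so treat it as an assumption. Under that assumption the quadratic form equals $n^{-1}$ times the squared norm of the missed signal $\mathbf{X}_{\mathcal{S} \setminus \mathcal{D}} {\boldsymbol\beta}_{\mathcal{S} \setminus \mathcal{D}}$ after projecting out the columns in $\mathcal{D}$, so it vanishes exactly when that signal lies in the span of $\mathbf{X}_{\mathcal{D}}$. All function names and data are hypothetical.

```python
import numpy as np

def gamma_quadratic_form(X, beta, S, D):
    """beta_{S\\D}^T Gamma(D) beta_{S\\D}, with Gamma(D) ASSUMED to be a Schur complement."""
    n = X.shape[0]
    S_minus_D = sorted(set(S) - set(D))
    D = sorted(D)
    Sigma = X.T @ X / n  # sample covariance (no centering)
    A = Sigma[np.ix_(S_minus_D, S_minus_D)]
    B = Sigma[np.ix_(S_minus_D, D)]
    C = Sigma[np.ix_(D, D)]
    Gamma = A - B @ np.linalg.solve(C, B.T)
    b = beta[S_minus_D]
    return float(b @ Gamma @ b)

def residual_energy(X, beta, S, D):
    """(1/n) * squared norm of the missed signal after projecting out span(X_D)."""
    n = X.shape[0]
    S_minus_D = sorted(set(S) - set(D))
    XD = X[:, sorted(D)]
    signal = X[:, S_minus_D] @ beta[S_minus_D]
    coef, *_ = np.linalg.lstsq(XD, signal, rcond=None)
    return float(np.sum((signal - XD @ coef) ** 2) / n)

rng = np.random.default_rng(1)
n = 50
X = rng.standard_normal((n, 4))
beta = np.array([1.0, 1.0, 0.0, 0.0])
S, D = [0, 1], [2, 3]

# Generic design: the competing model D cannot reproduce the true signal.
q_generic = gamma_quadratic_form(X, beta, S, D)

# Degenerate design: column 2 equals X_0 + X_1, so X_S beta_S lies in span(X_D).
X_degenerate = X.copy()
X_degenerate[:, 2] = X[:, 0] + X[:, 1]
q_degenerate = gamma_quadratic_form(X_degenerate, beta, S, D)

print(q_generic, q_degenerate)
```

In the degenerate design the competing model fits the noiseless signal as well as the true model does, which is the non-identifiable situation that a zero quadratic form signals.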

Figures (7)

  • Figure 1: Distance between $\widehat{{\boldsymbol\gamma}}_{\mathcal{D}_1}$ and $\widehat{{\boldsymbol\gamma}}_{\mathcal{D}_2}$ correctly captures the angular separation.
  • Figure 2: The two principal angles between two subspaces $U$ and $V$. $\{u, u^\perp\}$ is an orthonormal basis of $U$, and $v$ is a unit vector in $V$. $\theta_2$ is the maximum principal angle between $U$ and $V$.
  • Figure 3: (a) shows the angle between spurious features in $\mathcal{G}^{(1)}_\varnothing$ for $r = 0$, (b) shows the angle between spurious features in $\mathcal{G}^{(1)}_\varnothing$ for $r = 0.9$.
  • Figure 4: Maximum principal angle between the subspaces $U:=\text{span}\{\mathbf{X}_1, \mathbf{X}_2\}$ and $V:= \text{span}\{\mathbf{X}_3, \mathbf{X}_4\}$ for $r = 0.99$ under equi-correlated Gaussian design.
  • Figure 5: Model recovery rate of ABESS under independent block design.
  • ...and 2 more figures
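
The principal angles that appear in Figures 2 and 4 can be computed with a standard recipe: orthonormalize a basis of each subspace, then take the arccosine of the singular values of the product of the two bases. The sketch below is a generic illustration of that recipe, not the authors' code; the function name `principal_angles`, the sample size, and the way the equi-correlated Gaussian design is simulated are assumptions made for the example.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B."""
    QA, _ = np.linalg.qr(A)  # orthonormal basis of span(A)
    QB, _ = np.linalg.qr(B)  # orthonormal basis of span(B)
    # Singular values of QA^T QB are the cosines of the principal angles.
    sigma = np.linalg.svd(QA.T @ QB, compute_uv=False)
    # Clip guards against round-off pushing a cosine slightly above 1.
    return np.arccos(np.clip(sigma, -1.0, 1.0))

# Hypothetical equi-correlated Gaussian design: every pair of columns has correlation r.
rng = np.random.default_rng(0)
n, r = 200, 0.99
Z = rng.standard_normal((n, 4))
shared = rng.standard_normal((n, 1))  # common factor inducing correlation r
X = np.sqrt(1.0 - r) * Z + np.sqrt(r) * shared

# U = span{X_1, X_2}, V = span{X_3, X_4}, as in the Figure 4 caption.
angles = principal_angles(X[:, :2], X[:, 2:])
max_angle_deg = float(np.degrees(angles[-1]))
print(max_angle_deg)
```

The arccosine step loses accuracy for very small angles; `scipy.linalg.subspace_angles` is an existing alternative for cases where that matters.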

Theorems & Definitions (21)

  • Example 1
  • Lemma 1
  • Proposition 1
  • Theorem 1: Sufficiency
  • Corollary 1
  • Lemma 2
  • Remark 1
  • Theorem 2: Necessity
  • proof
  • proof
  • ...and 11 more