A Unified Framework for Pattern Recovery in Penalized and Thresholded Estimation and its Geometry

Piotr Graczyk, Ulrike Schneider, Tomasz Skalski, Patrick Tardivel

TL;DR

This work develops a geometry-driven framework for pattern recovery in penalized estimation where penalties are polyhedral gauges, unifying several popular methods such as LASSO, generalized LASSO, and SLOPE. Patterns are defined as equivalence classes of coefficients sharing the same subdifferential, which correspond to relative interiors of normal cones to faces of the dual polytope $B^*$; the authors introduce accessibility and noiseless recovery conditions and show how the latter generalizes the irrepresentability condition across gauges. They further show that thresholded penalized estimators can achieve sure pattern recovery under weaker accessibility assumptions when the noise is small, and they provide a necessary and sufficient condition for uniform uniqueness of solutions. Numerical illustrations corroborate the theory, and the framework offers a foundation for pattern-based model selection and path computation across a broad class of penalties.

Abstract

We consider the framework of penalized estimation where the penalty term is given by a real-valued polyhedral gauge, which encompasses methods such as LASSO, generalized LASSO, SLOPE, OSCAR, PACS and others. Each of these estimators is defined through an optimization problem and can uncover a different structure or "pattern" of the unknown parameter vector. We define a novel and general notion of patterns based on subdifferentials and formalize an approach to measure pattern complexity. For pattern recovery, we provide a minimal condition for a particular pattern to be detected by the procedure with positive probability, the so-called accessibility condition. Using our approach, we also introduce the stronger noiseless recovery condition. For the LASSO, it is well known that the irrepresentability condition is necessary for pattern recovery with probability larger than $1/2$, and we show that the noiseless recovery condition plays exactly the same role in our general framework, thereby unifying and extending the irrepresentability condition to a broad class of penalized estimators. We also show that the noiseless recovery condition can be relaxed when turning to so-called thresholded penalized estimators: we prove that the necessary condition of accessibility is already sufficient for sure pattern recovery by thresholded penalized estimation, provided that the noise is small enough. Throughout the article, we demonstrate how our findings can be interpreted through a geometrical lens.
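
To make the notion of a "pattern" concrete, the following minimal sketch computes three patterns that are commonly associated with the $\ell_1$ norm (LASSO), the sorted-$\ell_1$ norm (SLOPE) and the sup-norm. The function names, the tolerance `tol` and the example vector are illustrative assumptions and are not taken from the paper; the encodings (sign vector, signed ranks of the distinct nonzero magnitudes, signs of the components attaining the maximum) are the usual representatives of vectors sharing the same subdifferential, and the paper's own definitions should be consulted for the exact conventions.

```python
import numpy as np

def lasso_pattern(b, tol=1e-10):
    """Sign vector of b (entries with |b_i| <= tol are treated as zero)."""
    b = np.asarray(b, dtype=float)
    return np.where(np.abs(b) > tol, np.sign(b), 0).astype(int)

def slope_pattern(b, tol=1e-10):
    """Signed rank of |b_i| among the distinct nonzero magnitudes of b.

    Assumption: magnitudes are compared exactly, so nearly equal values
    are treated as different clusters.
    """
    b = np.asarray(b, dtype=float)
    mags = np.abs(b)
    mags[mags <= tol] = 0.0
    levels = np.unique(mags[mags > 0])          # distinct magnitudes, ascending
    ranks = np.searchsorted(levels, mags) + 1   # rank 1 = smallest nonzero
    ranks[mags == 0] = 0
    return (np.sign(b) * ranks).astype(int)

def sup_norm_pattern(b, tol=1e-10):
    """Signs of the components attaining the sup-norm, zero elsewhere."""
    b = np.asarray(b, dtype=float)
    m = np.max(np.abs(b))
    if m <= tol:
        return np.zeros(b.shape, dtype=int)
    at_max = np.abs(b) >= m - tol
    return np.where(at_max, np.sign(b), 0).astype(int)

# Hypothetical coefficient vector used only for illustration.
b = [3.0, -1.5, 0.0, 1.5, 3.0]
patterns = {
    "lasso": lasso_pattern(b),
    "slope": slope_pattern(b),
    "sup_norm": sup_norm_pattern(b),
}
```

The same vector thus carries different structural information depending on the penalty: the LASSO pattern records only signs and zeros, the SLOPE pattern additionally records which components share a magnitude, and the sup-norm pattern records only the components of maximal magnitude.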

Paper Structure

This paper contains 17 sections, 27 theorems, 107 equations, and 4 figures.

Key Result

Theorem 3.2

Let ${\rm pen}$ be a real-valued polyhedral gauge on $\mathbb{R}^p$ and let $\beta \in \mathbb{R}^p$. Then $C_\beta = {\rm ri}(N_{B^*}(s))$ where $s$ is an arbitrary element of ${\rm ri}(\partial_{{\rm pen}}(\beta))$ and ${\rm lin}(C_\beta) = \vec{{\rm aff}}(\partial_{{\rm pen}}(\beta))^\perp$.
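
As a sanity check, the statement can be worked out by hand in a small case. The example below is an illustration added here, not taken from the paper: it assumes ${\rm pen} = \|\cdot\|_1$ on $\mathbb{R}^2$, so that $B^* = [-1,1]^2$, and it assumes that $C_\beta$ denotes the pattern equivalence class of $\beta$ from Definition 3.1, that is, the set of all $b$ with $\partial_{{\rm pen}}(b) = \partial_{{\rm pen}}(\beta)$.

```latex
\begin{align*}
\beta &= (2,0)', \qquad
\partial_{{\rm pen}}(\beta) = \{1\} \times [-1,1]
  \quad \text{(a face of } B^* = [-1,1]^2\text{)}, \\
s &= (1,0)' \in {\rm ri}(\partial_{{\rm pen}}(\beta)) = \{1\} \times (-1,1), \\
N_{B^*}(s) &= \{ v \in \mathbb{R}^2 : v'(u - s) \le 0 \ \ \forall u \in B^* \}
  = \{ (t,0)' : t \ge 0 \}, \\
{\rm ri}(N_{B^*}(s)) &= \{ (t,0)' : t > 0 \}
  = \{ b \in \mathbb{R}^2 : {\rm sign}(b) = (1,0)' \} = C_\beta, \\
{\rm lin}(C_\beta) &= \mathbb{R} \times \{0\}
  = \big( \{0\} \times \mathbb{R} \big)^\perp
  = \vec{{\rm aff}}(\partial_{{\rm pen}}(\beta))^\perp .
\end{align*}
```

In this case the pattern class is exactly the set of vectors with sign vector $(1,0)'$, and its linear span has dimension one, which matches the number of nonzero components of $\beta$.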

Figures (4)

  • Figure 6: Shown are the curves of the three component functions $\lambda \mapsto \hat{\beta}_{\lambda,1}$ (black dotted curve), $\lambda \mapsto \hat{\beta}_{\lambda,2}$ (red dotted curve) and $\lambda \mapsto \hat{\beta}_{\lambda,3}$ (blue dotted curve) for $\lambda > 0$, where $\{\hat{\beta}_\lambda\} = S_{X,\lambda\|.\|_\infty}(X\beta)$. Note that ${\rm patt}_\infty(\beta)$ satisfies the noiseless recovery condition. Indeed, ${\rm patt}_\infty(\hat{\beta}_\lambda) = (0,1,1)'$ for $\lambda \in (0,8/3)$.
  • Figure 7: The vector $\beta \in \{0,1\}^{784}$, reshaped as a picture of size $28 \times 28$, represents the number six.
  • Figure 8: The probability that the accessibility condition (accessibility subfigure) vs the noiseless recovery condition (noiseless recovery subfigure) is satisfied for the SLOPE pattern $\beta$, as a function of $n$, the number of rows of the matrix $X$. While the shapes are qualitatively similar, note the difference in ranges on the x-axes. The probability of accessibility is almost zero when $n \leq 500$ and almost $1$ when $n \geq 600$. For the noiseless recovery condition, the probability is almost zero when $n \leq 4000$ and almost $1$ when $n \geq 10000$.
  • Figure 9: Pattern recovery by SLOPE (SLOPE subfigure) vs thresholded SLOPE (thresholded SLOPE subfigure): Since the noiseless recovery condition does not hold for the particular matrix $X$, the SLOPE estimator $\hat{\beta}_{\lambda_{\rm sure}}$ is unlikely to recover the SLOPE pattern $\beta$. Indeed, the SLOPE subfigure shows that $\hat{\beta}_{\lambda_{\rm sure}}$, reshaped as a picture of size $28 \times 28$, does not recover the SLOPE pattern $\beta$. On the other hand, the accessibility condition does hold and therefore thresholded SLOPE can reveal the SLOPE pattern $\beta$. In fact, the thresholded SLOPE subfigure illustrates that $\text{\rm prox}_\tau(\hat{\beta}_{\lambda_{\rm sure}})$, reshaped as a picture of size $28 \times 28$, does recover the SLOPE pattern $\beta$ (here, $\tau > 0$ was chosen as the smallest real number for which $\text{\rm prox}_\tau(\hat{\beta}_{\lambda_{\rm sure}})$ and $\beta$ have the same complexity).
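
The thresholding step described in the caption of Figure 9 can be illustrated in the simpler LASSO setting. The sketch below is an assumption-laden analogue, not the paper's SLOPE experiment: it assumes that $\text{\rm prox}_\tau$ denotes the proximal operator of $\tau$ times the penalty (for the $\ell_1$ norm this is soft thresholding), it uses the number of nonzero components as a stand-in for pattern complexity, and the vectors `beta` and `beta_hat` are hypothetical. The helper `smallest_tau_for_support_size` is likewise an illustrative name.

```python
import numpy as np

def soft_threshold(b, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero by tau."""
    b = np.asarray(b, dtype=float)
    return np.sign(b) * np.maximum(np.abs(b) - tau, 0.0)

def smallest_tau_for_support_size(b, k):
    """Smallest tau >= 0 such that soft_threshold(b, tau) has at most k nonzeros.

    This is the (k+1)-th largest magnitude of b. If magnitudes are tied at
    that level, fewer than k nonzeros may remain.
    """
    mags = np.sort(np.abs(np.asarray(b, dtype=float)))[::-1]  # descending
    return 0.0 if k >= mags.size else float(mags[k])

# Hypothetical true vector and a noisy, over-selecting estimate of it.
beta = np.array([2.0, 0.0, -1.0, 0.0])
beta_hat = np.array([1.7, 0.2, -0.8, -0.1])

k = np.count_nonzero(beta)                       # target complexity (assumed known)
tau = smallest_tau_for_support_size(beta_hat, k)
thresholded = soft_threshold(beta_hat, tau)

# The raw estimate has spurious nonzeros; the thresholded one matches the sign pattern.
raw_matches = np.array_equal(np.sign(beta_hat), np.sign(beta))
thresholded_matches = np.array_equal(np.sign(thresholded), np.sign(beta))
```

Here the raw estimate fails to reproduce the sign pattern of `beta` because of two small spurious entries, while the thresholded estimate does. Choosing $\tau$ from the true complexity mirrors the caption's choice and is only possible in a simulation where $\beta$ is known.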

Theorems & Definitions (55)

  • Definition 3.1: Pattern equivalence class
  • Theorem 3.2
  • Corollary 3.3
  • Example: Different penalizations and their patterns
  • Definition 4.1: Accessible pattern
  • Proposition 4.2: Characterization of accessible patterns
  • Proposition 4.3
  • Corollary 4.4
  • Definition 4.5: Noiseless recovery condition
  • Theorem 4.6
  • ...and 45 more