Unveiling low-dimensional patterns induced by convex non-differentiable regularizers

Ivan Hejný, Jonas Wallin, Małgorzata Bogdan, Michał Kos

TL;DR

This work develops a rigorous asymptotic theory for the low-dimensional patterns induced by convex non-differentiable regularizers in linear regression with fixed $p$ and penalty scaled as $\sqrt{n}$. The rescaled estimation error $\hat{u}_n = \sqrt{n}(\hat{\beta}_n - \beta^0)$ converges in distribution to a limit $\hat{u}$ that minimizes $V(u)=\tfrac{1}{2}u^TCu - u^TW + f'(\beta^0;u)$, but this alone does not imply convergence of the pattern (the sparsity and clustering structure determined by the subdifferential). Pattern convergence additionally requires convergence of subdifferentials in Hausdorff distance. The authors establish weak convergence of the pattern and derive the limiting probability of recovering the true pattern, which depends on an asymptotic irrepresentability condition. They propose a robust two-step proximal method that recovers patterns irrespective of irrepresentability, and show how concavifying the Fused Lasso penalty fixes its pattern-recovery failure under independent regressors. Theoretical results are complemented by simulations comparing Lasso, Fused Lasso, and SLOPE, and by concrete discussions of Generalized Lasso and SLOPE examples. Overall, the paper provides a framework linking the proximal geometry of penalties to asymptotic pattern behavior, with practical implications for model selection and dimensionality reduction.

Abstract

Popular regularizers with non-differentiable penalties, such as Lasso, Elastic Net, Generalized Lasso, or SLOPE, reduce the dimension of the parameter space by inducing sparsity or clustering in the estimators' coordinates. In this paper, we focus on linear regression and explore the asymptotic distributions of the resulting low-dimensional patterns when the number of regressors $p$ is fixed, the number of observations $n$ goes to infinity, and the penalty function increases at the rate of $\sqrt{n}$. While the asymptotic distribution of the rescaled estimation error can be derived by relatively standard arguments, convergence of patterns requires a separate proof, which is yet missing from the literature, even for the simplest case of Lasso. To fill this gap, we use the Hausdorff distance as a suitable mode of convergence for subdifferentials, resulting in the desired pattern convergence. Furthermore, we derive the exact limiting probability of recovering the true model pattern. This probability goes to 1 if and only if the penalty scaling constant diverges to infinity and the regularizer-specific asymptotic irrepresentability condition is satisfied. We then propose simple two-step procedures that asymptotically recover the model patterns, irrespective of whether the irrepresentability condition holds or not. Interestingly, our theory shows that Fused Lasso cannot reliably recover its own clustering pattern, even for independent regressors. It also demonstrates how this problem can be resolved by "concavifying" the Fused Lasso penalty coefficients. Additionally, sampling from the asymptotic error distribution facilitates comparisons between different regularizers. We provide short simulation studies showcasing an illustrative comparison between the asymptotic properties of Lasso, Fused Lasso, and SLOPE.

Paper Structure (23 sections, 11 theorems, 100 equations, 7 figures)

Key Result

Theorem 2.1

Let $f:\mathbb{R}^p\rightarrow \mathbb{R}$ be any convex penalty function and $f_n=n^{1/2}f$. Assume $C$ is positive definite. Then $\hat{u}_n:= \sqrt{n}(\hat{\beta}_n-\beta^0)\overset{d}{\longrightarrow}\hat{u}$, where $\hat{u}:=\arg\min_{u} V(u)$, $V(u)=\tfrac{1}{2}u^TCu-u^TW+f'(\beta^0;u)$, with $W\sim\mathcal{N}(0,\sigma^2C)$ and $f'(\beta^0;u)$ the directional derivative of $f$ at $\beta^0$ in direction $u$. More generally, the result holds for any sequence of c
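To make the limit law concrete, here is a minimal sketch for one special case that is an assumption of this example rather than something the theorem requires: the Lasso penalty $f(\beta)=\lambda\Vert\beta\Vert_1$ with $C=I$. In that case $V(u)=\tfrac{1}{2}u^TCu-u^TW+f'(\beta^0;u)$ separates across coordinates, so its minimizer is $W_i-\lambda\,\mathrm{sign}(\beta^0_i)$ on the support of $\beta^0$ and a soft-thresholded $W_i$ elsewhere. The function names, $\lambda$, and the example $\beta^0$ are illustrative choices.

```python
import math
import random


def soft_threshold(x, t):
    """Proximal map of t*|.|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sample_limit(beta0, lam, sigma, rng):
    """One draw of argmin V(u) for the Lasso penalty, ASSUMING C = I."""
    u = []
    for b in beta0:
        w = rng.gauss(0.0, sigma)  # W ~ N(0, sigma^2 I) because C = I
        if b != 0:
            # f is differentiable along this coordinate: a constant shift
            u.append(w - lam * math.copysign(1.0, b))
        else:
            # directional derivative is lam*|u_i|: soft-thresholding
            u.append(soft_threshold(w, lam))
    return u


def pattern_recovered(u, beta0):
    """In the limit, the sign pattern is recovered iff u_i = 0 wherever beta0_i = 0."""
    return all(ui == 0.0 for ui, b in zip(u, beta0) if b == 0)


def limiting_recovery_probability(beta0, lam, sigma):
    """Closed form under C = I: P(|W_i| <= lam) multiplied over the zero coordinates."""
    k = sum(1 for b in beta0 if b == 0)
    return math.erf(lam / (sigma * math.sqrt(2.0))) ** k


rng = random.Random(0)
beta0 = [2.0, -1.0, 0.0, 0.0]  # hypothetical true coefficients
n_draws = 100_000
hits = sum(pattern_recovered(sample_limit(beta0, 1.0, 1.0, rng), beta0)
           for _ in range(n_draws))
mc = hits / n_draws
exact = limiting_recovery_probability(beta0, 1.0, 1.0)
print(f"Monte Carlo: {mc:.3f}, closed form: {exact:.3f}")
```

In this toy case the limiting recovery probability lies strictly between 0 and 1 for any finite $\lambda$ and approaches 1 only as $\lambda$ grows. For a general positive definite $C$ or another penalty, the minimizer of $V$ has no such coordinate-wise formula and would need a numerical solver.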

Figures (7)

  • Figure 1: Scaling regimes, when $\hat{\beta}_n$ minimizes $\Vert y-X\beta\Vert_2^2/2+n^{\gamma}f(\beta)$, for fixed $p$ and $n\rightarrow\infty$. For $\gamma<1/2$, $\hat{\beta}_n$ is asymptotically equivalent to the OLS estimator and there is no model selection. For $\gamma=1/2$, $\sqrt{n}(\hat{\beta}_n-\beta^0)$ converges in distribution, and $\lim_{n\rightarrow\infty}\mathbb{P}[I(\hat{\beta}_n)=I(\beta^0)]\in(0,1)$. For $1/2<\gamma<1$, $\sqrt{n}(\hat{\beta}_n-\beta^0)$ diverges, and $\mathbb{P}[I(\hat{\beta}_n)=I(\beta^0)]$ converges to $1$ if the irrepresentability condition holds.
  • Figure 2: Asymptotic irrepresentability condition $\langle U_{\beta^0}\rangle \cap \mathrm{ri}(\partial f_A(\beta^0))\neq \emptyset\;\;\forall \beta^0 \iff a>a_1$.
  • Figure 3: Asymptotic pattern recovery for SLOPE
  • Figure 4: Comparing root mean squared error (RMSE) for different methods together with the probability of pattern recovery (i.e., correctly identifying all zeros and all clusters).
  • Figure 5: Comparing root mean squared error (RMSE) for different methods together with the probability of pattern recovery (i.e., correctly identifying all zeros and all clusters).
  • ...and 2 more figures
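
The scaling regimes in the Figure 1 caption can be checked numerically in a simple special case. The sketch below assumes the Lasso penalty $f(\beta)=\lambda\Vert\beta\Vert_1$, Gaussian noise, and an orthogonal design with $X^TX=nI$; none of these assumptions come from the paper's figure. Under them the minimizer is a coordinate-wise soft-thresholding of $X^Ty/n\sim\mathcal{N}(\beta^0,\sigma^2 I/n)$ at level $\lambda n^{\gamma-1}$, so the probability of recovering the sign pattern has a closed form. The function names and the example $\beta^0$ are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sign_recovery_probability(beta0, lam, sigma, n, gamma):
    """P[sign(beta_hat) = sign(beta0)] for Lasso with penalty n^gamma * lam * ||b||_1,
    ASSUMING X^T X = n I and Gaussian noise (closed-form soft-thresholding)."""
    threshold = lam * n ** (gamma - 1.0)  # soft-threshold level on X^T y / n
    noise_sd = sigma / math.sqrt(n)       # std. dev. of each coordinate of X^T y / n
    prob = 1.0
    for b in beta0:
        if b == 0:
            # zero coordinate is estimated as zero iff |noise| <= threshold
            prob *= math.erf(threshold / (noise_sd * math.sqrt(2.0)))
        else:
            # nonzero coordinate keeps its sign iff it clears the threshold
            prob *= normal_cdf((abs(b) - threshold) / noise_sd)
    return prob


beta0 = [2.0, -1.0, 0.0, 0.0]  # hypothetical true coefficients
for gamma in (0.25, 0.5, 0.75):
    probs = [sign_recovery_probability(beta0, 1.0, 1.0, n, gamma)
             for n in (10**2, 10**4, 10**6)]
    print(gamma, [round(p, 3) for p in probs])
```

In this toy setting the probability tends to 0 for $\gamma<1/2$ (no model selection), stays at a constant strictly between 0 and 1 for $\gamma=1/2$, and tends to 1 for $1/2<\gamma<1$, matching the three regimes in the caption.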

Theorems & Definitions (32)

  • Definition 1.1
  • Theorem 2.1
  • proof
  • Lemma 3.1
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof
  • Corollary 3.4
  • proof
  • ...and 22 more