
Square Root LASSO: Well-posedness, Lipschitz stability and the tuning trade off

Aaron Berk, Simone Brugiapaglia, Tim Hoheisel

TL;DR

This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions, and shows that the weak assumption implies uniqueness of the solution in question.

Abstract

This paper studies well-posedness and parameter sensitivity of the Square Root LASSO (SR-LASSO), an optimization model for recovering sparse solutions to linear inverse problems in finite dimension. An advantage of the SR-LASSO (e.g., over the standard LASSO) is that the optimal tuning of the regularization parameter is robust with respect to measurement noise. This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions. It is shown that the weak assumption implies uniqueness of the solution in question. The intermediate assumption yields a directionally differentiable and locally Lipschitz solution map (with explicit Lipschitz bounds), whereas the strong assumption gives continuous differentiability of said map around the point in question. Our analysis leads to new theoretical insights on the comparison between SR-LASSO and LASSO from the viewpoint of tuning parameter sensitivity: noise-robust optimal parameter choice for SR-LASSO comes at the "price" of elevated tuning parameter sensitivity. Numerical results support and showcase the theoretical findings.
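For orientation, the two programs compared in the abstract are commonly written as below. This is a sketch in standard notation, not a quotation from the paper: the symbols $A \in \mathbb{R}^{m \times n}$ (measurement matrix) and $b \in \mathbb{R}^m$ (measurement vector), and the factor $\tfrac{1}{2}$ in the LASSO objective, are assumptions, and the paper's own scaling may differ.

```latex
% SR-LASSO: the data-fidelity term is the unsquared Euclidean norm.
\min_{x \in \mathbb{R}^n} \; \|Ax - b\|_2 + \lambda \|x\|_1

% (Unconstrained) LASSO: the data-fidelity term is squared.
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1
```

The two objectives differ only in whether the residual norm is squared; $\lambda > 0$ is the tuning parameter whose sensitivity the paper analyzes.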

Paper Structure (32 sections, 19 theorems, 86 equations, 7 figures)


Key Result

Lemma 2.1

Let $M\in \mathbb{R}^{m\times s}$ such that $\mathrm{rank}\, M=s$, and let $v\in \mathbb{R}^m$ with $\|v\|=1$. For the matrix $W:=M^{\top}(\mathbb{I}-vv^{\top})M$ the following hold:

Figures (7)

  • Figure 1: Ordering of regularity assumptions and their implications
  • Figure 1: (Local) Lipschitz behavior for each program; $L_{\text{SR}}$ as in \ref{eq:lipschitz_ub_lamda}, $L_{\text{UC}}$ as in \ref{eq:Lipschitz_lambda_LASSO}. $V$ as in \ref{cor:Lipschitz_lambda} gives $L_{\text{UC}} \approx L_{\text{SR}} |1-V|$. Note that $\lambda_{\text{nmz}}$ is defined in \ref{sec:implementation-details}.
  • Figure 1: Visualizing uniqueness sufficiency for $(m, n, s, \gamma) = (100, 200, 2, 0.1)$. Upper solid line: error $\|\bar{x}(\lambda) - x^\sharp\|$ where $\bar{x}(\lambda)$ solves \ref{eq:SR-LASSO}. Lower solid line: empirical version of \ref{ass:weak}(ii) that partially suffices for uniqueness, $\|A_{I^{C}}^{\top}\bar{y}\|_{\infty}$, where $\bar{y}$ solves \ref{eq:srlasso-dual} and $\bar{z}$ solves \ref{eq:auxiliary-problem}. Grey shaded vertical rectangles correspond to values of $\lambda$ for which $Z^{*} = \infty$. The diagonal dashed line $y = \lambda$ serves as a reference for the lower solid line. The horizontal position of the vertical dashed line denotes $\bar{\lambda}^{\text{SR}}_{\text{best}}$.
  • Figure 2: Comparison between SR-LASSO (SR) and (unconstrained) LASSO (UC): recovery of an unknown sparse signal $x^{\sharp} \in \mathbb{R}^n$ from noisy underdetermined linear measurements (cf. \ref{eq:CS_model}). For $\lambda > 0$, denote by $\bar{x}_{\text{SR}}(\lambda)$, $\bar{x}_{\text{UC}}(\lambda) \in \mathbb{R}^{n}$ the respective solutions to SR-LASSO and LASSO. The optimal parameter values $\lambda_{\text{best}}^{\text{SR}}, \lambda_{\text{best}}^{\text{UC}} > 0$ for each program minimize the $\ell_2$ approximation error between the solution and $x^\sharp$. \ref{fig:motivation-new} corresponds to $(\gamma, m) = (0.5, 100)$ in \ref{fig:empirical-lipschitz-bound-mg}. See \ref{sec:sr-vs-uc} and \cite{adcock2019correcting, belloni2011square, van2016estimation} for further details and discussion; see \ref{sec:implementation-details} for numerical implementation details.
  • Figure 2: Effect of noise scale on error sensitivity for \ref{eq:SR-LASSO} (sr) and \ref{eq:unconstrained-LASSO} (uc), faceted by $(\gamma, m) \in \{0.1, 0.5, 1, 5, 10\} \times \{50, 100, 150, 200\}$ with $(s, n) = (7, 200)$.
  • ...and 2 more figures

Theorems & Definitions (42)

  • Lemma 2.1: Sherman-Morrison-Woodbury
  • Proof 1
  • Corollary 2.2
  • Proof 2
  • Lemma 2.3: \cite{rockafellar1998variational}
  • Lemma 2.4: Subdifferential of $\ell_1$-norm
  • Proposition 3.1: Fenchel-Rockafellar duality scheme for \ref{eq:SR-LASSO}
  • Proof 3
  • Corollary 3.2
  • Proof 4
  • ...and 32 more