Square Root LASSO: Well-posedness, Lipschitz stability and the tuning trade off

Aaron Berk; Simone Brugiapaglia; Tim Hoheisel

TL;DR

This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions, and shows that the weak assumption implies uniqueness of the solution in question.

Abstract

This paper studies well-posedness and parameter sensitivity of the Square Root LASSO (SR-LASSO), an optimization model for recovering sparse solutions to linear inverse problems in finite dimension. An advantage of the SR-LASSO (e.g., over the standard LASSO) is that the optimal tuning of the regularization parameter is robust with respect to measurement noise. This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions. It is shown that the weak assumption implies uniqueness of the solution in question. The intermediate assumption yields a directionally differentiable and locally Lipschitz solution map (with explicit Lipschitz bounds), whereas the strong assumption gives continuous differentiability of said map around the point in question. Our analysis leads to new theoretical insights on the comparison between SR-LASSO and LASSO from the viewpoint of tuning parameter sensitivity: noise-robust optimal parameter choice for SR-LASSO comes at the "price" of elevated tuning parameter sensitivity. Numerical results support and showcase the theoretical findings.
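
For orientation, the two programs compared in the abstract can be written in their standard forms as below. This is a sketch in assumed notation (measurement matrix $A \in \mathbb{R}^{m\times n}$, data $b \in \mathbb{R}^m$, tuning parameter $\lambda > 0$); the paper's own numbered equations may use different symbols or scaling conventions.

```latex
\begin{align*}
\text{(SR-LASSO)} \quad & \min_{x \in \mathbb{R}^n} \; \|Ax - b\|_2 + \lambda \|x\|_1, \\
\text{(LASSO)}    \quad & \min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1.
\end{align*}
```

In this form the two models differ only in the data-fidelity term, which is unsquared for SR-LASSO and squared for LASSO.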

Paper Structure (32 sections, 19 theorems, 86 equations, 7 figures)

Introduction
Motivation
Main contributions
Related work
Notation
Preliminaries
Tools from variational analysis
Convex analysis tools
Uniqueness of solutions and regularity conditions for SR-LASSO
Uniqueness of solutions
Stronger regularity conditions
Intermediate condition
The strong condition
Overview of regularity conditions
Lipschitz stability under the intermediate condition
...and 17 more sections

Key Result

Lemma 2.1

Let $M\in \mathbb{R}^{m\times s}$ such that $\mathrm{rank}\, M=s$, and let $v\in \mathbb{R}^m$ with $\|v\|=1$. For the matrix $W:=M^{\top}(\mathbb{I}-vv^{\top})M$ the following hold:

Figures (7)

Figure 1: Ordering of regularity assumptions and their implications
Figure 1: (Local) Lipschitz behavior for each program; $L_{\text{SR}}$ as in \ref{eq:lipschitz_ub_lamda}, $L_{\text{UC}}$ as in \ref{eq:Lipschitz_lambda_LASSO}. $V$ as in \ref{cor:Lipschitz_lambda} gives $L_{\text{UC}} \approx L_{\text{SR}} |1-V|$. Note $\lambda_{\text{nmz}}$ is defined in \ref{sec:implementation-details}.
Figure 1: Visualizing uniqueness sufficiency for $(m, n, s, \gamma) = (100, 200, 2, 0.1)$. Upper solid line: error $\|\bar{x}(\lambda) - x^\sharp\|$ where $\bar{x}(\lambda)$ solves \ref{eq:SR-LASSO}. Lower solid line: empirical version of \ref{ass:weak}(ii) that partially suffices for uniqueness, $\|A_{I^{C}}^{\top}\bar{y}\|_{\infty}$, where $\bar{y}$ solves \ref{eq:srlasso-dual} and $\bar{z}$ solves \ref{eq:auxiliary-problem}. Grey shaded vertical rectangles correspond with $\lambda$ for which $Z^{*} = \infty$. Diagonal dashed line $y = \lambda$ serves as reference for lower solid line. Horizontal position of vertical dashed line denotes $\bar{\lambda}^{\text{SR}}_{\text{best}}$.
Figure 2: Comparison between SR-LASSO (SR) and (unconstrained) LASSO (UC): recovery of an unknown sparse signal $x^{\sharp} \in \mathbb{R}^n$ from noisy underdetermined linear measurements (cf.~\ref{eq:CS_model}). For $\lambda > 0$, denote by $\bar{x}_{\text{SR}}(\lambda)$, $\bar{x}_{\text{UC}}(\lambda) \in \mathbb{R}^{n}$ respective solutions to SR-LASSO and LASSO. The optimal parameter values $\lambda_{\text{best}}^{\text{SR}}, \lambda_{\text{best}}^{\text{UC}} > 0$ for each program minimize the $\ell_2$ approximation error between the solution and $x^\sharp$. \ref{fig:motivation-new} corresponds with $(\gamma, m) = (0.5, 100)$ in \ref{fig:empirical-lipschitz-bound-mg}. See \ref{sec:sr-vs-uc} and \cite{adcock2019correcting,belloni2011square,van2016estimation} for further details and discussion; \ref{sec:implementation-details} for numerical implementation details.
Figure 2: Effect of noise scale on error sensitivity for \ref{eq:SR-LASSO} (sr) and \ref{eq:unconstrained-LASSO} (uc) faceted by $(\gamma, m) \in \{0.1, 0.5, 1, 5, 10\} \times \{50, 100, 150, 200\}$ with $(s, n) = (7, 200)$.
...and 2 more figures

Theorems & Definitions (42)

Lemma 2.1: Sherman-Morrison-Woodbury
Proof 1
Corollary 2.2
Proof 2
Lemma 2.3: \cite{rockafellar1998variational}
Lemma 2.4: Subdifferential of $\ell_1$-norm
Proposition 3.1: Fenchel-Rockafellar duality scheme for \ref{eq:SR-LASSO}
Proof 3
Corollary 3.2
Proof 4
...and 32 more
