A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

Youssef Diouane; Mohamed Laghdaf Habiboullah; Dominique Orban

TL;DR

The paper introduces R2N, a proximal modified quasi-Newton method for nonsmooth, potentially nonconvex regularized optimization, and its diagonal variant R2DH. It establishes global convergence without requiring Lipschitz continuity of the gradient and permits unbounded model Hessians under a mild growth condition, delivering tight worst-case complexity bounds that extend to $p$-dependent Hessian growth with $0\le p\le1$. Non-monotone variants and diagonal Hessian specializations enhance practical performance, with detailed complexity analysis and robust subproblem strategies. Comprehensive numerical experiments across basis-pursuit denoise, matrix completion, nonlinear SVM, image denoising, and an inverse nonlinear problem demonstrate competitive performance, especially for diagonal and subproblem-solver configurations, and validate the method’s applicability to a broad class of nonsmooth, nonconvex problems.

Abstract

We develop R2N, a modified quasi-Newton method for minimizing the sum of a $\mathcal{C}^1$ function $f$ and a lower semi-continuous prox-bounded $h$. Both $f$ and $h$ may be nonconvex. At each iteration, our method computes a step by minimizing the sum of a quadratic model of $f$, a model of $h$, and an adaptive quadratic regularization term. A step may be computed by a variant of the proximal-gradient method. An advantage of R2N over trust-region (TR) methods is that proximal operators do not involve an extra TR indicator. We also develop the variant R2DH, in which the model Hessian is diagonal, which allows us to compute a step without relying on a subproblem solver when $h$ is separable. R2DH can be used as a standalone solver, but also as a subproblem solver inside R2N. We describe non-monotone variants of both R2N and R2DH. Global convergence of a first-order stationarity measure to zero holds without relying on local Lipschitz continuity of $\nabla f$, while allowing model Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton models. Under Lipschitz continuity of $\nabla f$, we establish a tight worst-case complexity bound of $O(1 / \epsilon^{2/(1 - p)})$ to bring said measure below $\epsilon > 0$, where $0 \leq p < 1$ controls the growth of model Hessians. The latter must not diverge faster than $|\mathcal{S}_k|^p$, where $\mathcal{S}_k$ is the set of successful iterations up to iteration $k$. When $p = 1$, we establish the tight exponential complexity bound $O(\exp(c \epsilon^{-2}))$, where $c > 0$ is a constant. We describe our Julia implementation and report numerical experience on a classic basis-pursuit problem, an image denoising problem, a minimum-rank matrix completion problem, a nonlinear support vector machine, and an inverse nonlinear problem.
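To illustrate why a diagonal model Hessian removes the need for a subproblem solver when $h$ is separable, here is a minimal Python sketch of a single regularized step. It is not the authors' Julia implementation. It assumes the specific choice $h = \lambda \|\cdot\|_1$, models $h$ by $h(x + s)$, and requires $d_i + \sigma > 0$ for every diagonal entry $d_i$; the function and variable names are hypothetical. Under those assumptions the subproblem splits into independent one-dimensional problems, each solved by soft-thresholding.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise proximal operator of t * |.| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def diag_regularized_step(x, grad, d, sigma, lam):
    """Closed-form minimizer over s of the separable model

        grad @ s + 0.5 * s @ (d * s) + 0.5 * sigma * ||s||^2 + lam * ||x + s||_1,

    where d holds the diagonal of the model Hessian (assumed setup, h = lam * l1).
    Requires d + sigma > 0 elementwise so each 1-D problem is strongly convex.
    """
    a = d + sigma
    if np.any(a <= 0):
        raise ValueError("need d + sigma > 0 elementwise")
    # Completing the square in u = x + s gives a scaled soft-thresholding.
    u = soft_threshold(x - grad / a, lam / a)
    return u - x


def model_value(s, x, grad, d, sigma, lam):
    """Value of the regularized model at step s (used only for checking)."""
    return (grad @ s + 0.5 * s @ (d * s) + 0.5 * sigma * (s @ s)
            + lam * np.abs(x + s).sum())


# Illustrative data (made up for this sketch).
x = np.array([1.0, -2.0, 0.5])
grad = np.array([0.5, -1.0, 2.0])
d = np.array([1.0, 3.0, 0.0])
s = diag_regularized_step(x, grad, d, sigma=1.0, lam=0.5)
print(s)  # [-0.5    0.375 -1.5  ]
```

Increasing `sigma` shrinks the step, which is the role the adaptive quadratic regularization term plays in the description above. For a non-separable $h$ or a non-diagonal model Hessian, this closed form no longer applies and an iterative subproblem solver is needed.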

Paper Structure (15 sections, 21 theorems, 95 equations, 4 figures, 7 tables, 1 algorithm)

Introduction
Background
Models
A modified quasi-Newton method for nonsmooth optimization
Convergence analysis of Algorithm 4.1
Complexity analysis of Algorithm 4.1
Algorithmic refinements
Special case: diagonal model Hessians
Non-monotone variants
Numerical experiments
Basis pursuit denoise (BPDN)
Matrix completion
General regularized problems
FitzHugh-Nagumo inverse problem
Discussion

Key Result

Lemma 1

Let Assumption asm:psi be satisfied and $B(x) = B(x)^T$ for all $x \in \mathbb{R}^n$. Then, $\mathop{\mathrm{dom}} p = \mathbb{R}^n \times \mathbb{R}$. In addition, if Assumption asm:unif-prox-bounded holds, $\mathop{\mathrm{dom}} P \supseteq \{ (x, \sigma) \mid \sigma > \max(\lambda^{-1} -\lambda

Figures (4)

Figure 1: BPDN objective vs. iterations (left) and CPU time (right).
Figure 2: Objectives vs. iterations for MC with rank (left) and nuclear norm (right) regularizers.
Figure 3: Plots of the objective vs. iterations related to SVM (left) and denoise (right).
Figure 4: FH objective vs. iterations.

Theorems & Definitions (41)

Definition 1
Definition 2
Definition 3
Lemma 1
Proof 1
Lemma 2
Proof 2
Proposition 3
Proof 3
Lemma 4
...and 31 more
