A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

Youssef Diouane, Mohamed Laghdaf Habiboullah, Dominique Orban

TL;DR

The paper introduces R2N, a proximal modified quasi-Newton method for nonsmooth, potentially nonconvex regularized optimization, and its diagonal variant R2DH. It establishes global convergence without requiring Lipschitz continuity of the gradient and permits unbounded model Hessians under a mild growth condition, delivering tight worst-case complexity bounds that extend to $p$-dependent Hessian growth with $0\le p\le1$. Non-monotone variants and diagonal Hessian specializations enhance practical performance, with detailed complexity analysis and robust subproblem strategies. Numerical experiments on basis-pursuit denoising, matrix completion, a nonlinear SVM, image denoising, and an inverse nonlinear problem demonstrate competitive performance, especially for diagonal and subproblem-solver configurations, and validate the method’s applicability to a broad class of nonsmooth, nonconvex problems.

Abstract

We develop R2N, a modified quasi-Newton method for minimizing the sum of a $\mathcal{C}^1$ function $f$ and a lower semi-continuous prox-bounded $h$. Both $f$ and $h$ may be nonconvex. At each iteration, our method computes a step by minimizing the sum of a quadratic model of $f$, a model of $h$, and an adaptive quadratic regularization term. A step may be computed by a variant of the proximal-gradient method. An advantage of R2N over trust-region (TR) methods is that proximal operators do not involve an extra TR indicator. We also develop the variant R2DH, in which the model Hessian is diagonal, which allows us to compute a step without relying on a subproblem solver when $h$ is separable. R2DH can be used as a standalone solver, but also as a subproblem solver inside R2N. We describe non-monotone variants of both R2N and R2DH. Global convergence of a first-order stationarity measure to zero holds without relying on local Lipschitz continuity of $\nabla f$, while allowing model Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton models. Under Lipschitz continuity of $\nabla f$, we establish a tight worst-case complexity bound of $O(1 / \epsilon^{2/(1 - p)})$ to bring said measure below $\epsilon > 0$, where $0 \leq p < 1$ controls the growth of model Hessians. The latter must not diverge faster than $|\mathcal{S}_k|^p$, where $\mathcal{S}_k$ is the set of successful iterations up to iteration $k$. When $p = 1$, we establish the tight exponential complexity bound $O(\exp(c \epsilon^{-2}))$ where $c > 0$ is a constant. We describe our Julia implementation and report numerical experience on a classic basis-pursuit problem, an image denoising problem, a minimum-rank matrix completion problem, a nonlinear support vector machine and an inverse nonlinear problem.
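
To illustrate why a diagonal model Hessian removes the need for a subproblem solver when $h$ is separable, here is a minimal Python sketch of a single step. It is not the authors' Julia implementation. It assumes, purely for illustration, that $h = \lambda \|\cdot\|_1$ and that the model of $h$ at $x$ is $h(x + s)$, so the step minimizes $g^T s + \tfrac{1}{2} s^T (D + \sigma I) s + \lambda \|x + s\|_1$ with $D = \mathrm{diag}(d)$ and $d_i + \sigma > 0$. The function names `soft_threshold`, `diagonal_prox_step` and `model_value` are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.| evaluated at the scalar v (t >= 0)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def diagonal_prox_step(x, g, d, sigma, lam):
    """Closed-form minimizer s of the regularized model

        g's + 0.5 * s'(diag(d) + sigma * I)s + lam * ||x + s||_1.

    Assumption (illustrative only): h is lam * l1-norm, which is separable,
    so the problem decouples into one scalar problem per coordinate.
    """
    step = []
    for xi, gi, di in zip(x, g, d):
        curv = di + sigma  # per-coordinate curvature of the regularized model
        if curv <= 0.0:
            raise ValueError("each d_i + sigma must be positive")
        # Minimizing over u = x_i + s_i gives a scaled soft-thresholding.
        u = soft_threshold(xi - gi / curv, lam / curv)
        step.append(u - xi)
    return step


def model_value(x, g, d, sigma, lam, s):
    """Value of the regularized model at the step s (same assumptions)."""
    return sum(
        gi * si + 0.5 * (di + sigma) * si * si + lam * abs(xi + si)
        for xi, gi, di, si in zip(x, g, d, s)
    )
```

For example, `diagonal_prox_step([1.0, -0.5, 0.0], [0.5, -2.0, 0.1], [1.0, 3.0, 0.5], 1.0, 0.4)` returns the step in a single pass over the coordinates. With a non-diagonal quasi-Newton model, the same subproblem generally has no closed form and calls for an inner iterative solver, which is the role the abstract assigns to the proximal-gradient variant or to R2DH inside R2N.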

Paper Structure

This paper contains 15 sections, 21 theorems, 95 equations, 4 figures, 7 tables, and 1 algorithm.

Key Result

Lemma 1

Let asm:psi be satisfied and $B(x) = B(x)^T$ for all $x \in \mathbb{R}^n$. Then, $\mathop{\mathrm{dom}} p = \mathbb{R}^n \times \mathbb{R}$. In addition, if asm:unif-prox-bounded holds, $\mathop{\mathrm{dom}} P \supseteq \{ (x, \sigma) \mid \sigma > \max(\lambda^{-1} -\lambda

Figures (4)

  • Figure 1: BPDN objective vs. iterations (left) and CPU time (right).
  • Figure 2: Objectives vs. iterations for MC with rank (left) and nuclear norm (right) regularizers.
  • Figure 3: Plots of the objective vs. iterations related to SVM (left) and denoise (right).
  • Figure 4: FH objective vs. iterations.

Theorems & Definitions (41)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Proof 1
  • Lemma 2
  • Proof 2
  • Proposition 3
  • Proof 3
  • Lemma 4
  • ...and 31 more