Proximal Dogleg Opportunistic Majorization for Nonconvex and Nonsmooth Optimization

Yiming Zhou, Wei Dai

TL;DR

A fast and user-friendly second-order proximal algorithm that not only converges faster but also tends to reach a better local optimum than benchmark algorithms.

Abstract

We consider minimizing a function consisting of a quadratic term and a proximable term that is possibly nonconvex and nonsmooth. This problem is also known as the scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence, high implementation complexity, or both. To overcome these limitations, we develop a fast and user-friendly second-order proximal algorithm. The key innovation involves building and solving a series of opportunistically majorized problems along a hybrid Newton direction. The approach directly uses the exact Hessian of the quadratic term and calculates its inverse only once, eliminating the iterative numerical approximation of the Hessian that is common practice in quasi-Newton methods. The algorithm's convergence to a critical point is established, and a local convergence rate is derived based on the Kurdyka-Łojasiewicz property of the objective function. Numerical comparisons are conducted on well-known optimization problems. The results demonstrate that the proposed algorithm not only converges faster but also tends to reach a better local optimum than benchmark algorithms.
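To make the problem class concrete, the following minimal sketch runs a plain proximal gradient iteration on a tiny instance with an $\ell_0$ penalty. This is a common first-order baseline, not the algorithm proposed in the paper. The objective form $\frac{1}{2}\bm{x}^\top\bm{Q}\bm{x} - \bm{b}^\top\bm{x} + \lambda\|\bm{x}\|_0$, the data `Q`, `b`, `lam`, and all function names are illustrative assumptions rather than anything taken from the paper.

```python
def matvec(Q, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]

def hard_threshold(v, thresh):
    """Proximal operator of thresh * ||x||_0: keep v_i only if v_i^2 > 2 * thresh."""
    return [vi if vi * vi > 2.0 * thresh else 0.0 for vi in v]

def objective(Q, b, lam, x):
    """Assumed objective: 0.5 * x^T Q x - b^T x + lam * ||x||_0."""
    Qx = matvec(Q, x)
    quad = 0.5 * sum(xi * qi for xi, qi in zip(x, Qx))
    lin = sum(bi * xi for bi, xi in zip(b, x))
    return quad - lin + lam * sum(1 for xi in x if xi != 0.0)

def proximal_gradient(Q, b, lam, tau, x0, iters=200):
    """Baseline iteration: gradient step on the quadratic, then the l0 prox."""
    x = list(x0)
    for _ in range(iters):
        grad = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        v = [xi - tau * gi for xi, gi in zip(x, grad)]
        x = hard_threshold(v, tau * lam)
    return x

# Hypothetical 2x2 instance; tau = 1 / lambda_max(Q) = 0.5 for this diagonal Q.
Q = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 0.1]
lam = 0.5
x_hat = proximal_gradient(Q, b, lam, tau=0.5, x0=[0.0, 0.0])
```

On this instance the iteration keeps the first coordinate and zeroes the second, because the gain from the small second entry of `b` does not offset the sparsity penalty.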

Paper Structure

This paper contains 24 sections, 15 theorems, 53 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Equality in the following equation holds when $\tau = 1/\lambda_{\max}$, where $\lambda_{\max}$ denotes the largest eigenvalue of $\bm{Q}$. For any $\tau \in (0,1/\lambda_{\max})$, the inequality is strict.
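The lemma's displayed inequality is not reproduced in this summary, so the sketch below does not claim to restate it. As a rough illustration of why $1/\lambda_{\max}$ is the critical step size, it assumes the standard quadratic majorization of $f(\bm{x}) = \frac{1}{2}\bm{x}^\top\bm{Q}\bm{x} - \bm{b}^\top\bm{x}$, whose gap at a displacement $\bm{d}$ is $\frac{1}{2}\bm{d}^\top(\bm{I}/\tau - \bm{Q})\bm{d}$. This gap is nonnegative for every $\bm{d}$ exactly when $\tau \le 1/\lambda_{\max}$. The matrix `Q` and all function names are hypothetical.

```python
import math

def matvec(Q, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(q * xj for q, xj in zip(row, x)) for row in Q]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def largest_eigenvalue(Q, iters=200):
    """Power iteration; assumes Q is symmetric positive semidefinite, nonzero,
    and that the start vector is not orthogonal to the top eigenvector."""
    v = [1.0 / (i + 1) for i in range(len(Q))]
    for _ in range(iters):
        w = matvec(Q, v)
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    return dot(v, matvec(Q, v))  # Rayleigh quotient of the unit vector v

def majorization_gap(Q, tau, d):
    """Surrogate minus quadratic at displacement d: 0.5 * d^T (I/tau - Q) d."""
    return 0.5 * (dot(d, d) / tau - dot(d, matvec(Q, d)))

# Hypothetical example matrix with eigenvalues 3 and 1.
Q = [[2.0, 1.0], [1.0, 2.0]]
lam_max = largest_eigenvalue(Q)
tau_global = 1.0 / lam_max
```

With `tau_global`, the gap vanishes only along the top eigenvector and is positive elsewhere; a step size larger than `1 / lam_max` makes the gap negative in that direction, so the surrogate no longer majorizes the quadratic globally.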

Figures (5)

  • Figure 1: The configurations of global majorization ($\tau_{-}$) and opportunistic majorization ($\tau_{+}$) surrogate functions.
  • Figure 2: Convergence behavior of the subdifferential for the first four instances (one realization) from Table 1.
  • Figure 3: Phase transition curve of $\ell_{0}$ sparse recovery at varying sparsities. Realizations with random initialization are considered successful if NRE $<10^{-4}$ for noiseless case or $< 10^{-2}$ for noisy case.
  • Figure 4: Convergence behavior of the RPCA problem with $m = 100$. Left: Performance comparisons of subdifferential. Right: Performance comparisons of normalized recovery error of the low-rank matrix.
  • Figure 5: Phase transition curve of RPCA at varying ranks. Realizations with random initialization are considered successful if $\left\|\hat{\bm{L}}-\bm{L}^\star\right\|_F /\left\|\bm{L}^\star\right\|_F<10^{-3}$.

Theorems & Definitions (20)

  • Definition 1
  • Definition 2
  • Definition 3: Subdifferential (rockafellar2009variational)
  • Definition 4: Kurdyka-Łojasiewicz property (attouch2010proximal)
  • Definition 5: Łojasiewicz exponent (yu2022kurdyka)
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • ...and 10 more