SPIRAL: A superlinearly convergent incremental proximal algorithm for nonconvex finite sum minimization

Pourya Behmandpoor, Puya Latafat, Andreas Themelis, Marc Moonen, Panagiotis Patrinos

TL;DR

SPIRAL targets regularized finite-sum problems with nonconvex and nonsmooth components, relaxing Lipschitz-gradient requirements through relative smoothness and a Bregman framework. It blends incremental gradient updates with a linesearch driven by a Lyapunov function and uses quasi-Newton directions derived from a residual mapping to achieve fast convergence. The authors prove global and subsequential convergence, establish conditions for superlinear convergence, and show linear convergence under KL assumptions, while offering an adaptive variant (adaSPIRAL) that tunes local smoothness. Empirically, SPIRAL and adaSPIRAL outperform several state-of-the-art incremental methods on convex tasks and several nonconvex problems, including sparse phase retrieval and NN-PCA, while keeping low memory overhead. These results indicate practical applicability to large-scale, non-Lipschitz finite-sum problems in ML, signal processing, and related fields, with potential extensions to distributed settings.

Abstract

We introduce SPIRAL, a SuPerlinearly convergent Incremental pRoximal ALgorithm, for solving nonconvex regularized finite sum problems under a relative smoothness assumption. Each iteration of SPIRAL consists of an inner and an outer loop. It combines incremental gradient updates with a linesearch that has the remarkable property of never being triggered asymptotically, leading to superlinear convergence under mild assumptions at the limit point. Simulation results with L-BFGS directions on different convex, nonconvex, and non-Lipschitz differentiable problems show that our algorithm and its adaptive variant are competitive with the state of the art.
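The relative smoothness assumption mentioned in the abstract replaces the usual Lipschitz-gradient bound with the inequality $D_f(y,x) \le L\,D_h(y,x)$, where $D_h(y,x) = h(y) - h(x) - \langle \nabla h(x), y - x\rangle$ is the Bregman distance of a distance-generating function $h$. The following is a minimal numerical sketch of that inequality, not code from the paper. It assumes a quartic kernel $h(x) = \tfrac14\|x\|^4 + \tfrac12\|x\|^2$ and a single phase-retrieval-style loss $f(x) = \tfrac14(\langle a, x\rangle^2 - b)^2$ with constant $L = 3\|a\|^4 + \|a\|^2|b|$, a choice common in the relative smoothness literature; the function names (`bregman_distance`, `make_quartic_loss`) and the data are hypothetical.

```python
import numpy as np

def bregman_distance(h, grad_h, y, x):
    """Bregman distance D_h(y, x) = h(y) - h(x) - <grad h(x), y - x>."""
    return h(y) - h(x) - np.dot(grad_h(x), y - x)

# Euclidean kernel: D_h reduces to 0.5 * ||y - x||^2.
def h_euclid(x):
    return 0.5 * np.dot(x, x)

def grad_h_euclid(x):
    return x

# Quartic kernel (assumed here): h(x) = 0.25 * ||x||^4 + 0.5 * ||x||^2.
def h_quartic(x):
    s = np.dot(x, x)
    return 0.25 * s ** 2 + 0.5 * s

def grad_h_quartic(x):
    return (np.dot(x, x) + 1.0) * x

def make_quartic_loss(a, b):
    """Illustrative loss f(x) = 0.25 * (<a, x>^2 - b)^2, its gradient,
    and a relative smoothness constant L with respect to h_quartic."""
    def f(x):
        return 0.25 * (np.dot(a, x) ** 2 - b) ** 2

    def grad_f(x):
        u = np.dot(a, x)
        return (u ** 2 - b) * u * a

    sq_norm = np.dot(a, a)
    L = 3.0 * sq_norm ** 2 + sq_norm * abs(b)
    return f, grad_f, L

x = np.array([1.0, -2.0])
y = np.array([0.5, 1.0])

d_euclid = bregman_distance(h_euclid, grad_h_euclid, y, x)
d_quartic = bregman_distance(h_quartic, grad_h_quartic, y, x)

f, grad_f, L = make_quartic_loss(np.array([1.0, 2.0]), 3.0)
# Relative smoothness predicts a nonnegative gap L * D_h(y, x) - D_f(y, x).
gap = L * d_quartic - bregman_distance(f, grad_f, y, x)
print(d_euclid, d_quartic, gap)
```

The gradient of this $f$ is not globally Lipschitz because its Hessian grows quadratically in $\|x\|$, yet the gap above stays nonnegative for any pair of points, which is the kind of setting the Bregman framework is meant to cover.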

Paper Structure (24 sections, 8 theorems, 60 equations, 6 figures, 1 table, 3 algorithms)

Key Result

Corollary 4.3

Let ass:basic hold and let $\gamma_i \in (0,\tfrac{N}{L_i})$. Then, with the Bregman Moreau operator and the envelope associated with $\varphi$ in eq:problem_formulation given by it holds that $\operatorname{t}_{\hat{h}} = \mathop{\mathrm{t}}\nolimits \circ\nabla \hat{h} = \mathop{\mathrm{prox}}\nolimits_{\varphi}^{\hat{h}}$, and $\Phi^{\hat{H}}(\bm z) = \varphi^{\hat{h}}(z)$ for $\bm z = (z, \ld

Figures (6)

  • Figure 1: Performance for the phase retrieval problem \ref{eq:phase_retrieval} on a digit 6 image with $N=1280$, $n=256$. Image recovery after $100$ epochs: original image (left), initialization (center), and output (right).
  • Figure 2: Performance of different algorithms for the Lasso problem \ref{eq:lasso}. Synthetic dataset (top left) with $N=10000$, $n=400$, synthetic dataset (top center) with $N=300$, $n=600$, mg (top right) with $N=1385$, $n=6$, triazines (bottom left) with $N=186$, $n=60$, housing (bottom center) with $N=506$, $n=13$, and cadata (bottom right) with $N=20640$, $n=8$.
  • Figure 3: Performance of different algorithms for the NN-PCA problem \ref{eq:nnpca}. MNIST (left) with $N=60000$, $n=784$, covtype (left center) with $N=581012$, $n=54$, a9a (right center) with $N=32561$, $n=123$, and aloi (right) with $N=108000$, $n=128$.
  • Figure 4: Performance of different algorithms versus CPU time on the phase retrieval problem \ref{eq:phase_retrieval} for $550$ epochs on a digit 6 image with $N=1280$, $n=256$.
  • Figure 5: Performance of different algorithms versus CPU time on the Lasso problem \ref{eq:lasso} for $50$ epochs. Synthetic dataset (top left) with $N=10000$, $n=400$, synthetic dataset (top center) with $N=300$, $n=600$, mg (top right) with $N=1385$, $n=6$, triazines (bottom left) with $N=186$, $n=60$, housing (bottom center) with $N=506$, $n=13$, and cadata (bottom right) with $N=20640$, $n=8$.
  • ...and 1 more figure

Theorems & Definitions (25)

  • Definition 2.1: Distance-generating function (dgf)
  • Definition 2.2: Bregman distance
  • Definition 2.3: Relative smoothness [bolte2018first]
  • Remark 3.1: Sweeping rule in the incremental loop
  • Corollary 4.3: Lower-dimensional representations
  • Proposition 4.4
  • Remark 4.5: Well-definedness of linesearch
  • Remark 4.6: Shuffled cyclic (randomized without replacement) sweeping rule
  • Theorem 4.7
  • proof
  • ...and 15 more