SPIRAL: A superlinearly convergent incremental proximal algorithm for nonconvex finite sum minimization

Pourya Behmandpoor; Puya Latafat; Andreas Themelis; Marc Moonen; Panagiotis Patrinos

TL;DR

SPIRAL targets regularized finite-sum problems with nonconvex and nonsmooth components, relaxing Lipschitz-gradient requirements through relative smoothness and a Bregman framework. It blends incremental gradient updates with a linesearch driven by a Lyapunov function and uses quasi-Newton directions derived from a residual mapping to achieve fast convergence. The authors prove global and subsequential convergence, establish conditions for superlinear convergence, and show linear convergence under KL assumptions, while offering an adaptive variant (adaSPIRAL) that tunes local smoothness. Empirically, SPIRAL and adaSPIRAL outperform several state-of-the-art incremental methods on convex tasks and several nonconvex problems, including sparse phase retrieval and NN-PCA, while keeping low memory overhead. These results indicate practical applicability to large-scale, non-Lipschitz finite-sum problems in ML, signal processing, and related fields, with potential extensions to distributed settings.

Abstract

We introduce SPIRAL, a SuPerlinearly convergent Incremental pRoximal ALgorithm, for solving nonconvex regularized finite sum problems under a relative smoothness assumption. Each iteration of SPIRAL consists of an inner and an outer loop. It combines incremental gradient updates with a linesearch that has the remarkable property of never being triggered asymptotically, leading to superlinear convergence under mild assumptions at the limit point. Simulation results with L-BFGS directions on different convex, nonconvex, and non-Lipschitz differentiable problems show that our algorithm, as well as its adaptive variant, is competitive with the state of the art.
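
For orientation on the "proximal" and "finite sum" terminology, the following is a minimal sketch of a plain Euclidean proximal gradient iteration on a tiny l1-regularized least squares problem. It is not SPIRAL: it has no incremental inner loop, no linesearch, no quasi-Newton directions, and no Bregman kernel. The function names, the toy data, and the stepsize are assumptions chosen for illustration only.

```python
def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1, applied componentwise."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def full_gradient(A, b, x):
    """Gradient of (1/N) * sum_i 0.5 * (a_i . x - b_i)^2."""
    N, n = len(A), len(x)
    grad = [0.0] * n
    for a_i, b_i in zip(A, b):
        residual = sum(aij * xj for aij, xj in zip(a_i, x)) - b_i
        for j in range(n):
            grad[j] += residual * a_i[j] / N
    return grad


def proximal_gradient(A, b, lam, gamma, iters=500):
    """Iterate x <- prox_{gamma*lam*||.||_1}(x - gamma * grad f(x))."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        g = full_gradient(A, b, x)
        x = soft_threshold([xj - gamma * gj for xj, gj in zip(x, g)], gamma * lam)
    return x


# Toy data (hypothetical): N = 2 samples, n = 2 unknowns.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.1]
# The smooth part has gradient Lipschitz constant 0.5 here, so gamma = 1.0 < 2 is admissible.
x_hat = proximal_gradient(A, b, lam=0.2, gamma=1.0)
```

On this toy problem the first coordinate is shrunk toward zero and the second is set exactly to zero, which is the sparsifying effect of the l1 regularizer.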

Paper Structure (24 sections, 8 theorems, 60 equations, 6 figures, 1 table, 3 algorithms)

Introduction
Preliminaries
Notation
Relative smoothness
Proposed algorithm
Convergence Analysis
Problem Reformulation
Lifted Representation of the Algorithm
Global and Subsequential Convergence
Superlinear Convergence
Sequential and Linear Convergence
Numerical Experiments
Adaptive variant
Sparse Phase Retrieval with Squared Loss
l1 Regularized Least Squares Problem
...and 9 more sections

Key Result

Corollary 4.3

Let Assumption ass:basic hold and let $\gamma_i \in (0,\tfrac{N}{L_i})$. Then, for the Bregman Moreau operator and the envelope associated with $\varphi$ in eq:problem_formulation, it holds that $\operatorname{t}_{\hat{h}} = \operatorname{t} \circ \nabla \hat{h} = \operatorname{prox}_{\varphi}^{\hat{h}}$, and $\Phi^{\hat{H}}(\bm z) = \varphi^{\hat{h}}(z)$ for $\bm z = (z, \ld

Figures (6)

Figure 1: Performance for the phase retrieval problem \ref{eq:phase_retrieval} on a digit 6 image with $N=1280$, $n=256$. Image recovery after $100$ epochs: original image (left), initialization (center), and output (right).
Figure 2: Performance of different algorithms for the Lasso problem \ref{eq:lasso}. Synthetic dataset (top left) with $N=10000$, $n=400$, synthetic dataset (top center) with $N=300$, $n=600$, mg (top right) with $N=1385$, $n=6$, triazines (bottom left) with $N=186$, $n=60$, housing (bottom center) with $N=506$, $n=13$, and cadata (bottom right) with $N=20640$, $n=8$.
Figure 3: Performance of different algorithms for the NN-PCA problem of \ref{eq:nnpca}. MNIST (left) with $N=60000$, $n=784$, covtype (left center) with $N=581012$, $n=54$, a9a (right center) with $N=32561$, $n=123$, and aloi (right) with $N=108000$, $n=128$.
Figure 4: Performance of different algorithms versus CPU time on the phase retrieval problem \ref{eq:phase_retrieval} for $550$ epochs on a digit 6 image with $N=1280$, $n=256$.
Figure 5: Performance of different algorithms versus CPU time on the Lasso problem of \ref{eq:lasso} for $50$ epochs. Synthetic dataset (top left) with $N=10000$, $n=400$, synthetic dataset (top center) with $N=300$, $n=600$, mg (top right) with $N=1385$, $n=6$, triazines (bottom left) with $N=186$, $n=60$, housing (bottom center) with $N=506$, $n=13$, and cadata (bottom right) with $N=20640$, $n=8$.
...and 1 more figures

Theorems & Definitions (25)

Definition 2.1: Distance-generating function (dgf)
Definition 2.2: Bregman distance
Definition 2.3: Relative smoothness (bolte2018first)
Remark 3.1: sweeping rule in the incremental loop
Corollary 4.3: lower-dimensional representations
Proposition 4.4
Remark 4.5: Well-definedness of linesearch
Remark 4.6: shuffled cyclic (randomized without replacement) sweeping rule
Theorem 4.7
proof
...and 15 more
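
To make Definition 2.2 concrete, the sketch below evaluates the standard Bregman distance $D_h(y,x) = h(y) - h(x) - \langle \nabla h(x), y - x\rangle$ for two distance-generating functions. The Euclidean kernel recovers $\tfrac12\|y-x\|^2$; the quartic kernel $h(x) = \tfrac14\|x\|^4 + \tfrac12\|x\|^2$ is one commonly used in the relative smoothness literature for quartic objectives such as phase retrieval. Whether the paper uses exactly this kernel is an assumption here, and all function names are illustrative.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def bregman_distance(h, grad_h, y, x):
    """D_h(y, x) = h(y) - h(x) - <grad h(x), y - x>."""
    diff = [yi - xi for yi, xi in zip(y, x)]
    return h(y) - h(x) - dot(grad_h(x), diff)


# Euclidean kernel: h(x) = 0.5 * ||x||^2, so D_h(y, x) = 0.5 * ||y - x||^2.
def h_euclid(x):
    return 0.5 * dot(x, x)


def grad_h_euclid(x):
    return list(x)


# Quartic kernel (assumed for illustration): h(x) = 0.25 * ||x||^4 + 0.5 * ||x||^2.
def h_quartic(x):
    s = dot(x, x)
    return 0.25 * s * s + 0.5 * s


def grad_h_quartic(x):
    s = dot(x, x)
    return [(s + 1.0) * xi for xi in x]


y, x = [1.0, 2.0], [0.0, 1.0]
d_euclid = bregman_distance(h_euclid, grad_h_euclid, y, x)
d_quartic = bregman_distance(h_quartic, grad_h_quartic, y, x)
```

Both distances are nonnegative because each kernel is convex, and the quartic kernel penalizes the same displacement more heavily than the Euclidean one as the points move away from the origin.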
