Adaptive Extrapolated Proximal Gradient Methods with Variance Reduction for Composite Nonconvex Finite-Sum Minimization

Ganzhao Yuan

TL;DR

This work introduces AEPG-SPIDER, an Adaptive Extrapolated Proximal Gradient method with SPIDER variance reduction for composite nonconvex finite-sum minimization, together with AEPG, the method it reduces to in the full-batch setting. Both methods are Lipschitz-free and achieve optimal iteration complexity for finding $\epsilon$-approximate stationary points: $\mathcal{O}(N \epsilon^{-2})$ for AEPG and $\mathcal{O}(N + \sqrt{N} \epsilon^{-2})$ for AEPG-SPIDER, where $N$ is the number of component functions. Non-ergodic convergence rates are established for both under the Kurdyka-Łojasiewicz (KL) assumption. The methods combine adaptive stepsizes, Nesterov extrapolation, and SPIDER variance reduction. Experiments on sparse phase retrieval and linear eigenvalue problems show AEPG-SPIDER and AEPG outperforming existing variance-reduced and proximal-gradient methods, especially when the extrapolation parameter θ is near 1.
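
As a rough illustration of the gap between the two bounds, constants can be ignored and sample values plugged in. The values $N = 10^4$ and $\epsilon = 10^{-2}$ below are chosen purely for illustration and do not come from the paper:

$$
N \epsilon^{-2} = 10^{4} \cdot 10^{4} = 10^{8},
\qquad
N + \sqrt{N}\, \epsilon^{-2} = 10^{4} + 10^{2} \cdot 10^{4} = 1.01 \times 10^{6}.
$$

In this regime the variance-reduced bound is smaller by a factor of roughly $\sqrt{N} = 100$.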

Abstract

This paper proposes AEPG-SPIDER, an Adaptive Extrapolated Proximal Gradient (AEPG) method with variance reduction for minimizing composite nonconvex finite-sum functions. It integrates three acceleration techniques: adaptive stepsizes, Nesterov's extrapolation, and the recursive stochastic path-integrated estimator SPIDER. Unlike existing methods that adjust the stepsize factor using historical gradients, AEPG-SPIDER relies on past iterate differences for its update. While targeting stochastic finite-sum problems, AEPG-SPIDER simplifies to AEPG in the full-batch, non-stochastic setting, which is also of independent interest. To our knowledge, AEPG-SPIDER and AEPG are the first Lipschitz-free methods to achieve optimal iteration complexity for this class of composite minimization problems. Specifically, AEPG achieves the optimal iteration complexity of $\mathcal{O}(N \epsilon^{-2})$, while AEPG-SPIDER achieves $\mathcal{O}(N + \sqrt{N} \epsilon^{-2})$ for finding $\epsilon$-approximate stationary points, where $N$ is the number of component functions. Under the Kurdyka-Łojasiewicz (KL) assumption, we establish non-ergodic convergence rates for both methods. Preliminary experiments on sparse phase retrieval and linear eigenvalue problems demonstrate the superior performance of AEPG-SPIDER and AEPG compared to existing methods.
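
The three ingredients named in the abstract can be combined in a short sketch. This is a minimal illustration under stated assumptions, not the paper's algorithm. The toy objective (least squares plus an $\ell_1$ penalty), the function names, the epoch length $\sqrt{N}$, and in particular the stepsize rule (one over the square root of accumulated squared iterate differences, in the spirit of AdaGrad-norm) are all choices made here for illustration. The paper's actual update rule and parameters may differ.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def batch_grad(A, b, x, idx):
    """Average gradient of 0.5 * (a_i^T x - b_i)^2 over the rows in idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def objective(A, b, x, lam):
    """Toy composite objective: mean least-squares loss plus lam * ||x||_1."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def aepg_spider_sketch(A, b, lam=0.01, theta=0.5, v0=1.0,
                       iters=300, batch=None, seed=0):
    """Illustrative loop: SPIDER estimator + extrapolation + adaptive step.

    The stepsize rule is an assumption (not taken from the paper): it uses
    1 / sqrt(v0^2 + sum of squared iterate differences), so no Lipschitz
    constant is supplied by the user.
    """
    rng = np.random.default_rng(seed)
    N, d = A.shape
    q = max(1, int(np.sqrt(N)))            # epoch length (assumed choice)
    batch = batch or q                     # mini-batch size; batch=N is full batch
    full = np.arange(N)
    x = np.zeros(d)                        # current iterate
    y = x.copy()                           # extrapolated point
    y_prev = y.copy()
    g = np.zeros(d)                        # running gradient estimate
    acc = v0 ** 2                          # accumulated squared iterate differences
    for t in range(iters):
        if t % q == 0:
            # Periodic full-gradient refresh at the extrapolated point.
            g = batch_grad(A, b, y, full)
        else:
            # SPIDER recursion: correct the previous estimate with a
            # mini-batch gradient difference between consecutive points.
            idx = rng.choice(N, size=batch, replace=False)
            g = g + batch_grad(A, b, y, idx) - batch_grad(A, b, y_prev, idx)
        step = 1.0 / np.sqrt(acc)
        x_new = soft_threshold(y - step * g, step * lam)  # proximal step
        acc += np.sum((x_new - x) ** 2)    # adapt using iterate differences
        y_prev = y
        y = x_new + theta * (x_new - x)    # Nesterov-style extrapolation
        x = x_new
    return x
```

Calling `aepg_spider_sketch(A, b, batch=A.shape[0])` makes every mini-batch the full dataset, which mirrors the way the abstract describes AEPG-SPIDER simplifying to AEPG in the full-batch setting.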

Paper Structure

This paper contains 35 sections, 29 theorems, 120 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.4

(Boundedness of $\mathcal{Z}_t$ and $\mathcal{V}_t$; proof in the appendix.) We have the following results for all $t \geq 0$:

Figures (10)

  • Figure 1: The convergence curve for sparse phase retrieval with $\dot{\lambda}=0.01$.
  • Figure 2: The convergence curve for linear eigenvalue problems with $\dot{r}=20$.
  • Figure 3: The convergence curve for sparse phase retrieval with $\dot{\lambda}=0.01$.
  • Figure 4: The convergence curve for sparse phase retrieval with $\dot{\lambda}=0.001$.
  • Figure 5: The convergence curve for sparse phase retrieval with $\dot{\lambda}=0.01$.
  • ...and 5 more figures

Theorems & Definitions (64)

  • Remark 2.2
  • Remark 3.3
  • Lemma 3.4
  • Theorem 3.5
  • Remark 3.6
  • Lemma 3.7
  • Theorem 3.8
  • Remark 3.9
  • Lemma 4.2
  • Remark 4.3
  • ...and 54 more