Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes

Guanghui Lan, Tianjiao Li, Yangyang Xu

TL;DR

This work develops a unified projected gradient framework for smooth, possibly nonconvex optimization over a convex compact set, introducing a parameter-free auto-conditioned PG (AC-PG) that obviates the need for knowledge of curvature constants or line searches. It establishes new iteration complexity bounds that simultaneously cover convex and nonconvex settings via the lower curvature parameter $l$ and extends the theory to stochastic settings with SPG, AC-SPG, VR-SPG, and AC-VR-SPG, achieving improved or near-optimal rates, including an $\tilde{O}(1/\epsilon^2)$-type dependence in convex cases and $\tilde{O}(1/\epsilon^3)$-type bounds in certain stochastic nonconvex scenarios. The paper also provides high-probability guarantees through a two-phase approach and demonstrates practical benefits through numerical experiments on box-constrained QP and semisupervised SVM problems, validating the advantages of auto-conditioning and variance reduction in large-scale, constrained optimization. Overall, the methods offer robust, adaptive tools for efficiently finding ε-stationary points without expensive line searches, with strong theoretical guarantees and practical relevance for machine learning and simulation tasks.

Abstract

We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the "vanilla" PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieves the same iteration complexity without requiring the input of the Lipschitz constant of the gradient or any line search procedure. The key idea is to estimate the Lipschitz constant using first-order information gathered from the previous iterations, and to show that the error caused by underestimating the Lipschitz constant can be properly controlled. We then generalize the PG methods to the stochastic setting, by proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method, achieving new complexity bounds in different oracle settings. We also present auto-conditioned stepsize policies for both stochastic PG methods and establish comparable convergence guarantees.
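The auto-conditioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's AC-PG rule, which is not reproduced in this summary: the update `L_est = max(L_est, local_L)`, the function names, and the small box-constrained quadratic test problem are all assumptions chosen for illustration. The sketch only shows how a Lipschitz estimate can be built from gradients at consecutive iterates, with no line search and no prior knowledge of the true constant.

```python
import math

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]

def grad_quadratic(Q, c, x):
    """Gradient of f(x) = 0.5 * x^T Q x + c^T x for a symmetric matrix Q."""
    n = len(x)
    return [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def ac_pg_sketch(Q, c, x0, lo, hi, L0=1e-3, iters=200):
    """Projected gradient with a Lipschitz estimate learned from past iterates.

    Hypothetical rule (an assumption, not the paper's): keep the largest
    observed ratio ||grad f(x_new) - grad f(x)|| / ||x_new - x||.
    """
    x = project_box(x0, lo, hi)
    g = grad_quadratic(Q, c, x)
    L_est = L0  # deliberately poor initial guess
    history = []  # norm of the gradient mapping L_est * (x - x_new)
    for _ in range(iters):
        trial = [xi - gi / L_est for xi, gi in zip(x, g)]
        x_new = project_box(trial, lo, hi)
        g_new = grad_quadratic(Q, c, x_new)
        step = norm([a - b for a, b in zip(x_new, x)])
        history.append(L_est * step)
        if step > 0.0:
            local_L = norm([a - b for a, b in zip(g_new, g)]) / step
            L_est = max(L_est, local_L)
        x, g = x_new, g_new
    return x, L_est, history

# Nonconvex (indefinite) quadratic over the box [-1, 1]^2.
Q = [[2.0, 0.0], [0.0, -1.0]]
c = [-1.0, 0.5]
x_final, L_final, hist = ac_pg_sketch(Q, c, [0.9, 0.2], -1.0, 1.0)
```

For a quadratic objective, each local ratio is bounded by the spectral norm of `Q`, so the estimate in this sketch never exceeds the true Lipschitz constant. Early steps taken with an underestimated constant are the situation the abstract says the paper's analysis is designed to control.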

Paper Structure

This paper contains 17 sections, 13 theorems, 178 equations, 5 figures, 7 algorithms.

Key Result

theorem 1

Let $\{x_t\}$ be generated by Algorithm alg_1_0 with $\gamma_t = \gamma \geq L$ for all $t$. Then we have … Specifically, if $\gamma_t = L$ for all $t$, then …
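For orientation, the projected gradient step with stepsize parameter $\gamma_t$ and the stationarity measure commonly used in this setting can be written as below. The notation ($X$ for the feasible set, $g_{X,t}$ for the projected gradient mapping) is an assumption based on standard usage and on the Figure 1 caption, not a reproduction of the paper's displayed equations.

```latex
x_{t+1}
  = \arg\min_{x \in X} \left\{ \langle \nabla f(x_t), x \rangle
      + \frac{\gamma_t}{2} \| x - x_t \|^2 \right\}
  = \mathrm{Proj}_X\!\left( x_t - \tfrac{1}{\gamma_t} \nabla f(x_t) \right),
\qquad
g_{X,t} = \gamma_t \, ( x_t - x_{t+1} ).
```

Under this convention, a point is approximately stationary when $\| g_{X,t} \|$ is small, and choosing $\gamma_t = L$ corresponds to the classical stepsize $1/L$.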

Figures (5)

  • Figure 1: Average and standard deviation results by the PG and AC-PG methods on solving 10 independently generated instances of box-constrained quadratic programming. Left: norm of projected gradient mapping at each iteration; Right: estimated local gradient Lipschitz constants by AC-PG starting from different initial estimates.
  • Figure 2: Average and standard deviation results by SPG, VR-SPG, AC-SPG, and AC-VR-SPG on solving 10 independently generated instances of the semisupervised SVM problem (\ref{eq:svm}) of dimension $n=10$ with pre-generated data sets for $u_1$ and $u_2$.
  • Figure 3: Average and standard deviation results by SPG, VR-SPG, AC-SPG, and AC-VR-SPG on solving 10 independently generated instances of the semisupervised SVM problem (\ref{eq:svm}) of dimension $n=100$ with pre-generated data sets for $u_1$ and $u_2$.
  • Figure 4: Average and standard deviation results by SPG, VR-SPG, AC-SPG, and AC-VR-SPG on solving 10 independently generated instances of the semisupervised SVM problem (\ref{eq:svm}) of dimension $n=10$ in an online manner, i.e., with samples for $u_1$ and $u_2$ generated as needed.
  • Figure 5: Average and standard deviation results by SPG, VR-SPG, AC-SPG, and AC-VR-SPG on solving 10 independently generated instances of the semisupervised SVM problem (\ref{eq:svm}) of dimension $n=100$ in an online manner, i.e., with samples for $u_1$ and $u_2$ generated as needed.

Theorems & Definitions (27)

  • theorem 1
  • proof
  • remark
  • theorem 2
  • proof
  • remark
  • theorem 3
  • proof
  • lemma
  • proof
  • ...and 17 more