Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles

Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran

TL;DR

This work analyzes the minimax oracle complexity of stochastic first-order methods for constrained optimization where the objective F is (α,τ,𝒳)-projected-gradient-dominated. It derives a tight Ω(ε^{−2/α}) lower bound for non-convex, L-smooth functions and shows a matching O(ε^{−2/α}) upper bound via a projected variance-reduced method (Proj-STORM), while Proj-SGD attains the slower rate O(ε^{−(4/α)+1}). In the convex case with local gradient-dominance, a lower bound Ω( G^{2} τ^{2/α} log( (2αR) / ((α−1) ε^{(α−1)/α} τ^{1/α}) ) / ε^{2/α} ) is established, matched by accelerated stochastic subgradient methods up to logarithmic factors. The results tightly characterize the dependence on α, τ, and the stochastic-noise level, and show that projection combined with variance reduction achieves optimal convergence within this gradient-dominance framework. These results can guide the design of efficient algorithms for constrained non-convex and convex stochastic optimization problems with projected-gradient-dominance structure.
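The rates quoted above can be compared directly. The following is a minimal Python sketch, assuming every hidden constant equals 1 (the bounds themselves only hold up to constants); the helper names `nonconvex_exponents` and `convex_lower_bound` are hypothetical. Because (4/α − 1) − 2/α = 2/α − 1 > 0 for α ∈ [1,2), the Proj-SGD exponent is always strictly larger than the Proj-STORM exponent, and the gap shrinks only as α approaches 2.

```python
import math

def nonconvex_exponents(alpha):
    # Exponents p in eps^{-p} as quoted in the TL;DR (constants dropped).
    if not 1.0 <= alpha < 2.0:
        raise ValueError("alpha must lie in [1, 2)")
    storm = 2.0 / alpha        # Proj-STORM upper bound, equal to the lower bound
    sgd = 4.0 / alpha - 1.0    # Proj-SGD upper bound
    return storm, sgd

def convex_lower_bound(G, tau, alpha, R, eps):
    # Convex-case lower bound from the TL;DR with the hidden constant set to 1 (assumption).
    # The log argument contains (alpha - 1), so alpha must be strictly greater than 1;
    # the expression is only meaningful when the log argument exceeds 1 (small eps).
    if not 1.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (1, 2)")
    log_arg = (2.0 * alpha * R) / (
        (alpha - 1.0) * eps ** ((alpha - 1.0) / alpha) * tau ** (1.0 / alpha)
    )
    return G ** 2 * tau ** (2.0 / alpha) * math.log(log_arg) / eps ** (2.0 / alpha)

for alpha in (1.0, 1.25, 1.5, 1.75):
    storm, sgd = nonconvex_exponents(alpha)
    print(f"alpha={alpha}: Proj-STORM eps^-{storm:.3f}, Proj-SGD eps^-{sgd:.3f}, gap {sgd - storm:.3f}")
```

At α = 1 the two exponents are 2 and 3, so under these assumptions Proj-SGD needs roughly a factor 1/ε more oracle calls than Proj-STORM.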

Abstract

This work investigates the performance limits of projected stochastic first-order methods for minimizing functions under the $(\alpha,\tau,\mathcal{X})$-projected-gradient-dominance property, which asserts that the sub-optimality gap $F(\mathbf{x})-\min_{\mathbf{x}'\in \mathcal{X}}F(\mathbf{x}')$ is upper-bounded by $\tau\cdot\|\mathcal{G}_{\eta,\mathcal{X}}(\mathbf{x})\|^{\alpha}$ for some $\alpha\in[1,2)$ and $\tau>0$, where $\mathcal{G}_{\eta,\mathcal{X}}(\mathbf{x})$ is the projected-gradient mapping with parameter $\eta>0$. For non-convex functions, we show that the complexity lower bound of querying a batch smooth first-order stochastic oracle to obtain an $\varepsilon$-global-optimum point is $\Omega(\varepsilon^{-2/\alpha})$. Furthermore, we show that a projected variance-reduced first-order algorithm attains the upper complexity bound of $\mathcal{O}(\varepsilon^{-2/\alpha})$, matching the lower bound. For convex functions, we establish a complexity lower bound of $\Omega(\log(1/\varepsilon)\cdot\varepsilon^{-2/\alpha})$ for minimizing functions under a local version of the gradient-dominance property, which also matches the upper complexity bound of accelerated stochastic subgradient methods.
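To make the definition concrete, here is a minimal NumPy sketch under stated assumptions. 𝒳 is taken to be the Euclidean unit ball, and F(x) = ½‖x − c‖² with c = (2, 1) is a hypothetical toy objective. The mapping uses the standard form 𝒢_{η,𝒳}(x) = (x − Π_𝒳(x − η∇F(x)))/η, which the abstract does not spell out. All function names (`project_ball`, `gradient_mapping`, `empirical_tau`, `proj_sgd`, `proj_storm`) and the noise level `SIGMA` are illustrative.

`empirical_tau` only estimates the smallest valid τ from random samples; it does not certify the property. For this instance with η = 1, one can check by hand that 𝒢 = x − x* and that the gap is at most ‖c‖·‖𝒢‖, so α = 1 holds with τ = ‖c‖.

The two solvers are a plain projected SGD loop and a projected loop driven by the STORM recursive-momentum estimator. Both use fixed, hand-picked step sizes and batch size one. They illustrate the shape of the updates only; they are not the paper's algorithms or parameter schedules.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the example set X = {x : ||x|| <= radius}
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def gradient_mapping(x, grad, eta):
    # Standard projected-gradient mapping: G(x) = (x - Proj_X(x - eta * grad F(x))) / eta
    return (x - project_ball(x - eta * grad)) / eta

# Hypothetical toy instance: F(x) = 0.5 * ||x - c||^2 over the unit ball with c outside it,
# so the constraint is active and the constrained minimizer is x* = c / ||c||.
c = np.array([2.0, 1.0])
x_star = c / np.linalg.norm(c)
F = lambda x: 0.5 * np.sum((x - c) ** 2)
grad_F = lambda x: x - c
F_star = F(x_star)

def empirical_tau(alpha, eta=1.0, n_samples=2000, seed=0):
    # Largest observed ratio (F(x) - F*) / ||G(x)||^alpha over random feasible points.
    # A sampled estimate only, not a proof that the property holds.
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(n_samples):
        x = project_ball(rng.uniform(-1.0, 1.0, size=2))
        g_norm = np.linalg.norm(gradient_mapping(x, grad_F(x), eta))
        if g_norm > 1e-12:
            ratios.append((F(x) - F_star) / g_norm ** alpha)
    return max(ratios)

SIGMA = 0.5  # assumed noise level of the stochastic oracle

def stoch_grad(x, xi):
    # Unbiased stochastic gradient: exact gradient plus additive Gaussian noise SIGMA * xi
    return grad_F(x) + SIGMA * xi

def proj_sgd(x0, eta, n_iters, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iters):
        x = project_ball(x - eta * stoch_grad(x, rng.standard_normal(2)))
    return x

def proj_storm(x0, eta, a, n_iters, seed=0):
    rng = np.random.default_rng(seed)
    x_prev = x0.copy()
    d = stoch_grad(x_prev, rng.standard_normal(2))  # initial gradient estimate
    x = project_ball(x_prev - eta * d)
    for _ in range(n_iters):
        xi = rng.standard_normal(2)  # one sample, evaluated at both x_t and x_{t-1}
        # STORM estimator: d_t = g(x_t; xi) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi))
        d = stoch_grad(x, xi) + (1 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, project_ball(x - eta * d)
    return x
```

For example, `proj_storm(np.zeros(2), eta=0.1, a=0.1, n_iters=2000)` should end on the boundary of the ball close to `x_star`. Because the toy noise is additive, the correction term in the STORM estimator cancels the fresh noise except for a factor `a`.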

Paper Structure (30 sections, 20 theorems, 162 equations, 1 table, 3 algorithms)

Key Result

Lemma 1

Consider a closed set $\mathcal{X}\subseteq \mathbb{R}^{d}$ and an $L$-smooth function $F:\mathbb{R}^{d}\to \mathbb{R}$. Let $\mathcal{M}_{F}$ be the set of global minimizers of $F$ that lie in $\mathcal{X}$, and assume that $\mathcal{M}_{F}$ is nonempty. Assume that for every $\mathbf{x}\in\mathcal{X}$ […] where $R_{0}(\alpha)=\frac{\alpha}{\alpha-1}\cdot (2L)^{\frac{\alpha-1}{2-\alpha}}\tau^{\frac{1}{2-\ldots}}$ […]

Theorems & Definitions (42)

  • Remark 1
  • Lemma 1
  • Remark 2
  • Theorem 1
  • Remark 3
  • Proof of Theorem 1
  • Lemma 2
  • Lemma 3
  • Remark 4
  • Theorem 2
  • ...and 32 more