Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles

Saeed Masiha; Saber Salehkaleybar; Niao He; Negar Kiyavash; Patrick Thiran

TL;DR

This work analyzes the minimax oracle complexity of stochastic first-order methods for constrained optimization where the objective F is (α,τ,𝒳)-projected-gradient-dominated. It derives a tight Ω(ε^{−2/α}) lower bound for non-convex, L-smooth functions and shows a matching O(ε^{−2/α}) upper bound via a projected variance-reduced method (Proj-STORM), with Proj-SGD attaining a slower rate of O(ε^{−(4/α)+1}). In the convex case with local gradient-dominance, a lower bound of Ω( G^{2} τ^{2/α} log( (2αR) / ((α−1) ε^{(α−1)/α} τ^{1/α}) ) / ε^{2/α} ) is established, matched by accelerated stochastic subgradient methods up to logarithmic factors. The results tightly characterize the dependence on α, τ, and the stochastic-noise level, and show that projection combined with variance reduction attains the optimal rate within this gradient-dominance framework. In practice, the results can guide the design of efficient algorithms for constrained non-convex and convex stochastic optimization problems with projected-gradient-dominance structure.
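Per the rates above, Proj-STORM needs O(ε^{−2/α}) oracle calls and Proj-SGD O(ε^{−(4/α)+1}). Since (4/α − 1) − 2/α = (2 − α)/α > 0 for α ∈ [1, 2), the Proj-SGD bound is the weaker one (for example ε^{−3} versus ε^{−2} at α = 1). The sketch below contrasts the two update rules on a hypothetical box-constrained quadratic with additive Gaussian gradient noise. It assumes the standard STORM recursive-momentum estimator d ← g(x_new; s) + (1 − a)(d − g(x_old; s)) followed by a Euclidean projection. The toy objective, the noise level `SIGMA`, the constant step size `eta`, and the momentum `a` are all illustrative assumptions, not the paper's settings, batch sizes, or schedules.

```python
import random

# Hypothetical toy problem (not from the paper): minimize F(x) = 0.5*||x - C||^2
# over the box X = [0, 1]^2. C lies outside X, so x* = Proj_X(C) = (1.0, 0.5).
C = (2.0, 0.5)
LO, HI = 0.0, 1.0
SIGMA = 0.5  # hypothetical standard deviation of the oracle noise


def project_box(x):
    """Euclidean projection onto [LO, HI]^d: clip each coordinate."""
    return [min(max(v, LO), HI) for v in x]


def stoch_grad(x, noise):
    """Stochastic gradient of F at x: exact gradient x - C plus additive noise.
    Reusing the same noise at two points mimics evaluating one sample at both."""
    return [v - c + n for v, c, n in zip(x, C, noise)]


def sample_noise(rng):
    return [rng.gauss(0.0, SIGMA) for _ in C]


def proj_sgd(x0, eta, steps, rng):
    """Projected SGD: x <- Proj_X(x - eta * g) with a fresh sample each step."""
    x = list(x0)
    for _ in range(steps):
        g = stoch_grad(x, sample_noise(rng))
        x = project_box([v - eta * gv for v, gv in zip(x, g)])
    return x


def proj_storm(x0, eta, a, steps, rng):
    """Projected step driven by the STORM recursive-momentum estimator
    d <- g(x_new; s) + (1 - a) * (d - g(x_old; s)) with one shared sample s."""
    x = list(x0)
    d = stoch_grad(x, sample_noise(rng))  # initial estimate from one sample
    for _ in range(steps):
        x_new = project_box([v - eta * dv for v, dv in zip(x, d)])
        noise = sample_noise(rng)
        g_new, g_old = stoch_grad(x_new, noise), stoch_grad(x, noise)
        d = [gn + (1 - a) * (dv - go) for gn, dv, go in zip(g_new, d, g_old)]
        x = x_new
    return x


X_STAR = project_box(C)


def mean_sq_error(run, n_seeds=50):
    """Average ||x_T - x*||^2 over independent seeded runs."""
    total = 0.0
    for seed in range(n_seeds):
        x = run(random.Random(seed))
        total += sum((v - s) ** 2 for v, s in zip(x, X_STAR))
    return total / n_seeds


sgd_err = mean_sq_error(lambda rng: proj_sgd([0.0, 0.0], 0.1, 1000, rng))
storm_err = mean_sq_error(lambda rng: proj_storm([0.0, 0.0], 0.1, 0.01, 1000, rng))
print(f"Proj-SGD: {sgd_err:.5f}   Proj-STORM: {storm_err:.5f}")
```

Because the gradient of this toy F is linear, the estimator error obeys e_{t+1} = (1 − a)e_t + a·noise, so its stationary per-coordinate variance is aσ²/(2 − a), far below the raw per-sample variance σ² that Proj-SGD uses. That is why the Proj-STORM error is expected to be the smaller of the two here. This toy run illustrates the variance-reduction mechanism only; it does not reproduce the paper's complexity bounds.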

Abstract

This work investigates the performance limits of projected stochastic first-order methods for minimizing functions under the $(α,τ,\mathcal{X})$-projected-gradient-dominance property, which asserts that the sub-optimality gap $F(\mathbf{x})-\min_{\mathbf{x}'\in \mathcal{X}}F(\mathbf{x}')$ is upper-bounded by $τ\cdot\|\mathcal{G}_{η,\mathcal{X}}(\mathbf{x})\|^α$ for some $α\in[1,2)$ and $τ>0$, where $\mathcal{G}_{η,\mathcal{X}}(\mathbf{x})$ is the projected-gradient mapping with parameter $η>0$. For non-convex functions, we show that the complexity lower bound for querying a batch smooth first-order stochastic oracle to obtain an $ε$-global-optimum point is $Ω(ε^{-2/α})$. Furthermore, we show that a projected variance-reduced first-order algorithm attains the upper complexity bound of $\mathcal{O}(ε^{-2/α})$, matching the lower bound. For convex functions, we establish a complexity lower bound of $Ω(\log(1/ε)\cdotε^{-2/α})$ for minimizing functions under a local version of the gradient-dominance property, which also matches the upper complexity bound of accelerated stochastic subgradient methods.
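To make the dominance inequality concrete, the sketch below assumes the standard form of the projected-gradient mapping, $\mathcal{G}_{η,\mathcal{X}}(\mathbf{x}) = (\mathbf{x} - Π_{\mathcal{X}}(\mathbf{x} - η\nabla F(\mathbf{x})))/η$, where $Π_{\mathcal{X}}$ is the Euclidean projection onto $\mathcal{X}$. The box $\mathcal{X}=[0,1]^2$, the quadratic F, and η = 1 are hypothetical illustrative choices. The helper `dominance_ratio` (a hypothetical name) returns $(F(\mathbf{x})-F^*)/\|\mathcal{G}_{η,\mathcal{X}}(\mathbf{x})\|^α$, which is the smallest τ for which the inequality holds at that particular point.

```python
import math

# Hypothetical toy problem (not from the paper): X = [0, 1]^2 and
# F(x) = 0.5 * ||x - C||^2 with C outside X, so the constrained
# minimizer x* = Proj_X(C) = (1.0, 0.5) lies on the boundary of X.
C = (2.0, 0.5)
LO, HI = 0.0, 1.0


def F(x):
    return 0.5 * sum((v - c) ** 2 for v, c in zip(x, C))


def grad_F(x):
    return [v - c for v, c in zip(x, C)]


def project_box(x):
    """Euclidean projection onto [LO, HI]^d: clip each coordinate."""
    return [min(max(v, LO), HI) for v in x]


def gradient_mapping(x, eta):
    """Assumed standard form G(x) = (x - Proj_X(x - eta * grad F(x))) / eta."""
    y = project_box([v - eta * g for v, g in zip(x, grad_F(x))])
    return [(v - w) / eta for v, w in zip(x, y)]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


# For this strongly convex quadratic, the minimizer over X is Proj_X(C).
F_STAR = F(project_box(C))


def dominance_ratio(x, alpha, eta=1.0):
    """(F(x) - F*) / ||G(x)||^alpha: the smallest tau valid at this point x."""
    return (F(x) - F_STAR) / norm(gradient_mapping(x, eta)) ** alpha


print(gradient_mapping([0.5, 0.5], eta=1.0))   # [-0.5, 0.0]
print(dominance_ratio([0.5, 0.5], alpha=1.0))  # 1.25
for u in (1e-1, 1e-2, 1e-4):
    x = [1.0 - u, 0.5]  # approach x* along the first coordinate
    print(u, dominance_ratio(x, 1.0), dominance_ratio(x, 1.5))
```

Along the path $\mathbf{x} = (1-u, 0.5)$ toward the boundary minimizer, the gap is $u + u^2/2$ and $\|\mathcal{G}_{1,\mathcal{X}}(\mathbf{x})\| = u$, so the ratio is $1 + u/2$ for α = 1 but $u^{-1/2}(1 + u/2)$ for α = 1.5. On this particular toy problem the inequality therefore holds near $\mathbf{x}^*$ with a finite τ only for α = 1.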

Paper Structure (30 sections, 20 theorems, 162 equations, 1 table, 3 algorithms)


Introduction
Contributions
Related work
Notations
Projected-gradient-dominated functions
Lower bound for stochastic non-convex first-order optimization
Problem setting
Complexity lower bound
Upper bound for stochastic non-convex first-order optimization
Proj-SGD
Proj-STORM
Lower bound for stochastic convex first-order optimization
Setup
Complexity lower bound
Conclusion
...and 15 more sections

Key Result

Lemma 1

Consider a closed set $\mathcal{X}\subseteq \mathbb{R}^{d}$ and an $L$-smooth function $F:\mathbb{R}^{d}\to \mathbb{R}$. Let $\mathcal{M}_{F}$ be the set of global minimizers of $F$ that lie in $\mathcal{X}$, and assume that $\mathcal{M}_{F}$ is nonempty. Assume that for every ${\bf x}\in\mathca … where $R_{0}(\alpha)=\frac{\alpha}{\alpha-1}\cdot (2L)^{\frac{\alpha-1}{2-\alpha}}\tau^{\frac{1}{2-…

Theorems & Definitions (42)

Remark 1
Lemma 1
Remark 2
Theorem 1
Remark 3
Proof of Theorem 1
Lemma 2
Lemma 3
Remark 4
Theorem 2
...and 32 more
