Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles
Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
TL;DR
This work analyzes the minimax oracle complexity of stochastic first-order methods for constrained optimization where the objective F is (α,τ,𝒳)-projected-gradient-dominated. For non-convex, L-smooth functions it derives a tight Ω(ε^{−2/α}) lower bound and shows a matching O(ε^{−2/α}) upper bound via a projected variance-reduced method (Proj-STORM). Proj-SGD attains the weaker rate O(ε^{−(4/α)+1}), since 4/α − 1 > 2/α whenever α < 2. In the convex case with local gradient-dominance, a lower bound Ω( G^{2} τ^{2/α} log( (2αR) / ((α−1) ε^{(α−1)/α} τ^{1/α}) ) / ε^{2/α} ) is established and matched by accelerated stochastic subgradient methods up to logarithmic factors. The results tightly characterize the dependence on α, τ, and the stochastic-noise level, and show that projection combined with variance reduction achieves optimal convergence within this gradient-dominance framework. They can guide the design of efficient algorithms for constrained non-convex and convex stochastic optimization with projected-gradient-dominance structure.
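The two methods named above differ mainly in how the search direction is built. The sketch below is a minimal illustration, not the authors' implementation. It assumes a box constraint set X = [0,1]^2, a hypothetical noisy quadratic F(x) = 0.5‖x − c‖² with a made-up noise model, a constant step size η, a constant momentum parameter a, and single-sample (non-batch) oracle calls; the paper's parameter schedules and batch sizes are not reproduced. The structural point is that the STORM-style estimator d_t = g(x_t; ξ_t) + (1 − a)(d_{t−1} − g(x_{t−1}; ξ_t)) evaluates the same sample ξ_t at both the current and previous iterate, while Proj-SGD uses the raw stochastic gradient. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[float, Vector]


def project_box(x: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^d (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def draw_sample(rng: random.Random, dim: int, sigma: float) -> Sample:
    """Hypothetical noise model: random curvature a ~ U(0.5, 1.5) plus Gaussian noise b."""
    return rng.uniform(0.5, 1.5), [rng.gauss(0.0, sigma) for _ in range(dim)]


def stoch_grad(x: Vector, c: Vector, sample: Sample) -> Vector:
    """Unbiased estimate of grad F(x) = x - c for F(x) = 0.5 * ||x - c||^2."""
    a, b = sample
    return [a * (xi - ci) + bi for xi, ci, bi in zip(x, c, b)]


def step(x: Vector, d: Vector, eta: float, lo: float, hi: float) -> Vector:
    """Projected step x_next = Proj_X(x - eta * d)."""
    return project_box([xi - eta * di for xi, di in zip(x, d)], lo, hi)


def proj_sgd(x0: Vector, c: Vector, eta: float, steps: int, sigma: float,
             lo: float, hi: float, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Fresh sample each step; the raw stochastic gradient is the direction.
        x = step(x, stoch_grad(x, c, draw_sample(rng, len(x), sigma)), eta, lo, hi)
    return x


def proj_storm(x0: Vector, c: Vector, eta: float, a: float, steps: int,
               sigma: float, lo: float, hi: float, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    x_prev = list(x0)
    d = stoch_grad(x_prev, c, draw_sample(rng, len(x_prev), sigma))  # plain first estimate
    x = step(x_prev, d, eta, lo, hi)
    for _ in range(steps - 1):
        s = draw_sample(rng, len(x), sigma)  # the SAME sample is used at x_t and x_{t-1}
        g_new, g_old = stoch_grad(x, c, s), stoch_grad(x_prev, c, s)
        # d_t = g(x_t; s) + (1 - a) * (d_{t-1} - g(x_{t-1}; s))
        d = [gn + (1.0 - a) * (di - go) for gn, di, go in zip(g_new, d, g_old)]
        x_prev, x = x, step(x, d, eta, lo, hi)
    return x


# Toy run: c = (2, -1) lies outside X = [0, 1]^2, so the constrained minimizer is x* = (1, 0).
c, x_star = [2.0, -1.0], [1.0, 0.0]
for name, x in [("Proj-SGD", proj_sgd([0.5, 0.5], c, 0.05, 2000, 0.5, 0.0, 1.0)),
                ("Proj-STORM", proj_storm([0.5, 0.5], c, 0.05, 0.1, 2000, 0.5, 0.0, 1.0))]:
    print(name, "distance to x*:", math.dist(x, x_star))
```

The toy objective is convex and is chosen only to exercise the update rules; a single run like this says nothing about the ε-dependence of the rates stated above.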
Abstract
This work investigates the performance limits of projected stochastic first-order methods for minimizing functions under the $(α,τ,\mathcal{X})$-projected-gradient-dominance property, which asserts that the sub-optimality gap $F(\mathbf{x})-\min_{\mathbf{x}'\in \mathcal{X}}F(\mathbf{x}')$ is upper-bounded by $τ\cdot\|\mathcal{G}_{η,\mathcal{X}}(\mathbf{x})\|^α$ for some $α\in[1,2)$ and $τ>0$, where $\mathcal{G}_{η,\mathcal{X}}(\mathbf{x})$ is the projected-gradient mapping with parameter $η>0$. For non-convex functions, we show that the lower bound on the complexity of querying a batch smooth first-order stochastic oracle to obtain an $ε$-global-optimum point is $Ω(ε^{-2/α})$. Furthermore, we show that a projected variance-reduced first-order algorithm achieves an upper complexity bound of $\mathcal{O}(ε^{-2/α})$, matching the lower bound. For convex functions, we establish a complexity lower bound of $Ω(\log(1/ε)\cdotε^{-2/α})$ for minimizing functions under a local version of the gradient-dominance property, which also matches the upper complexity bound of accelerated stochastic subgradient methods.
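The abstract states the property through the projected-gradient mapping without writing it out. A minimal sketch, assuming the standard definition $\mathcal{G}_{η,\mathcal{X}}(\mathbf{x}) = (\mathbf{x} - \mathrm{Proj}_{\mathcal{X}}(\mathbf{x} - η\nabla F(\mathbf{x})))/η$, a box constraint set $\mathcal{X}=[0,1]^2$, and a hypothetical convex quadratic chosen only to exercise the definition: `gradient_mapping` computes $\mathcal{G}_{η,\mathcal{X}}$ and `dominance_holds` tests the inequality $F(\mathbf{x}) - F^\star \le τ\|\mathcal{G}_{η,\mathcal{X}}(\mathbf{x})\|^α$ at one point. The function names and the toy objective are illustrative assumptions, and a single-point check does not certify the property, which must hold on all of $\mathcal{X}$.

```python
import math
from typing import Callable, List

Vector = List[float]


def project_box(x: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^d (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def gradient_mapping(x: Vector, grad: Callable[[Vector], Vector],
                     eta: float, lo: float, hi: float) -> Vector:
    """Assumed definition: G(x) = (x - Proj_X(x - eta * grad F(x))) / eta."""
    g = grad(x)
    y = project_box([xi - eta * gi for xi, gi in zip(x, g)], lo, hi)
    return [(xi - yi) / eta for xi, yi in zip(x, y)]


def norm(v: Vector) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def dominance_holds(F: Callable[[Vector], float], grad: Callable[[Vector], Vector],
                    x: Vector, F_star: float, alpha: float, tau: float,
                    eta: float, lo: float, hi: float) -> bool:
    """Check F(x) - F* <= tau * ||G(x)||^alpha at the single point x."""
    gap = F(x) - F_star
    return gap <= tau * norm(gradient_mapping(x, grad, eta, lo, hi)) ** alpha


# Hypothetical toy problem: F(x) = 0.5 * ||x - c||^2 on X = [0, 1]^2 with c outside X.
c = [2.0, 2.0]


def F(x: Vector) -> float:
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def grad(x: Vector) -> Vector:
    return [xi - ci for xi, ci in zip(x, c)]


F_star = F([1.0, 1.0])  # the constrained minimizer is the corner (1, 1), so F* = 1.0

print(gradient_mapping([0.0, 0.0], grad, 0.5, 0.0, 1.0))  # [-2.0, -2.0]
print(dominance_holds(F, grad, [0.0, 0.0], F_star, 1.0, 1.0, 0.5, 0.0, 1.0))  # False
print(dominance_holds(F, grad, [0.0, 0.0], F_star, 1.0, 2.0, 0.5, 0.0, 1.0))  # True
```

At the corner optimum (1, 1) the mapping vanishes. At (0, 0) the gap is 3 and ‖𝒢‖ = 2√2 ≈ 2.83, so with α = 1 the constant τ = 1 fails at that point while τ = 2 suffices there.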
