Power of Generalized Smoothness in Stochastic Convex Optimization: First- and Zero-Order Algorithms
Aleksandr Lobanov, Alexander Gasnikov
TL;DR
The paper addresses stochastic convex optimization under generalized $(L_0,L_1)$-smoothness, developing first-order methods with clipping (ClipSGD) and normalization (NSGD) and extending the analysis to zero-order algorithms (ZO-ClipSGD, ZO-NSGD). By deriving convergence bounds that contain summands describing a linear rate, it establishes iteration complexities of $N = \mathcal{O}\left(L_1 R \log\frac{1}{\varepsilon} + \frac{L_1 c R^2}{\varepsilon}\right)$ for ClipSGD and $N = \mathcal{O}\left(L_1 R \log\frac{1}{\varepsilon}\right)$ for NSGD in the $L_0 = 0$ regime, with NSGD requiring a large batch size $B$ in general. The generalized smoothness framework is shown to extend to zero-order methods, yielding analogous linear-rate summands at the cost of additional oracle calls and noise tolerance. Logistic-regression experiments support the theory by exhibiting linear convergence in practice. Overall, the work demonstrates that generalized smoothness can enable linear convergence in stochastic convex optimization and opens new directions for biased-gradient and zero-order approaches.
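For orientation, one formulation of $(L_0,L_1)$-smoothness that is common in the literature, stated for twice-differentiable $f$, is sketched below. It is an assumption about the setting rather than a quotation; the paper's own assumption may be phrased differently, for example without second derivatives.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x .
$$

Setting $L_1 = 0$ recovers standard $L_0$-smoothness, while $L_0 = 0$ is the regime in which the complexities above are stated.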
Abstract
This paper is devoted to the study of stochastic optimization problems under the generalized smoothness assumption. Considering an unbiased gradient oracle in Stochastic Gradient Descent, we provide strategies for obtaining convergence bounds that contain summands describing a linear rate. In particular, in the case $L_0 = 0$, we obtain in the convex setup the iteration complexities $N = \mathcal{O}\left(L_1R \log\frac{1}{\varepsilon} + \frac{L_1 c R^2}{\varepsilon}\right)$ for Clipped Stochastic Gradient Descent and $N = \mathcal{O}\left(L_1R \log\frac{1}{\varepsilon}\right)$ for Normalized Stochastic Gradient Descent. Furthermore, we generalize the convergence results to the case of a biased gradient oracle and show that the power of $(L_0,L_1)$-smoothness extends to zero-order algorithms. Finally, we demonstrate through numerical experiments the possibility of linear convergence in the convex setup, a possibility that has attracted interest in the machine learning community.
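The two first-order update rules named in the abstract can be illustrated with a minimal Python sketch on logistic regression. Everything here is an assumption made for illustration: the synthetic data, the helper names (`logistic_loss_grad`, `clip_step`, `normalized_step`, `run`), the step size, the batch size, and the reading of $c$ as a clipping radius in the rule $\min\{1, c/\|g\|\}\, g$. It is not the authors' implementation and does not reproduce their experiments.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Mean logistic loss and its gradient for labels y in {-1, +1}."""
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z))      # log(1 + exp(-z)), computed stably
    sig = 0.5 * (1.0 - np.tanh(0.5 * z))       # sigmoid(-z), computed stably
    grad = -(X * (y * sig)[:, None]).mean(axis=0)
    return loss, grad

def clip_step(w, g, lr, c):
    """Clipped step (assumed form): w - lr * min(1, c / ||g||) * g."""
    norm = np.linalg.norm(g)
    scale = 1.0 if norm <= c else c / norm
    return w - lr * scale * g

def normalized_step(w, g, lr, eps=1e-12):
    """Normalized step: w - lr * g / ||g||; eps only guards against division by zero."""
    return w - lr * g / (np.linalg.norm(g) + eps)

def run(method, X, y, steps=200, batch=32, lr=0.5, c=1.0, seed=0):
    """Run minibatch SGD with the chosen update rule; return the final full-data loss."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        _, g = logistic_loss_grad(w, X[idx], y[idx])
        if method == "clip":
            w = clip_step(w, g, lr, c)
        else:
            w = normalized_step(w, g, lr)
    return logistic_loss_grad(w, X, y)[0]

# Hypothetical synthetic data: labels from a random linear separator.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(500, 5))
w_true = data_rng.normal(size=5)
y = np.where(X @ w_true >= 0, 1.0, -1.0)

print("initial loss   :", logistic_loss_grad(np.zeros(5), X, y)[0])
print("clipped SGD    :", run("clip", X, y))
print("normalized SGD :", run("norm", X, y))
```

The normalized step always has length `lr` regardless of the gradient magnitude, which is one intuition for why a noisy minibatch direction can hurt it and why a larger batch may be needed; the clipped step falls back to plain SGD whenever the gradient norm is at most `c`.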
