Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Daniil Vankov; Anton Rodomanov; Angelia Nedich; Lalitha Sankar; Sebastian U. Stich

TL;DR

This work analyzes gradient methods for $(L_0,L_1)$-smooth functions, a class that generalizes Lipschitz-smooth functions and is relevant to machine learning. Using a tighter first-order characterization of the class and stepsizes derived from it, the authors obtain sharper complexity bounds for the standard gradient method (GM), the normalized GM, and GM with Polyak stepsizes. In the convex setting, the guarantee improves to $\mathcal{O}\left(\frac{L_0 R^2}{\epsilon} + L_1 R \ln \frac{F_0}{\epsilon}\right)$, and the adaptive variants do not require explicit knowledge of $(L_0,L_1)$. The authors also show that an accelerated method, AGMsDR, achieves the faster rate $\nu\mathcal{O}\left(\sqrt{\frac{L_0 R^2}{\epsilon}} + \lceil(L_1 R)^{2/3}\rceil \lceil\ln \frac{F_0}{\epsilon}\rceil\right)$. The analysis covers both nonconvex and convex problems, improves worst-case bounds for several method variants, and is supported by numerical experiments.

Abstract

We study gradient methods for optimizing $(L_0, L_1)$-smooth functions, a class that generalizes Lipschitz-smooth functions and has gained attention for its relevance in machine learning. We provide new insights into the structure of this function class and develop a principled framework for analyzing optimization methods in this setting. While our convergence rate estimates recover existing results for minimizing the gradient norm in nonconvex problems, our approach significantly improves the best-known complexity bounds for convex objectives. Moreover, we show that the gradient method with Polyak stepsizes and the normalized gradient method achieve nearly the same complexity guarantees as methods that rely on explicit knowledge of $(L_0, L_1)$. Finally, we demonstrate that a carefully designed accelerated gradient method can be applied to $(L_0, L_1)$-smooth functions, further improving all previous results.
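
As an illustration of the two parameter-free methods named in the abstract, the following is a minimal sketch, not the paper's algorithms or experiments. It uses the textbook forms of the Polyak stepsize, $h_k = (f(x_k) - f^*) / \|\nabla f(x_k)\|^2$, and of the normalized gradient step, $x_{k+1} = x_k - h \nabla f(x_k) / \|\nabla f(x_k)\|$. The test function $f(x) = \cosh(x)$, the function names, and all numerical values are assumptions chosen for illustration. This $f$ is not globally Lipschitz-smooth, but it satisfies $|f''(x)| \le 1 + |f'(x)|$, which is the common second-order form of $(L_0, L_1)$-smoothness with $L_0 = L_1 = 1$.

```python
import math


def f(x):
    # Illustrative objective (an assumption, not from the paper): cosh(x).
    # Since cosh(x)^2 = 1 + sinh(x)^2 <= (1 + |sinh(x)|)^2, we have
    # |f''(x)| <= 1 + |f'(x)|, i.e. (L0, L1) = (1, 1).
    return math.cosh(x)


def grad(x):
    # Derivative of cosh(x).
    return math.sinh(x)


def polyak_gm(x0, f_star, iters):
    # Gradient method with the Polyak stepsize h = (f(x) - f*) / |f'(x)|^2.
    # Requires the optimal value f_star but no knowledge of (L0, L1).
    x = x0
    for _ in range(iters):
        g = grad(x)
        if g == 0.0:
            break  # stationary point reached
        x -= (f(x) - f_star) / (g * g) * g
    return x


def normalized_gm(x0, step, iters):
    # Normalized gradient method with a constant step length `step`:
    # x <- x - step * f'(x) / |f'(x)|.
    x = x0
    for _ in range(iters):
        g = grad(x)
        if g == 0.0:
            break  # stationary point reached
        x -= step * g / abs(g)
    return x


x_polyak = polyak_gm(10.0, 1.0, 60)
x_normalized = normalized_gm(10.0, 0.1, 200)
```

With a constant step length, the normalized method in this sketch only reaches a neighborhood of the minimizer whose radius is about the step length, whereas the Polyak variant keeps contracting toward $x^* = 0$ until floating-point resolution is reached.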
