Adaptive Gradient Normalization and Independent Sampling for (Stochastic) Generalized-Smooth Optimization

Yufeng Yang; Erin Tripp; Yifan Sun; Shaofeng Zou; Yi Zhou

TL;DR

The paper studies first-order methods for generalized-smooth nonconvex problems, a setting to which existing algorithms are not fully adapted. It analyzes adaptively normalized gradient descent (AN-GD) under generalized-smooth and generalized-PŁ geometry, and introduces IAN-SGD, which combines adaptive gradient normalization, independent sampling, and gradient clipping to achieve $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed noise assumptions. The theoretical results cover descent properties and convergence rates across PŁ-like regimes, while experiments on phase retrieval, distributionally robust optimization, and deep networks demonstrate practical advantages and robustness. The work opens avenues for combining independent sampling and normalization with momentum or variance reduction for improved efficiency.

Abstract

Recent studies have shown that many nonconvex machine learning problems satisfy a generalized-smooth condition that extends beyond traditional smooth nonconvex optimization. However, existing algorithms are not fully adapted to such generalized-smooth nonconvex geometry, and their convergence analyses face significant technical limitations. In this work, we first analyze the convergence of adaptively normalized gradient descent under function geometries characterized by generalized smoothness and a generalized PŁ condition, revealing the advantage of adaptive gradient normalization. Our results provide theoretical insights into adaptive normalization across various scenarios. For stochastic generalized-smooth nonconvex optimization, we propose the Independent-Adaptively Normalized Stochastic Gradient Descent (IAN-SGD) algorithm, which leverages adaptive gradient normalization, independent sampling, and gradient clipping to achieve an $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed noise assumptions. Experiments on large-scale nonconvex generalized-smooth problems demonstrate the fast convergence of our algorithm.
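
The two ideas named in the abstract can be illustrated with a minimal sketch. This section does not state the update rules, so everything below is an assumption rather than the authors' algorithm: the normalized step divides the gradient by its norm raised to a power `beta`, and the stochastic step draws two independent samples, one for the direction and one for the normalization factor, with a lower bound `floor` on that factor playing the role of clipping. The toy objective $f(x) = \sum_i x_i^4 / 4$, whose curvature grows with the gradient norm, and the names `an_gd_step`, `ian_sgd_step`, `noisy_grad`, `beta`, and `floor` are all hypothetical.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def f_quartic(x):
    """Toy objective f(x) = sum(x_i^4) / 4 (an assumption, not from the paper)."""
    return sum(c ** 4 for c in x) / 4.0


def grad_quartic(x):
    """Exact gradient of the toy objective."""
    return [c ** 3 for c in x]


def noisy_grad(x, rng, sigma=0.1):
    """Hypothetical stochastic oracle: exact gradient plus Gaussian noise."""
    return [g + rng.gauss(0.0, sigma) for g in grad_quartic(x)]


def an_gd_step(x, lr=0.1, beta=0.5, eps=1e-12):
    """Assumed normalized step: x - lr * g / ||g||**beta."""
    g = grad_quartic(x)
    scale = lr / (norm(g) ** beta + eps)  # eps only guards against division by zero
    return [c - scale * gc for c, gc in zip(x, g)]


def ian_sgd_step(x, rng, lr=0.05, beta=0.5, floor=1.0):
    """Assumed stochastic step with independent sampling.

    The direction and the normalization factor use two separately drawn
    stochastic gradients, so the factor is independent of the direction.
    The factor is bounded below by `floor`, so small gradients are not
    amplified (a clipping-style safeguard).
    """
    g_dir = noisy_grad(x, rng)   # sample used for the update direction
    g_norm = noisy_grad(x, rng)  # independent sample used only for scaling
    h = max(floor, norm(g_norm)) ** beta
    return [c - lr * gc / h for c, gc in zip(x, g_dir)]


def run(step, x0, n_steps, *args):
    """Apply `step` repeatedly starting from x0 and return the final iterate."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, *args)
    return x
```

As a usage example, `run(an_gd_step, [3.0, -2.0], 200)` runs the deterministic variant and `run(ian_sgd_step, [3.0, -2.0], 500, random.Random(0))` runs the stochastic one. Drawing the scaling sample separately from the direction sample is what makes the normalization factor statistically independent of the step direction in this sketch.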
