Gradient Methods with Online Scaling Part II. Practical Aspects

Ya-Chi Chu; Wenzhi Gao; Yinyu Ye; Madeleine Udell

TL;DR

This work advances practical gradient-based optimization by casting stepsize selection as an online learning problem within the OSGM framework. It introduces OSGM-Best, a robust variant that combines hypergradient feedback with heavy-ball momentum and lookahead to rival quasi-Newton performance while using less memory and cheaper iterations. The paper extends OSGM to smooth nonconvex problems through regularization in stepsize space and establishes theoretical progress guarantees under broad conditions, complemented by extensive numerical experiments on convex (e.g., SVM and logistic regression) and nonconvex benchmarks. The results establish OSGM-Best as a competitive first-order method and point to further enhancements via BB steps, proximal settings, and performance-estimation-guided design.

Abstract

Part I of this work [Gao25] establishes online scaled gradient methods (OSGM), a framework that utilizes online convex optimization to adapt stepsizes in gradient methods. This paper focuses on the practical aspects of OSGM. We leverage the OSGM framework to design new adaptive first-order methods and provide insights into their empirical behavior. The resulting method, OSGM-Best, matches the performance of quasi-Newton variants while requiring less memory and cheaper iterations. We also extend OSGM to nonconvex optimization and outline directions that connect OSGM to existing branches of optimization theory and practice.
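To make the idea of adapting a stepsize by online learning concrete, here is a minimal sketch. It is not OSGM-Best and is not taken from the paper: it assumes a single scalar stepsize `alpha`, a hypergradient-style feedback h(alpha) = (f(x - alpha g) - f(x)) / ||g||^2, plain online gradient descent on `alpha` with a hypothetical learning rate `eta`, and a simple monotone safeguard that accepts a trial point only if it lowers f. The function name, the test problem, and all parameter values are illustrative assumptions.

```python
def online_stepsize_gd(f, grad, x0, alpha0=0.0, eta=0.1, iters=500):
    """Gradient descent whose scalar stepsize is learned online (illustrative)."""
    x = list(x0)
    alpha = alpha0
    for _ in range(iters):
        g = grad(x)
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 < 1e-30:  # gradient is numerically zero: stop
            break
        # Trial point using the current stepsize.
        x_trial = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_trial = grad(x_trial)
        # Derivative of the assumed feedback h(alpha) with respect to alpha:
        #   h'(alpha) = -<grad f(x - alpha g), g> / ||g||^2
        hyper = -sum(a * b for a, b in zip(g_trial, g)) / gnorm2
        # Online gradient descent step on the stepsize.
        alpha -= eta * hyper
        # Monotone safeguard (an assumption of this sketch): keep the trial
        # point only if it decreases the objective.
        if f(x_trial) < f(x):
            x = x_trial
    return x, alpha


# Toy ill-conditioned quadratic f(x) = 0.5 * (x_0^2 + 10 * x_1^2), L = 10.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


# eta = 1/L keeps the stepsize update stable on this example.
x_final, alpha_final = online_stepsize_gd(f, grad, [1.0, 1.0], eta=0.1)
```

On this toy problem the learned stepsize drifts from the conservative value 1/L toward larger values once the stiff coordinate has been eliminated, which is the kind of adaptation the framework is after. The monotone safeguard trades an extra function evaluation per iteration for a guarantee that the objective never increases.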
