Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer

Yan-Feng Xie; Yu-Jie Zhang; Peng Zhao; Zhi-Hua Zhou

TL;DR

This paper introduces a modular discounted-to-dynamic (D2D) reduction for online learning, which converts discounted-regret guarantees into dynamic-regret guarantees. Under this framework, two curved losses, online linear regression and online logistic regression, admit sharp dynamic-regret bounds; for logistic regression, a two-layer ensemble is used to tune the discount factor. Extending the reduction to the Adam optimizer via the online-to-non-convex (O2NC) framework yields optimal convergence rates in stochastic, non-convex, and non-smooth settings, with flexible choices of $(\beta_1,\beta_2)$ for both clipped and clip-free variants. The results clarify the role of momentum and second-moment dynamics in non-stationary environments and give a unified way to analyze adaptive optimizers through O2NC reductions.
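To make the idea of a discounted objective for online linear regression concrete, here is a minimal sketch of discount-weighted ridge regression. It is an illustration under stated assumptions, not the paper's algorithm: the function name `discounted_ridge_predictions`, the fixed regularizer `lam`, and the exact weighting scheme are all hypothetical choices made for this example.

```python
import numpy as np

def discounted_ridge_predictions(X, y, beta=0.95, lam=1.0):
    """Online predictions from discount-weighted ridge regression (illustrative).

    Before seeing y[t], the learner uses
        w_t = argmin_w  lam * ||w||^2
                        + sum_{s < t} beta**(t - 1 - s) * (w @ X[s] - y[s])**2,
    which is maintained through the running statistics A and b.
    """
    T, d = X.shape
    A = np.zeros((d, d))  # discounted sum of X[s] X[s]^T over past rounds
    b = np.zeros(d)       # discounted sum of y[s] * X[s] over past rounds
    preds = np.empty(T)
    for t in range(T):
        w = np.linalg.solve(A + lam * np.eye(d), b)  # closed-form minimizer
        preds[t] = w @ X[t]
        # Older rounds are down-weighted by beta before adding the new one.
        A = beta * A + np.outer(X[t], X[t])
        b = beta * b + y[t] * X[t]
    return preds
```

With `beta=1.0` this reduces to ordinary online ridge regression, which weights all past rounds equally. A smaller `beta` forgets old data faster, which is the kind of behavior one wants when the best comparator drifts over time.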

Abstract

We study dynamic regret minimization in non-stationary online learning, with a primary focus on follow-the-regularized-leader (FTRL) methods. FTRL is important for curved losses and for understanding adaptive optimizers such as Adam, yet dynamic regret analyses of FTRL remain underexplored. To address this, we build on the discounted-to-dynamic reduction and present a modular way to obtain dynamic regret bounds for FTRL-related problems. Specifically, we focus on two representative curved losses: linear regression and logistic regression. Our method not only simplifies existing proofs of the optimal dynamic regret of online linear regression, but also yields new dynamic regret guarantees for online logistic regression. Beyond online convex optimization, we apply the reduction to analyze the Adam optimizer, obtaining optimal convergence rates in stochastic, non-convex, and non-smooth settings. The reduction also enables a more detailed treatment of Adam with two discount parameters $(\beta_1,\beta_2)$, leading to new results for both clipped and clip-free variants of Adam.
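For readers who want to see where the two discount parameters $(\beta_1,\beta_2)$ enter, the following is a minimal NumPy sketch of one step of the standard, clip-free Adam update with bias correction. It is the textbook form of the update, not the clipped variant or the FTRL reformulation analyzed in the paper; the function name `adam_step` and the default hyperparameters are assumptions made for illustration.

```python
import numpy as np

def adam_step(x, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step (illustrative); t is the 1-based step count."""
    # beta1 discounts past gradients (momentum / first moment).
    m = beta1 * m + (1 - beta1) * grad
    # beta2 discounts past squared gradients (second moment).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```

On the first step from zero-initialized moments, the bias-corrected update is approximately `-lr * sign(grad)` in each coordinate, which shows how the second-moment estimate normalizes the step size.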

Paper Structure (78 sections, 36 theorems, 272 equations, 2 algorithms)

Introduction
Our Result I: Dynamic Regret of Curved Losses.
Our Result II: Adam Optimizers.
Techniques.
Organization.
Modular Discounted-to-Dynamic Reduction
Connection of Dynamic and Discounted Regret
Modular D2D Reduction
Dynamic Regret of Curved Losses
Online Linear Regression
Online Logistic Regression
Convergence Conditions for Adam
Setup
Understanding Adam via FTRL
Convergence Conditions for Clipped Adam
...and 63 more sections

Key Result

Lemma 1

For any $T > 0$ and any comparator sequence $\mathbf{u}_1, \dots, \mathbf{u}_T \in \mathcal{X} \subseteq \mathbb{R}^d$, the following holds: [displayed equation not recovered in this extract], where $\textsc{Reg}_{t;\beta}(\mathbf{u})$ is the discounted regret defined in the paper (equation label eq:def-discounted-regret).
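For orientation, discounted regret and dynamic regret are commonly written as below. This is a sketch under assumptions: the loss functions $f_s$, the learner's iterates $\mathbf{x}_s$, and the notation $\textsc{D-Reg}_T$ are introduced here for illustration, and the paper's own definition may differ in detail (for example, by using linearized losses).

```latex
% Discounted regret at round t with discount factor \beta \in (0, 1]
% (assumed form; f_s and \mathbf{x}_s are illustrative symbols):
\textsc{Reg}_{t;\beta}(\mathbf{u})
  \;=\; \sum_{s=1}^{t} \beta^{\,t-s}\,\bigl(f_s(\mathbf{x}_s) - f_s(\mathbf{u})\bigr),
\qquad
% Dynamic regret against a comparator sequence \mathbf{u}_1, \dots, \mathbf{u}_T:
\textsc{D-Reg}_T(\mathbf{u}_{1:T})
  \;=\; \sum_{t=1}^{T} \bigl(f_t(\mathbf{x}_t) - f_t(\mathbf{u}_t)\bigr).
```

Setting $\beta = 1$ in the first expression gives the usual static regret against a fixed comparator $\mathbf{u}$, while $\beta < 1$ emphasizes recent rounds.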

Theorems & Definitions (59)

Lemma 1: Adapted from Theorem B.3 of Ahn et al. (ICML 2024)
Theorem 1: Modular D2D Reduction
Lemma 2
Theorem 2
Theorem 3
Theorem 4
Definition 1
Theorem 5
Theorem 6
Theorem 7
...and 49 more
