Improving Adaptive Online Learning Using Refined Discretization

Zhiyu Zhang; Heng Yang; Ashok Cutkosky; Ioannis Ch. Paschalidis

TL;DR

This work targets unconstrained Online Linear Optimization with Lipschitz losses. It achieves simultaneous gradient adaptivity and comparator adaptivity without the doubling trick and without prior knowledge of the Lipschitz constant. The authors first solve a continuous-time analogue of the problem and then introduce a refined discretization that preserves gradient adaptivity in discrete time. This yields a near-optimal regret bound that scales as $O(\sqrt{V_T})$ in the gradient variance $V_T$ and matches the optimal leading constant up to a small factor. The proposed method combines a 1D base algorithm built on a novel potential function with a two-level meta-structure, enabling scale-free operation and robustness to an unknown $G$. An extension handles unknown Lipschitz constants without hints, avoiding the range-ratio problem while maintaining strong regret guarantees. The results advance the theory and practice of simultaneous adaptivity in online learning, with potential applicability to other model-based and adversarial settings.

Abstract

We study unconstrained Online Linear Optimization with Lipschitz losses. Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves ($i$) AdaGrad-style second-order gradient adaptivity and ($ii$) comparator norm adaptivity, also known as "parameter freeness" in the literature. In particular:

- our algorithm does not employ the impractical doubling trick, and does not require an a priori estimate of the time-uniform Lipschitz constant;
- the associated regret bound has the optimal $O(\sqrt{V_T})$ dependence on the gradient variance $V_T$, without the typical logarithmic multiplicative factor;
- the leading constant in the regret bound is "almost" optimal.

Central to these results is a continuous time approach to online learning. We first show that the desired simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. It refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, and this refinement could be of independent interest.
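To see why the discretization step is the crux, the following Python sketch compares one discrete step of a potential-based bettor with its continuous-time idealization. It is an illustration under stated assumptions, not the paper's algorithm. It assumes a classical solution of the backward heat equation, $\phi(V,S)=\exp(S-V/2)$, in place of the paper's novel potential. Here $S$ is the running sum of the signals $g$ (negated gradients, so betting $x$ gains $x g$) and $V$ is the running sum of their squares. The functions `potential`, `prediction` and `discretization_gap` are hypothetical names. In continuous time, Itô's formula makes the potential increase equal to the bettor's gain. In discrete time the two differ by a term of order $|g|^3$ whose sign follows the sign of $g$, which is the kind of error a discretization argument has to control.

```python
import math


def potential(v: float, s: float) -> float:
    """Illustrative BHE solution phi(V, S) = exp(S - V/2) (not the paper's potential)."""
    return math.exp(s - v / 2.0)


def prediction(v: float, s: float) -> float:
    """Bet x = d phi / dS; for this particular potential it equals phi itself."""
    return potential(v, s)


def discretization_gap(v: float, s: float, g: float) -> float:
    """One discrete step: potential increase minus the bettor's gain x * g.

    (v, s) are the running sums of squared signals and of signals before the
    step, and g is the new signal (a negated gradient). In continuous time the
    analogous gap is zero; here it equals phi(v, s) * (-(g^3)/3 + O(g^4)).
    """
    x = prediction(v, s)
    return potential(v + g * g, s + g) - potential(v, s) - x * g


for g in (0.1, 0.05, 0.025, -0.1):
    print(f"g = {g:+.3f}  gap = {discretization_gap(0.0, 0.0, g):+.3e}")
```

At $(V,S)=(0,0)$ the printed gaps are close to $-g^3/3$, so halving $g$ shrinks the gap by roughly a factor of eight, and a negative $g$ gives a positive gap.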

Paper Structure (27 sections, 15 theorems, 68 equations)

Introduction
Contribution
Quantitative side
Technical side
Notation
Related work
Simultaneous adaptivity
Continuous time approach
Warm up: Adaptivity in continuous time
Setting
Analysis
Main result: Refined discretization
Algorithm
Analysis
Proof sketch of Lemma 4.2
...and 12 more sections

Key Result

Theorem 1

If $\phi\in C^{1,2}(\mathcal{X})$ satisfies the Backward Heat Equation (BHE) $\partial_1\phi+\frac{1}{2}\partial_{22}\phi=0$, then for all $T\in\mathbb{R}_{\geq 0}$ and $u\in\mathbb{R}$, almost surely, …

Here we follow the notation from the Notation subsection: $\phi^*_{\cdot}(\cdot)$ denotes the Fenchel conjugate of $\phi$ with respect to its second argument.
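The role of the BHE hypothesis can be seen from Itô's formula (Lemma 3.1). Assume, as in the standard continuous-time argument, that the first argument of $\phi$ tracks the quadratic variation $[S]_t$ of the continuous semimartingale $S_t$. Then $d\phi([S]_t,S_t)=\partial_2\phi\,dS_t+(\partial_1\phi+\tfrac12\partial_{22}\phi)\,d[S]_t$, so the BHE removes the drift term and the potential changes only through $\partial_2\phi\,dS_t$. The Python sketch below checks the BHE numerically for a candidate $\phi$ using central finite differences. The helper `bhe_residual` and the candidate functions are hypothetical. The candidates are textbook solutions (plus one non-solution) chosen for illustration, not the novel potential introduced in the paper.

```python
import math
from typing import Callable

# A potential maps (v, s) -> phi(v, s); v is the first ("time-like") argument.
Potential = Callable[[float, float], float]


def bhe_residual(phi: Potential, v: float, s: float, h: float = 1e-4) -> float:
    """Finite-difference estimate of d1 phi + 0.5 * d22 phi at (v, s).

    Central differences give O(h^2) truncation error; rounding error in the
    second derivative grows roughly like machine epsilon / h^2.
    """
    d1 = (phi(v + h, s) - phi(v - h, s)) / (2.0 * h)
    d22 = (phi(v, s + h) - 2.0 * phi(v, s) + phi(v, s - h)) / (h * h)
    return d1 + 0.5 * d22


def exp_potential(v: float, s: float) -> float:
    """Illustrative BHE solution: phi = exp(s - v/2)."""
    return math.exp(s - v / 2.0)


def quad_potential(v: float, s: float) -> float:
    """Illustrative BHE solution: phi = s^2 - v."""
    return s * s - v


def not_bhe_potential(v: float, s: float) -> float:
    """Counterexample: phi = s^2 + v gives d1 + 0.5 * d22 = 1 + 1 = 2."""
    return s * s + v


for name, phi in [("exp", exp_potential), ("quad", quad_potential),
                  ("not BHE", not_bhe_potential)]:
    print(name, bhe_residual(phi, v=1.0, s=0.5))
```

A residual near zero (within finite-difference error) is consistent with the BHE at the tested point; it is a sanity check, not a proof.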

Theorems & Definitions (25)

Theorem 1
Lemma 3.1: Itô's formula
Proof of Theorem 1
Lemma 4.1: Well-posedness
Lemma 4.2: Key lemma: one step potential bound
Theorem 2: Main result
Lemma 5.1
Lemma B.1: Convexity
Proof of Lemma B.1
Lemma B.2: The sign of prediction
...and 15 more
