Improving Adaptive Online Learning Using Refined Discretization
Zhiyu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis
TL;DR
This work targets unconstrained Online Linear Optimization (OLO) with Lipschitz losses. It achieves simultaneous gradient and comparator adaptivity without the doubling trick or prior knowledge of the Lipschitz constant. The authors first solve a continuous-time analogue of the problem and then introduce a refined discretization that preserves gradient adaptivity in discrete time. This yields a regret bound with the optimal $O(\sqrt{V_T})$ dependence on the gradient variance $V_T$ and an almost optimal leading constant. The proposed method combines a 1D base algorithm, built on a novel potential function, with a two-level meta-structure, which makes it scale-free and robust to an unknown Lipschitz constant $G$. An extension handles unknown Lipschitz constants without hints, avoiding the range-ratio problem while maintaining strong regret guarantees. The results advance the theory and practice of simultaneous adaptivity in online learning, with potential applicability to other model-based and adversarial settings.
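The 1D unconstrained OLO game behind the TL;DR can be made concrete with a small simulation. This is a sketch only and is NOT the authors' algorithm. It swaps in the classic Krichevsky-Trofimov (KT) coin-betting learner, a well-known parameter-free 1D method, and it assumes gradients bounded by a known $G = 1$, which is exactly the prior knowledge the paper avoids. It also assumes $V_T = \sum_t g_t^2$. The names `KTBettor1D`, `run` and `regret` are hypothetical.

Because the learner bets a fraction of its wealth $W_t$, its regret against any comparator $u$ equals $\epsilon - W_T + u S_T$ with $S_T = -\sum_t g_t$. Lower-bounding $W_T$ in terms of $S_T$ is therefore the standard route to comparator-adaptive bounds.

```python
class KTBettor1D:
    """Krichevsky-Trofimov coin-betting learner for 1D unconstrained OLO.

    Illustrative stand-in for a parameter-free 1D base algorithm; this is NOT
    the potential function proposed in the paper. Assumes |g_t| <= 1.
    """

    def __init__(self, eps: float = 1.0):
        self.wealth = eps      # W_{t-1}; initial wealth eps > 0
        self.coin_sum = 0.0    # S_{t-1} = sum_{i<t} c_i with c_i = -g_i
        self.t = 0             # number of completed rounds

    def predict(self) -> float:
        # KT betting fraction beta_t = S_{t-1} / t, so |beta_t| <= (t-1)/t < 1
        beta = self.coin_sum / (self.t + 1)
        return beta * self.wealth  # x_t = beta_t * W_{t-1}

    def update(self, x: float, g: float) -> None:
        if abs(g) > 1.0:
            raise ValueError("this sketch assumes |g_t| <= 1 (G = 1 known)")
        self.wealth -= g * x   # W_t = W_{t-1} (1 - g_t beta_t) > 0
        self.coin_sum -= g
        self.t += 1


def run(gradients, eps=1.0):
    """Play the 1D OLO game; return predictions, final wealth W_T and V_T."""
    learner = KTBettor1D(eps)
    xs = []
    for g in gradients:
        x = learner.predict()
        xs.append(x)
        learner.update(x, g)
    v_T = sum(g * g for g in gradients)  # assumed definition: V_T = sum_t g_t^2
    return xs, learner.wealth, v_T


def regret(xs, gradients, u):
    """Regret_T(u) = sum_t g_t (x_t - u) for linear losses."""
    return sum(g * (x - u) for x, g in zip(xs, gradients))


if __name__ == "__main__":
    gs = [-1.0] * 50  # adversary keeps rewarding positive predictions
    xs, W, V = run(gs)
    print(f"V_T = {V}, wealth = {W:.3g}, regret vs u=10: {regret(xs, gs, 10.0):.3g}")
```

KT needs the bound $|g_t| \le 1$ to keep its wealth positive. Removing that requirement, along with the $\log$-type loss in the $V_T$ dependence, is what the paper's potential and discretization are designed to handle.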
Abstract
We study unconstrained Online Linear Optimization with Lipschitz losses. Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves ($i$) AdaGrad-style second-order gradient adaptivity and ($ii$) comparator-norm adaptivity, also known as "parameter freeness" in the literature. In particular:
- our algorithm does not employ the impractical doubling trick, and does not require an a priori estimate of the time-uniform Lipschitz constant;
- the associated regret bound has the optimal $O(\sqrt{V_T})$ dependence on the gradient variance $V_T$, without the typical logarithmic multiplicative factor;
- the leading constant in the regret bound is "almost" optimal.
Central to these results is a continuous-time approach to online learning. We first show that the desired simultaneous adaptivity can be achieved fairly easily in a continuous-time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Our key innovation is then a new discretization argument that preserves such adaptivity in the discrete-time adversarial setting. This argument refines, both algorithmically and analytically, a non-gradient-adaptive discretization argument from (Harvey et al., 2023), and could be of independent interest.
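A 1D parameter-free learner can be lifted to $d$ dimensions using the standard magnitude/direction reduction from the parameter-free literature. The sketch below assumes that reduction. Whether it matches the two-level meta-structure mentioned in the TL;DR is not confirmed by the text. The magnitude learner is again a KT bettor rather than the authors' potential. The direction learner is projected online gradient descent on the unit ball with AdaGrad-norm step sizes, a choice made here for illustration. The sketch also assumes $\|g_t\| \le 1$. All class and function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


class KT1D:
    """Magnitude learner: KT coin betting on scalar losses s_t, |s_t| <= 1."""

    def __init__(self, eps=1.0):
        self.wealth, self.coin_sum, self.t = eps, 0.0, 0

    def predict(self):
        # z_t = (S_{t-1} / t) * W_{t-1}
        return self.coin_sum / (self.t + 1) * self.wealth

    def update(self, z, s):
        self.wealth -= s * z
        self.coin_sum -= s
        self.t += 1


class UnitBallOGD:
    """Direction learner: projected OGD on the unit ball, AdaGrad-norm steps."""

    def __init__(self, dim):
        self.y = [0.0] * dim
        self.sq_sum = 0.0  # sum_{i<=t} ||g_i||^2

    def predict(self):
        return list(self.y)

    def update(self, g):
        self.sq_sum += dot(g, g)
        if self.sq_sum == 0.0:
            return  # all gradients zero so far; keep y unchanged
        eta = 1.0 / math.sqrt(self.sq_sum)
        y = [yi - eta * gi for yi, gi in zip(self.y, g)]
        norm = math.sqrt(dot(y, y))
        self.y = y if norm <= 1.0 else [v / norm for v in y]  # project onto ball


class MagnitudeDirection:
    """Plays x_t = z_t * y_t with z_t from KT1D and y_t from UnitBallOGD."""

    def __init__(self, dim, eps=1.0):
        self.mag, self.direction = KT1D(eps), UnitBallOGD(dim)

    def predict(self):
        self.z, self.y = self.mag.predict(), self.direction.predict()
        return [self.z * v for v in self.y]

    def update(self, g):
        s = dot(g, self.y)  # 1D surrogate loss; |s| <= ||g|| <= 1 by assumption
        self.mag.update(self.z, s)
        self.direction.update(g)


def run(grads, eps=1.0):
    """Return (x_t, z_t, y_t) for each round against a fixed gradient sequence."""
    learner = MagnitudeDirection(len(grads[0]), eps)
    log = []
    for g in grads:
        x = learner.predict()
        log.append((x, learner.z, learner.y))
        learner.update(g)
    return log
```

With $s_t = \langle g_t, y_t\rangle$ and $x_t = z_t y_t$, the identity $\sum_t \langle g_t, x_t - u\rangle = \sum_t s_t (z_t - \|u\|) + \|u\| \sum_t \langle g_t, y_t - u/\|u\|\rangle$ holds exactly. The total regret therefore splits into a 1D magnitude regret and a direction regret scaled by $\|u\|$.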
