Discounted Adaptive Online Learning: Towards Better Regularization
Zhiyu Zhang, David Bombara, Heng Yang
TL;DR
The paper tackles nonstationary adversarial online learning by introducing a discounted regret framework and an adaptive FTRL-based algorithm whose instance-optimal guarantees improve on the constant-learning-rate gradient descent baseline. It employs a rescaling trick to convert scale-free undiscounted guarantees into discounted ones and develops a two-component, simultaneous adaptivity scheme that learns both the direction and the magnitude of the comparator via polar decomposition. The framework is extended to online conformal prediction (OCP), where stability-based guarantees yield improved coverage and reduced dependence on the unknown horizon and maximal radius. Empirical results in OCP demonstrate favorable coverage, narrower prediction sets, and competitive runtimes compared to strong baselines. Collectively, the work strengthens the link between adaptive regularization and discounted online optimization, with practical implications for lifelong learning and robust uncertainty quantification in nonstationary environments.
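The rescaling trick mentioned above can be illustrated with a one-line identity. The notation here is an assumption made for illustration, since this summary does not fix one: a constant discount factor $\lambda \in (0, 1]$, losses $\ell_t$, iterates $x_t$, a fixed comparator $u$, and a weight of $\lambda^{T-t}$ on the round-$t$ loss.

```latex
\begin{align*}
\mathrm{Regret}^{\lambda}_T(u)
  &:= \sum_{t=1}^{T} \lambda^{T-t}\bigl(\ell_t(x_t) - \ell_t(u)\bigr) \\
  &= \lambda^{T} \sum_{t=1}^{T} \bigl(\tilde{\ell}_t(x_t) - \tilde{\ell}_t(u)\bigr),
  \qquad \tilde{\ell}_t := \lambda^{-t}\,\ell_t .
\end{align*}
```

Under this reading, the discounted regret equals $\lambda^{T}$ times the undiscounted regret measured on the rescaled losses $\tilde{\ell}_t$. A bound for an algorithm that adapts to the scale of its losses therefore carries over to the discounted setting after multiplying by $\lambda^{T}$. Time-varying discount factors are not covered by this sketch.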
Abstract
We study online learning in adversarial nonstationary environments. Since the future can be very different from the past, a critical challenge is to gracefully forget the history as new data comes in. To formalize this intuition, we revisit the discounted regret in online convex optimization, and propose an adaptive (i.e., instance-optimal), FTRL-based algorithm that improves on the widespread non-adaptive baseline, gradient descent with a constant learning rate. From a practical perspective, this refines the classical idea of regularization in lifelong learning: we show that designing good regularizers can be guided by the principled theory of adaptive online optimization. Complementing this result, we also consider the online conformal prediction problem in the style of Gibbs and Candès (2021), where the goal is to sequentially predict the uncertainty sets of a black-box machine learning model. We show that the FTRL nature of our algorithm can simplify the conventional gradient-descent-based analysis, leading to instance-dependent performance guarantees.
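To make the baseline and the performance metric concrete, the following is a minimal sketch, not the paper's algorithm. It runs one-dimensional gradient descent with a constant learning rate and measures its discounted regret against a fixed comparator. The squared loss, the function names (`ogd_constant_lr`, `discounted_regret`, `sq_loss`), the constant discount factor `lam`, and the weighting of round t by `lam ** (T - t)` are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def sq_loss(x: float, z: float) -> float:
    """Illustrative round loss (an assumption): l(x) = 0.5 * (x - z)^2."""
    return 0.5 * (x - z) ** 2


def ogd_constant_lr(targets: Sequence[float], eta: float, x0: float = 0.0) -> List[float]:
    """1-D gradient descent with constant learning rate `eta` on sq_loss.

    Returns the prediction made before each target is revealed.
    """
    x = x0
    predictions = []
    for z in targets:
        predictions.append(x)   # commit to x_t, then observe z_t
        grad = x - z            # derivative of 0.5 * (x - z)^2 at x
        x = x - eta * grad      # constant-step gradient update
    return predictions


def discounted_regret(alg_losses: Sequence[float],
                      cmp_losses: Sequence[float],
                      lam: float) -> float:
    """Sum of lam^(T - t) * (alg_loss_t - cmp_loss_t) over rounds t = 1..T.

    With 0-indexed t in the loop the exponent is T - 1 - t, so the most
    recent round has weight 1 and older rounds are geometrically discounted.
    """
    T = len(alg_losses)
    return sum(lam ** (T - 1 - t) * (a - c)
               for t, (a, c) in enumerate(zip(alg_losses, cmp_losses)))


if __name__ == "__main__":
    # Hypothetical nonstationary stream: the target jumps from 0 to 5 halfway.
    targets = [0.0] * 50 + [5.0] * 50
    preds = ogd_constant_lr(targets, eta=0.5)

    u = 5.0  # comparator that is only good for the recent half of the stream
    alg = [sq_loss(x, z) for x, z in zip(preds, targets)]
    cmp = [sq_loss(u, z) for z in targets]

    for lam in (1.0, 0.9):
        print(f"lam={lam}: discounted regret = {discounted_regret(alg, cmp, lam):.4f}")
```

With `lam = 1.0` the metric reduces to the usual undiscounted regret, so rounds from before the shift count as much as recent ones. With `lam < 1` the older rounds are downweighted geometrically, which is one way to express "gracefully forgetting the history" in the metric itself.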
