Online Learning in the Random Order Model
Martino Bernasconi, Andrea Celli, Riccardo Colini-Baldeschi, Federico Fusco, Stefano Leonardi, Matteo Russo
TL;DR
This work studies online learning in the random-order (RO) input model, which sits between the i.i.d. and adversarial regimes and can exhibit non-stationarity over finite horizons. It introduces a general Simulation template that converts stochastic online-learning algorithms into RO-robust algorithms whose regret essentially matches the stochastic rates, and demonstrates the template on prediction with delays, online learning with long-term constraints, bandits with switching costs, and online classification. A key negative result is a separation showing that stochastic methods can fail in RO (via a Birthday Paradox argument). It is balanced by positive results showing that the RO minimax regret often coincides with the stochastic regret, so that shuffling the data is a practical way to recover stochastic performance. In online classification, RO learnability is characterized by the VC dimension rather than the Littlestone dimension, a fundamental separation from the adversarial model. The work thus provides both theoretical and practical tools for using stochastic methods in RO settings and outlines directions for extending the Simulation paradigm to broader feedback models.
Abstract
In the random-order model for online learning, the sequence of losses is chosen upfront by an adversary and presented to the learner after a random permutation. Any random-order input is \emph{asymptotically} equivalent to a stochastic i.i.d. one, but, for finite times, it may exhibit significant \emph{non-stationarity}, which can hinder the performance of stochastic learning algorithms. While algorithms for adversarial inputs naturally maintain their regret guarantees in random order, simple no-regret algorithms exist for the stochastic model that fail against random-order instances. In this paper, we propose a general template to adapt stochastic learning algorithms to the random-order model without substantially affecting their regret guarantees. This allows us to recover improved regret bounds for prediction with delays, online learning with constraints, and bandits with switching costs. Finally, we investigate online classification and prove that, in random order, learnability is characterized by the VC dimension rather than the Littlestone dimension, thus providing a further separation from the general adversarial model.
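As an illustration of the input model only (not of the paper's template), the following minimal Python sketch contrasts a random-order stream with an i.i.d. one built from the same losses. The function names, the loss values, and the horizon are hypothetical choices made for this example. A random-order stream is a uniform shuffle of a fixed sequence, so it amounts to sampling without replacement: the multiset of losses is preserved, and the mean of the unseen suffix is fully determined by the observed prefix. This dependence is one simple way to see the finite-time non-stationarity mentioned above.

```python
import random

def random_order_stream(losses, rng):
    """Random-order input: the adversary's fixed sequence, uniformly permuted."""
    stream = list(losses)
    rng.shuffle(stream)  # uniform random permutation, i.e. sampling without replacement
    return stream

def iid_stream(losses, rng):
    """Stochastic input: i.i.d. draws (with replacement) from the same empirical distribution."""
    return [rng.choice(losses) for _ in losses]

def remaining_mean(stream, t):
    """Average loss over the rounds that have not been revealed after t steps."""
    return sum(stream[t:]) / (len(stream) - t)

rng = random.Random(0)
losses = [1.0] * 5 + [0.0] * 5  # hypothetical sequence fixed upfront by the adversary

ro = random_order_stream(losses, rng)
iid = iid_stream(losses, rng)

# In random order the multiset of losses is preserved exactly ...
assert sorted(ro) == sorted(losses)
# ... so the mean of the future is a deterministic function of the observed prefix.
t = 5
predicted = (sum(losses) - sum(ro[:t])) / (len(losses) - t)
assert abs(remaining_mean(ro, t) - predicted) < 1e-12
```

In the i.i.d. stream no such identity holds: the draws after round `t` are independent of the prefix, and their total is not pinned to that of the original sequence.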
