A note on continuous-time online learning
Lexing Ying
TL;DR
This paper develops continuous-time formulations of several online learning problems: online linear optimization, adversarial bandits, and adversarial linear bandits. It uses Legendre transforms and Itô's lemma to give concise proofs of optimal regret bounds for these problems, showing that continuous-time analysis can reproduce, and sometimes strengthen, discrete-time results. Notable findings include a continuous-time regret bound of $R \le \beta^{-1}\ln d$ for online linear optimization (which vanishes as $\beta\to\infty$), and bounds of $R \le \sqrt{2Td\ln d}$ for adversarial bandits and $R \le \sqrt{2Td\ln k}$ for adversarial linear bandits. Overall, the work provides a concise, unifying framework for continuous-time online learning that may extend to a broader class of problems.
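To make the bound $R \le \beta^{-1}\ln d$ concrete, here is a minimal numerical sketch of continuous-time exponential weights over $d$ experts, assuming the learner plays $p_t \propto e^{-\beta L_t}$ against a cumulative loss path $L_t$ whose rate lies in $[0,1]$. The integral $\int_0^T \langle p_t, \mathrm{d}L_t\rangle$ is approximated by a left-point Riemann sum, so the computed regret carries a small discretization error (at most $\beta T\,\Delta t/8$ for per-step increments in $[0,\Delta t]$). The names `continuous_hedge_regret` and `example_loss_rate`, the horizon $T=1$, and the oscillating loss rates are hypothetical choices for illustration, not taken from the note.

```python
import math


def continuous_hedge_regret(loss_rate, d, beta, T=1.0, n_steps=10_000):
    """Left-point discretization of continuous-time exponential weights.

    loss_rate(t) returns d instantaneous loss rates in [0, 1]; the cumulative
    loss is L_i(t) = integral_0^t loss_rate(s)_i ds. Returns the regret
    integral_0^T <p_t, dL_t> - min_i L_i(T).
    """
    dt = T / n_steps
    L = [0.0] * d          # cumulative losses L_t, starting from L_0 = 0
    learner_loss = 0.0     # Riemann sum approximating integral of <p_t, dL_t>
    for k in range(n_steps):
        t = k * dt
        # p_t proportional to exp(-beta * L_t); shifting by min(L) avoids
        # underflow and leaves p_t unchanged.
        m = min(L)
        w = [math.exp(-beta * (Li - m)) for Li in L]
        Z = sum(w)
        p = [wi / Z for wi in w]
        # Loss increments over [t, t + dt], each in [0, dt].
        dL = [r * dt for r in loss_rate(t)]
        learner_loss += sum(pi * dLi for pi, dLi in zip(p, dL))
        L = [Li + dLi for Li, dLi in zip(L, dL)]
    return learner_loss - min(L)


def example_loss_rate(t, d=4):
    # Hypothetical adversary: oscillating loss rates, each in [0, 1].
    return [0.5 + 0.5 * math.sin(2 * math.pi * (i + 1) * t + i) for i in range(d)]


if __name__ == "__main__":
    for beta in (1.0, 5.0, 25.0):
        regret = continuous_hedge_regret(example_loss_rate, 4, beta)
        print(f"beta={beta:5.1f}  regret={regret:.4f}  ln(d)/beta={math.log(4) / beta:.4f}")
```

For each $\beta$ the simulated regret should stay below $\ln 4/\beta$ up to the discretization error, and the bound itself shrinks as $\beta$ grows, in line with the remark that it vanishes as $\beta\to\infty$.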
Abstract
In online learning, the data arrive sequentially, and the learner's goal is to make online decisions that minimize the overall regret. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandits, and adversarial linear bandits. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.
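As an illustration of how a Legendre-transform potential can yield such a concise proof, the following is a standard potential-based sketch for online linear optimization over the simplex $\Delta_d$, assuming $L_0=0$ and a loss path $L_t$ of bounded variation (so the chain rule applies with no Itô correction). It is not necessarily the note's exact argument. The potential $\Phi$ is, up to sign, the Legendre transform of the scaled negative entropy, and its gradient is the exponential-weights distribution $p_t$:

```latex
\begin{align*}
\Phi(L) &= -\beta^{-1}\ln\sum_{i=1}^{d} e^{-\beta L_i}
         = \min_{p\in\Delta_d}\Big(\langle p, L\rangle + \beta^{-1}\sum_{i=1}^{d} p_i\ln p_i\Big),\\
\nabla\Phi(L_t) &= p_t, \qquad p_{t,i} = \frac{e^{-\beta L_{t,i}}}{\sum_{j=1}^{d} e^{-\beta L_{t,j}}},\\
\int_0^T \langle p_t, \mathrm{d}L_t\rangle &= \Phi(L_T) - \Phi(L_0)
  = \Phi(L_T) + \beta^{-1}\ln d
  \le \min_i L_{T,i} + \beta^{-1}\ln d .
\end{align*}
```

Subtracting $\min_i L_{T,i}$ gives $R \le \beta^{-1}\ln d$. In the bandit settings the loss estimates are noisy, so Itô's lemma contributes a second-order term. Assuming (as an illustration, not a statement of the note's constants) that this term totals $\beta T d/2$, optimizing over $\beta$ recovers the stated rate: $\min_{\beta>0}\big(\beta^{-1}\ln d + \beta T d/2\big) = \sqrt{2Td\ln d}$, attained at $\beta^\star = \sqrt{2\ln d/(Td)}$; replacing $\ln d$ by $\ln k$ gives $\sqrt{2Td\ln k}$.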
