A note on continuous-time online learning

Lexing Ying

TL;DR

This paper develops continuous-time formulations of three online learning problems: online linear optimization, adversarial bandits, and adversarial linear bandits. It uses Legendre transforms and Itô's lemma to derive concise, optimal regret bounds for each problem, showing that continuous-time analysis can reproduce and sometimes strengthen discrete-time results. Notable findings include a continuous-time regret bound of $R \le \beta^{-1}\ln d$ for online linear optimization (which vanishes as $\beta\to\infty$), and bounds of $R \le \sqrt{2Td\ln d}$ for adversarial bandits and $R \le \sqrt{2Td\ln k}$ for adversarial linear bandits. Overall, the work provides a unified, concise framework for continuous-time online learning, with potential extensions to a broader class of problems.

Abstract

In online learning, the data is provided in sequential order, and the goal of the learner is to make online decisions that minimize the overall regret. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandits, and adversarial linear bandits. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.

Paper Structure

This paper contains 6 sections, 3 theorems, and 45 equations.

Key Result

Theorem 1

For any $\beta>0$, the continuous-time regret is bounded by $\beta^{-1} \ln d$.
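
The bound in Theorem 1 can be checked numerically. The sketch below is not taken from the paper: it assumes the learner plays exponential weights, $p_i(t) \propto \exp(-\beta L_i(t))$ with $L_i(t)$ the cumulative loss of coordinate $i$, and approximates continuous time by a Riemann sum with step `dt`. The function name `simulate_regret`, the random loss rates in $[0,1]$, and all parameter values are hypothetical choices made for illustration. Because of the discretization, the simulated regret may exceed $\beta^{-1}\ln d$ by a small slack, which the standard Hedge analysis bounds by $\beta \cdot dt \cdot T / 8$.

```python
import math
import random


def simulate_regret(d=5, beta=2.0, T=1.0, n_steps=2000, seed=0):
    """Riemann-sum approximation of continuous-time exponential weights.

    Assumption (not from the paper): loss rates are drawn uniformly
    from [0, 1] at each step, standing in for an arbitrary adversary.
    Returns the learner's cumulative loss minus the best coordinate's.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    cum_loss = [0.0] * d   # L_i(t): integrated loss of each coordinate
    learner_loss = 0.0     # integral of <p(t), loss(t)> dt

    for _ in range(n_steps):
        # p_i proportional to exp(-beta * L_i); subtract the minimum
        # before exponentiating for numerical stability.
        m = min(cum_loss)
        weights = [math.exp(-beta * (L - m)) for L in cum_loss]
        z = sum(weights)
        p = [w / z for w in weights]

        # Loss rates for this time slice (hypothetical adversary).
        loss = [rng.random() for _ in range(d)]

        learner_loss += dt * sum(pi * li for pi, li in zip(p, loss))
        for i in range(d):
            cum_loss[i] += dt * loss[i]

    return learner_loss - min(cum_loss)


if __name__ == "__main__":
    d, beta, T, n_steps = 5, 2.0, 1.0, 2000
    regret = simulate_regret(d=d, beta=beta, T=T, n_steps=n_steps)
    bound = math.log(d) / beta                # Theorem 1: beta^{-1} ln d
    slack = beta * (T / n_steps) * T / 8.0    # discretization allowance
    print(f"regret = {regret:.6f}, bound = {bound:.6f}, slack = {slack:.2e}")
```

Shrinking `dt` (raising `n_steps`) drives the slack to zero, which is one way to see why the continuous-time bound carries no term that grows with $T$.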

Theorems & Definitions (8)

  • Remark 1
  • Theorem 1
  • proof
  • Remark 2
  • Theorem 2
  • proof
  • Theorem 3
  • proof