Online Learning with Bounded Recall

Jon Schneider; Kiran Vodrahalli

TL;DR

This work analyzes full-information online learning under bounded recall, where decisions rely only on the last $M$ rewards. It proves fundamental limits for bounded-recall learners, showing that naive mean-based windowing yields constant or suboptimal regret, and then constructs stationary bounded-recall algorithms, AverageRestart and AverageRestartFullHorizon, with near-optimal $O\left(\sqrt{(\log d)/M}\right)$ per-round regret. A key insight is that asymmetry in how past rounds are weighted is essential: symmetric bounded-recall algorithms cannot achieve sublinear regret. Empirical results corroborate the theoretical findings, showing improved performance of the proposed bounded-recall methods in drifting and non-stationary environments, with implications for privacy-preserving and streaming learning scenarios.

Abstract

We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
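To make the "Hedge over the last $M$ rounds" construction concrete, the following is a minimal sketch of that naive approach, not the paper's code. The function names, the step size `eta`, and the choice of the best fixed action in hindsight as the regret benchmark are assumptions made for illustration. The output at time $t$ is computed only from the previous $M$ reward vectors, so the learner is $M$-bounded-recall in the sense defined above.

```python
import math
from typing import List, Sequence


def windowed_hedge(window: Sequence[Sequence[float]], d: int, eta: float) -> List[float]:
    """Hedge distribution over d actions, using only the rewards in `window`.

    `eta` is a hypothetical step size chosen by the caller.
    """
    # Cumulative reward of each action inside the window.
    totals = [0.0] * d
    for reward_vector in window:
        for i in range(d):
            totals[i] += reward_vector[i]
    # Exponential weights, shifted by the max for numerical stability.
    top = max(totals)
    weights = [math.exp(eta * (s - top)) for s in totals]
    z = sum(weights)
    return [w / z for w in weights]


def per_round_regret(rewards: Sequence[Sequence[float]], M: int, eta: float) -> float:
    """Average regret against the best fixed action in hindsight (assumed benchmark)."""
    T = len(rewards)
    d = len(rewards[0])
    earned = 0.0
    for t in range(T):
        # The learner sees at most the M reward vectors preceding round t.
        window = rewards[max(0, t - M):t]
        p = windowed_hedge(window, d, eta)
        earned += sum(p[i] * rewards[t][i] for i in range(d))
    best_fixed = max(sum(r[i] for r in rewards) for i in range(d))
    return (best_fixed - earned) / T


if __name__ == "__main__":
    # Action 0 always pays 1, action 1 always pays 0.
    demo = [[1.0, 0.0]] * 10
    print(per_round_regret(demo, M=3, eta=1.0))
```

Note that `windowed_hedge` depends on the window only through per-action sums, so reordering the window does not change its output. It is therefore a symmetric function of the past $M$ rewards, which is exactly the class of algorithms the abstract says must incur constant regret per round.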

Paper Structure (21 sections, 11 theorems, 21 equations, 2 figures, 4 algorithms)

Introduction
Related work
Model and Preliminaries
Mean-based learners
Benchmarks for bounded-recall learning
Bounded-recall mean-based algorithms have high regret
Stationary Bounded-Recall Algorithms
Averaging over restarts
Averaging restarts over the entire time horizon
The necessity of asymmetry
Simulations
Conclusion and Future Work
Related Work
Adaptive Multiplicative Weights
Private Learning and Discarding Data
...and 6 more sections

Key Result

Theorem 3.1

Fix an $M > 0$. Then for any $M$-bounded-recall learning algorithm $\mathcal{A}$ and $T > M$, there exists a distribution $\mathcal{D}$ over online learning instances $\mathbf{r}$ of length $T$ with $d$ actions such that

Figures (2)

Figure 1: A plot of $\Delta_t$ over time, as used in Lemma \ref{lem:counterexample}.
Figure 2: (Left) Total regret of the algorithms over time, averaged uniformly over high-frequency drifting scenarios in which the mean reward of arm $1$ has period $T/20$, $T/10$, $T/5$, or $T/2$ and arm $2$ receives a reward in $\{\pm 1\}$ from an unbiased coin flip. The bounded-recall algorithms significantly outperform the classic no-regret algorithms. (Right) Total regret of the algorithms over time for one block of the adversarial rewards case (see the construction in Lemma \ref{lem:counterexample}). The mean-based bounded-recall learner attains regret on the order of $M/6$ (here, $M = T/3$), while our no-regret bounded-recall learners all outperform Multiplicative Weights.

Theorems & Definitions (28)

Definition 2.1: Per-round Regret
Definition 2.2: Bounded-Recall Online Learning Algorithms
Definition 2.3: Mean-based algorithm
Theorem 3.1: Lower Bound
proof
Theorem 3.2
proof
Corollary 3.3
proof
Theorem 4.1
...and 18 more
