Online Learning with Bounded Recall
Jon Schneider, Kiran Vodrahalli
TL;DR
This work analyzes full-information online learning under bounded recall, where decisions rely only on the last $M$ rewards. It proves fundamental limits for bounded-recall learners, showing that naive mean-based windowing yields constant or suboptimal regret, and then constructs stationary bounded-recall algorithms, AverageRestart and AverageRestartFullHorizon, with near-optimal $O\left(\sqrt{(\log d)/M}\right)$ per-round regret. A key insight is that asymmetry in how past rounds are weighted is essential; symmetric bounded-recall algorithms cannot achieve sublinear regret. Empirical results corroborate the theoretical findings, demonstrating improved performance of the proposed bounded-recall methods in drifting and non-stationary environments, with implications for privacy-preserving and streaming learning scenarios.
Abstract
We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
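To make the definition concrete, the following is a minimal sketch of the naive construction mentioned above: Hedge run over only the last $M$ reward vectors. It is not the algorithm proposed in this work; it illustrates the mean-based, symmetric kind of bounded-recall learner that the abstract says incurs constant regret per round. The function names, the learning rate `eta`, and the convention that rewards lie in $[0, 1]$ are assumptions made for illustration.

```python
import math


def windowed_hedge_play(window, d, eta):
    """Distribution over d actions computed from the recent reward vectors only.

    `window` holds at most M reward vectors (lists of length d). The output
    depends on the window only through per-action sums, so it is a symmetric
    function of the window: reordering the vectors does not change it.
    """
    totals = [sum(r[i] for r in window) for i in range(d)]
    top = max(totals)  # subtract the max before exponentiating, for stability
    weights = [math.exp(eta * (s - top)) for s in totals]
    z = sum(weights)
    return [w / z for w in weights]


def run_windowed_hedge(rewards, M, eta):
    """Play round t using only rewards[t-M:t], i.e. the M previous rewards."""
    d = len(rewards[0])
    return [
        windowed_hedge_play(rewards[max(0, t - M):t], d, eta)
        for t in range(len(rewards))
    ]


def per_round_regret(rewards, plays):
    """(Reward of the best fixed action minus expected reward earned) / T."""
    d = len(rewards[0])
    earned = sum(p[i] * r[i] for r, p in zip(rewards, plays) for i in range(d))
    best_fixed = max(sum(r[i] for r in rewards) for i in range(d))
    return (best_fixed - earned) / len(rewards)
```

Because `windowed_hedge_play` sees only per-action sums over the window, it cannot distinguish different orderings of the same $M$ rewards, which is the property the final result of the abstract rules out for any low-regret bounded-recall algorithm.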
