Bounds on the price of feedback for mistake-bounded online learning
Jesse Geneson, Linus Tang
TL;DR
This paper tightens worst-case mistake bounds for online learning under standard, bandit, and delayed feedback. It develops a unified weighted-majority framework for upper bounds, fixes an error in prior analyses, and derives near-optimal bounds for $\textnormal{opt}_{\textnormal{bs}}$ and for compositions of function classes, along with generalized agnostic and delayed-feedback formulations. Notable contributions include showing the upper bound $\textnormal{opt}_{\textnormal{bs}}(F) \le (1+o(1))\,k\ln k\,\textnormal{opt}_{\textnormal{std}}(F)$ for codomain size $k$, establishing $\textnormal{opt}_{\textnormal{bs}}(k,2)=\Theta(k\ln k)$ with an improved upper bound, and proving $\textnormal{opt}_{\textnormal{ag}}(F,\eta) \le k\ln k\,(1+o(1))(\textnormal{opt}_{\textnormal{std}}(F)+\eta)$ with matching lower bounds up to constants. The paper also quantifies the price of bandit feedback relative to standard feedback in the multiclass setting, improves the bound on the price of $r$-input delayed feedback, and strengthens the bounds for compositions of function classes, bringing them closer to optimal in several regimes. Together, these results clarify how much the type of feedback can cost an online learner and leave precise open questions about limiting behavior and constants.
Abstract
We improve several worst-case bounds for various online learning scenarios from (Auer and Long, Machine Learning, 1999). In particular, we sharpen an upper bound for delayed ambiguous reinforcement learning by a factor of 2 and an upper bound for learning compositions of families of functions by a factor of 2.41. We also improve a lower bound from the same paper for learning compositions of $k$ families of functions by a factor of $\Theta(\ln k)$, matching the upper bound up to a constant factor. In addition, we solve a problem from (Long, Theoretical Computer Science, 2020) on the price of bandit feedback with respect to standard feedback for multiclass learning, and we improve an upper bound from (Feng et al., Theoretical Computer Science, 2023) on the price of $r$-input delayed ambiguous reinforcement learning by a factor of $r$, matching a lower bound from the same paper up to the leading term.
