Bounds on the price of feedback for mistake-bounded online learning

Jesse Geneson; Linus Tang


TL;DR

This paper advances the theory of mistake-bounded online learning by tightening worst-case bounds across standard, bandit, and delayed feedback variants. It develops a unified weighted-majority upper-bound framework, fixes an error in prior analyses, and derives near-optimal bounds for $\textnormal{opt}_{\textnormal{bs}}$ and for compositions of function classes, as well as for generalized agnostic and delayed-feedback formulations. Notable contributions include showing $\textnormal{opt}_{\textnormal{bs}}(F) = (1+o(1))\,k\ln k\,\textnormal{opt}_{\textnormal{std}}(F)$ for codomain size $k$, establishing $\textnormal{opt}_{\textnormal{bs}}(k,2)=\Theta(k\ln k)$ with an improved upper bound, and proving $\textnormal{opt}_{\textnormal{ag}}(F,\eta) \le k\ln k\,(1+o(1))(\textnormal{opt}_{\textnormal{std}}(F)+\eta)$ with lower bounds matching up to constant factors. The paper also quantifies the price of bandit feedback in multiclass and $r$-input delayed settings, and strengthens closure (composition) bounds, bringing them closer to optimal in several regimes. Together, these results clarify fundamental limits of feedback in online learning and raise open questions about the exact limits and constants in these settings.
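To make the scaling in these bounds concrete, the following Python sketch evaluates only their leading terms, $k\ln k\,\textnormal{opt}_{\textnormal{std}}(F)$ and $k\ln k\,(\textnormal{opt}_{\textnormal{std}}(F)+\eta)$, with the $1+o(1)$ factor dropped. The function names are hypothetical, and because the $o(1)$ term is omitted, the numbers illustrate how the price of feedback grows with $k$ rather than certifying a mistake bound for any fixed $k$.

```python
import math


def leading_bandit_bound(k: int, opt_std: float) -> float:
    """Leading term k*ln(k)*opt_std of the bandit-feedback bound.

    Hypothetical helper: the (1 + o(1)) factor is dropped, so this shows
    asymptotic scaling only, not a rigorous bound for a fixed k.
    """
    if k < 2:
        raise ValueError("codomain size k must be at least 2")
    return k * math.log(k) * opt_std


def leading_agnostic_bound(k: int, opt_std: float, eta: float) -> float:
    """Leading term k*ln(k)*(opt_std + eta) of the agnostic bound (o(1) dropped)."""
    return leading_bandit_bound(k, opt_std + eta)


if __name__ == "__main__":
    for k in (2, 10, 100):
        # With opt_std = 1 the printed value is the multiplicative factor k ln k.
        print(k, round(leading_bandit_bound(k, 1), 2))
```

For example, with $k = 10$ and $\textnormal{opt}_{\textnormal{std}}(F) = 5$, the leading term is $50\ln 10 \approx 115$, compared with 5 mistakes under standard feedback.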

Abstract

We improve several worst-case bounds for various online learning scenarios from (Auer and Long, Machine Learning, 1999). In particular, we sharpen an upper bound for delayed ambiguous reinforcement learning by a factor of 2 and an upper bound for learning compositions of families of functions by a factor of 2.41. We also improve a lower bound from the same paper for learning compositions of $k$ families of functions by a factor of $\Theta(\ln k)$, matching the upper bound up to a constant factor. In addition, we solve a problem from (Long, Theoretical Computer Science, 2020) on the price of bandit feedback with respect to standard feedback for multiclass learning, and we improve an upper bound from (Feng et al., Theoretical Computer Science, 2023) on the price of $r$-input delayed ambiguous reinforcement learning by a factor of $r$, matching a lower bound from the same paper up to the leading term.
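The "price of bandit feedback" compares two worst-case mistake counts for an optimal deterministic learner: one that is told the true label after each round (standard feedback, $\textnormal{opt}_{\textnormal{std}}$) and one that is told only whether its prediction was correct (bandit feedback, assumed here to be what $\textnormal{opt}_{\textnormal{bs}}$ denotes, as the surrounding text suggests). The Python sketch below computes both values by exhaustive game-tree search for a tiny finite class under the usual adversarial mistake-bound game. It only illustrates the definitions; the function names are hypothetical, it is not the paper's weighted-majority upper-bound strategy, and its running time is exponential in $|F|$.

```python
from functools import lru_cache
from itertools import product


def feedback_costs(functions):
    """Return (opt_std, opt_bs) for a small finite class (hypothetical helper).

    `functions` is a list of equal-length tuples; functions[i][x] is the label
    f_i(x). Brute force over subsets of the class, memoized on frozensets.
    """
    domain = range(len(functions[0]))

    def labels_at(fam, x):
        # Labels that the surviving functions still assign to point x.
        return {functions[i][x] for i in fam}

    @lru_cache(maxsize=None)
    def opt_std(fam):
        best = 0
        for x in domain:  # the adversary picks the query point
            labels = labels_at(fam, x)
            if len(labels) < 2:
                continue  # x cannot force a mistake or shrink the class
            # The learner minimizes over predictions (guessing a label no
            # surviving function takes is never better); the adversary then
            # reveals the worst consistent label y.
            value = min(
                max(
                    (y != pred)
                    + opt_std(frozenset(i for i in fam if functions[i][x] == y))
                    for y in labels
                )
                for pred in labels
            )
            best = max(best, value)
        return best

    @lru_cache(maxsize=None)
    def opt_bs(fam):
        best = 0
        for x in domain:
            labels = labels_at(fam, x)
            if len(labels) < 2:
                continue
            costs = []
            for pred in labels:
                right = frozenset(i for i in fam if functions[i][x] == pred)
                wrong = fam - right
                # Bandit feedback: the learner learns only whether pred was
                # correct, so a mistake leaves every other label possible.
                costs.append(max(opt_bs(right), 1 + opt_bs(wrong)))
            best = max(best, min(costs))
        return best

    full = frozenset(range(len(functions)))
    return opt_std(full), opt_bs(full)


print(feedback_costs([(0,), (1,), (2,)]))                 # (1, 2)
print(feedback_costs(list(product(range(3), repeat=2))))  # (2, 4)
```

With $F$ equal to all functions from a single point to $\{0,1,2\}$, standard feedback lets the adversary force only one mistake, while bandit feedback forces $k-1 = 2$. On a two-point domain with three labels the sketch reports $(2, 4)$, and with two labels the two models coincide, since learning that a binary prediction was wrong reveals the true label.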


Paper Structure

This paper contains 7 sections, 19 theorems, and 81 equations.

Introduction
Upper bound strategy
Bounds on $\textnormal{opt}_{\textnormal{bs}}(k,2)$
Agnostic learning
Delayed ambiguous reinforcement
Closure bounds
Discussion

Key Result

Theorem 3.1

The asymptotic formula $\textnormal{opt}_{\textnormal{bs}}(k,2)=\Theta(k\log k)$ holds.

Theorems & Definitions (36)

Theorem 3.1
proof
Lemma 3.2
proof
Lemma 3.3
proof
Lemma 3.4
proof
Theorem 3.5
proof
...and 26 more
