Multiclass Online Learnability under Bandit Feedback

Ananth Raman; Vinod Raman; Unique Subedi; Idan Mehalel; Ambuj Tewari

TL;DR

This work characterizes online multiclass learnability under bandit feedback with unbounded label spaces by establishing that finiteness of the Bandit Littlestone dimension, $BL(\mathcal{H})<\infty$, is necessary and sufficient for sublinear regret in both the realizable and agnostic settings. When $BL(\mathcal{H})<\infty$, it proves a label-space-independent upper bound on regret via a reduction to a finite-label projection and an EXP4-based strategy, yielding $R_T = O\big(\sqrt{L(\mathcal{H})\,BL(\mathcal{H})\,T\,\log T}\big)$. The paper also shows that Sequential Uniform Convergence (SUC) is necessary but not sufficient for bandit online learnability, highlighting a separation between SUC and learnability in the bandit setting. Finally, it provides lower bounds demonstrating the necessity of finite BLdim and discusses open questions on tightening the bounds and exploring randomized dimensions for bandit feedback.
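To make the "EXP4-based strategy" concrete, the following is a minimal sketch of the standard EXP4 update for multiclass prediction with bandit feedback, assuming a finite list of deterministic experts and a finite label set. It is not the paper's algorithm: how the paper builds its experts from $BL(\mathcal{H})$ and the finite-label projection is not reproduced here, and the function name `exp4_bandit_multiclass` and the parameters `eta` (learning rate) and `gamma` (uniform exploration rate) are hypothetical choices made for illustration.

```python
import math
import random


def exp4_bandit_multiclass(experts, stream, num_labels, eta, gamma, rng):
    """Run a standard EXP4 learner; return the number of mistakes.

    experts:    list of functions x -> label in range(num_labels) (assumed finite)
    stream:     iterable of (x, y) pairs; y is used only to compute the
                one-bit bandit feedback "was the prediction correct?"
    eta, gamma: illustrative tuning parameters, not taken from the paper
    """
    log_w = [0.0] * len(experts)  # log-weights, kept in log space for stability
    mistakes = 0
    for x, y in stream:
        advice = [h(x) for h in experts]

        # Normalize weights (shift by the max to avoid underflow).
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)

        # Label distribution: expert mixture plus uniform exploration.
        p = [gamma / num_labels] * num_labels
        for wi, a in zip(w, advice):
            p[a] += (1.0 - gamma) * wi / total

        pred = rng.choices(range(num_labels), weights=p)[0]

        # Bandit feedback: only the 0-1 loss of the played label is revealed.
        loss = 0.0 if pred == y else 1.0
        mistakes += int(loss)

        # Importance-weighted loss estimate; it is nonzero only for the
        # played label, so only experts that advised it are updated.
        est = loss / p[pred]
        for i, a in enumerate(advice):
            if a == pred:
                log_w[i] -= eta * est
    return mistakes
```

As a usage example, one could pass three constant experts together with one expert that always predicts the true label, and check that the mistake count stays well below what uniform guessing would incur. In this sketch the uniform exploration term keeps every label's probability at least `gamma / num_labels`, which bounds the importance-weighted loss estimates.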

Abstract

We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.

Paper Structure (8 sections, 15 theorems, 12 equations, 1 figure, 2 algorithms)

This paper contains 8 sections, 15 theorems, 12 equations, 1 figure, 2 algorithms.

Introduction
Preliminaries
Online Learning
Online Learnability and Uniform Convergence
BLdim is Sufficient for Bandit Online Learnability
Finite BLdim is Necessary for Bandit Online Learnability
Discussion and Open Questions
Proof of Lemma \ref{lem:finitekalgproj}

Key Result

Theorem 1

Let $\mathcal{H} \subseteq \mathcal{Y}^\mathcal{X}$ and $C_{\mathcal{H}} := \sup_{x\in \mathcal{X}} |\{h(x) : h \in \mathcal{H}\}|$. The following statements are equivalent:

Figures (1)

Figure 1: Landscape of multiclass online learnability. The Sequential Graph (SG) dimension (see Definition 9) characterizes SUC.

Theorems & Definitions (19)

Theorem 1
Theorem 2
Theorem 3
Definition 4: Bandit Online Learnability
Definition 5: Littlestone dimension [Littlestone1987LearningQW, DanielyERMprinciple]
Definition 6: Bandit Littlestone dimension [DanielyERMprinciple]
Theorem 7: Realizable Learnability [DanielyERMprinciple]
Theorem 8: Agnostic Learnability [daniely2013price]
Definition 9: Sequential Graph dimension [hanneke2023multiclass]
Theorem 10: [hanneke2023multiclass, rakhlin2015online]
...and 9 more
