Beyond Bandit Feedback in Online Multiclass Classification

Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi

TL;DR

The paper studies online multiclass classification under arbitrary directed feedback graphs, a setting that includes bandit and full-information feedback as special cases. It introduces Gappletron, an algorithm that uses a minimum dominating set and a gap-map surrogate framework, which enables surrogate-regret analysis for a broad class of regular losses. The authors prove bounds of order $O\big(B\sqrt{\rho K T}\big)$ on the surrogate regret, both in expectation and with high probability, where $B$ is the diameter of the prediction space, $K$ the number of classes, $T$ the time horizon, and $\rho$ the domination number of the graph. They also prove a lower bound of order $\Omega\big(\max\{B^2K, \sqrt{T}\}\big)$, and show that under full information Gappletron attains constant surrogate regret of order $O(B^2K)$. Experiments on synthetic data show that Gappletron is competitive against known baselines across several feedback graphs, with applications including label-efficient classification and filtering.

Abstract

We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. While including bandit feedback as a special case, feedback graphs allow a much richer set of applications, including filtering and label-efficient classification. We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs. For this new algorithm, we prove surrogate regret bounds that hold, both in expectation and with high probability, for a large class of surrogate losses. Our bounds are of order $B\sqrt{\rho KT}$, where $B$ is the diameter of the prediction space, $K$ is the number of classes, $T$ is the time horizon, and $\rho$ is the domination number (a graph-theoretic parameter affecting the amount of exploration). In the full information case, we show that Gappletron achieves a constant surrogate regret of order $B^2K$. We also prove a general lower bound of order $\max\big\{B^2K,\sqrt{T}\big\}$ showing that our upper bounds are not significantly improvable. Experiments on synthetic data show that for various feedback graphs, our algorithm is competitive against known baselines.
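
The role of the domination number $\rho$ in the bound can be illustrated with a small sketch. The code below is not Gappletron and is not taken from the paper: it is a standard greedy heuristic that returns a dominating set of a directed graph over the classes, so its size is only an upper bound on $\rho$. The function name `greedy_dominating_set`, the adjacency-dictionary encoding, and the convention that every class dominates itself are assumptions made for illustration, as is the encoding of bandit feedback as a graph with self-loops only and of full information as a complete graph.

```python
from typing import Dict, List, Set


def greedy_dominating_set(out_neighbors: Dict[int, Set[int]]) -> List[int]:
    """Greedy heuristic for a dominating set of a directed graph.

    out_neighbors[v] is the set of classes observed when class v is played.
    Assumption: every class also dominates itself (closed out-neighborhood).
    The result is a valid dominating set, not necessarily a minimum one.
    """
    vertices = sorted(out_neighbors)
    undominated = set(vertices)
    chosen: List[int] = []
    while undominated:
        # Pick the class whose closed out-neighborhood covers the most
        # classes that are still undominated (ties go to the smallest index).
        best = max(
            vertices,
            key=lambda v: len(({v} | out_neighbors[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | out_neighbors[best]
    return chosen


K = 4
# Assumed encoding of bandit feedback: each class reveals only itself.
bandit = {k: {k} for k in range(K)}
# Assumed encoding of full information: each class reveals every class.
full_info = {k: set(range(K)) for k in range(K)}
# Hypothetical graph with one "revealing" class 0 that observes all classes.
revealing = {0: set(range(K)), 1: {1}, 2: {2}, 3: {3}}

sizes = {
    "bandit": len(greedy_dominating_set(bandit)),
    "full_info": len(greedy_dominating_set(full_info)),
    "revealing": len(greedy_dominating_set(revealing)),
}
```

Under these assumed encodings the self-loop graph needs all $K$ classes in its dominating set, while the complete graph and the revealing-class graph need only one, which is the sense in which $\rho$ measures how much exploration the graph forces.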

Paper Structure

This paper contains 17 sections, 14 theorems, 69 equations, 9 figures, 1 algorithm.

Key Result

Lemma 1

Fix any feedback graph $\mathcal{G}$ and suppose that, for all $t$, $\ell_t$ is a regular surrogate loss with respect to $\ell$. Then Gappletron, run on $\mathcal{G}$ with $a$ such that $a(\mathbf{W}_t, \mathbf{x}_t) = \ell(\mathbf{W}_t, \mathbf{x}_t, y_t^\star)$, satisfies

Figures (9)

  • Figure 1: Overview of the surrogate regret bounds in the separable and non-separable case. The upper bounds hold with high probability, while the lower bounds apply to any randomized prediction algorithm. All bounds are novel except for the lower bound in the full information separable case (Beygelzimer et al., 2019).
  • Figure 2: Error rate in non-separable synthetic bandit experiments showcasing Gappletron against known baselines. The points are the means and the whiskers are the minimum and maximum error rates over ten repetitions (details in the experiments section).
  • Figure 3: Results of the synthetic experiments for the bandit setting. The plot shows the best results of algorithms with parameters suggested by theory, or tuned with all parameters set to 1, except for $T$. The rows indicate different values for $K$ and the columns different values for $d$. Whiskers show the minimum and the maximum error rate over ten repetitions.
  • Figure 4: Results of the synthetic experiments for the bandit setting. The parameters of algorithms are set to 1, except for $T$. The rows are the different values for $K$ and the columns are the different values for $d$. The whiskers represent the minimum and maximum error rates of the ten repetitions.
  • Figure 5: Results of the synthetic experiments for the bandit setting with theoretical tuning. The rows are the different values for $K$ and the columns are the different values for $d$. The whiskers represent the minimum and maximum error rates of the ten repetitions.
  • ...and 4 more figures

Theorems & Definitions (22)

  • Lemma 1
  • proof
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 2
  • Corollary 1
  • Lemma 2
  • proof: Proof of Lemma [lem: hinge upper bound zo]
  • Lemma 2
  • ...and 12 more