Bandits with Abstention under Expert Advice

Stephen Pasteris; Alberto Rumi; Maximilian Thiessen; Shota Saito; Atsushi Miyauchi; Fabio Vitale; Mark Herbster

TL;DR

The CBA algorithm is proposed. It exploits the assumption that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial, and it is the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors.

Abstract

We study the classic problem of prediction with expert advice under bandit feedback. Our model assumes that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial. We propose the CBA algorithm, which exploits this assumption to obtain reward bounds that can significantly improve those of the classical Exp4 algorithm. We can view our problem as the aggregation of confidence-rated predictors when the learner has the option of abstention from play. Importantly, we are the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors. In the special case of specialists we achieve a novel reward bound, significantly improving previous bounds of SpecialistExp (treating abstention as another action). As an example application, we discuss learning unions of balls in a finite metric space. In this contextual setting, we devise an efficient implementation of CBA, reducing the runtime from quadratic to almost linear in the number of contexts. Preliminary experiments show that CBA improves over existing bandit algorithms.
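For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal Exp4-style exponential-weights learner that treats abstention as just another action (index 0) whose reward is always zero. It is not the CBA algorithm, whose update rule is not given in this summary. The function name `exp4_with_abstention`, the reward range $[-1, 1]$, the rescaling to $[0, 1]$, and the parameters `eta` and `gamma` are all assumptions made for illustration.

```python
import numpy as np

def exp4_with_abstention(advice, rewards, eta=0.1, gamma=0.1, seed=0):
    """Exp4-style baseline (hypothetical sketch, NOT the paper's CBA).

    advice:  array of shape (T, E, K); advice[t, e] is expert e's
             probability vector over the K actions on trial t.
    rewards: array of shape (T, K) with values assumed to lie in [-1, 1].
             Column 0 is the abstention action and is forced to 0.
    Returns the total realised reward and the normalised-by-max weights.
    """
    rng = np.random.default_rng(seed)
    T, E, K = advice.shape
    rewards = np.asarray(rewards, dtype=float).copy()
    rewards[:, 0] = 0.0                      # abstaining never gains or loses
    log_w = np.zeros(E)                      # log-weights for numerical stability
    total = 0.0
    for t in range(T):
        q = np.exp(log_w - log_w.max())
        q /= q.sum()                         # distribution over experts
        # Mix the experts' advice, plus uniform exploration: O(KE) work.
        p = (1.0 - gamma) * (q @ advice[t]) + gamma / K
        a = rng.choice(K, p=p)
        r = rewards[t, a]                    # bandit feedback: only r is seen
        total += r
        # Importance-weighted estimate of the reward rescaled to [0, 1]
        # (the rescaling is an assumption of this sketch).
        x_hat = np.zeros(K)
        x_hat[a] = ((r + 1.0) / 2.0) / p[a]
        log_w += eta * (advice[t] @ x_hat)   # each expert's estimated gain
    return total, np.exp(log_w - log_w.max())
```

In this sketch an expert that always abstains still earns the rescaled reward 0.5 per trial, so abstention is handled no differently from any other action. The abstract's point is that CBA exploits the known zero reward of abstention instead.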

Paper Structure (19 sections, 7 theorems, 63 equations, 6 figures, 3 algorithms)

Introduction
Additional related work
Problem formulation and notation
Main result
The CBA algorithm
Adversarial contextual bandits with abstention
A lower bound
Efficient learning with balls
Experiments
CBA analysis
Unbounded precision case
Efficient implementation proof
Lower bound proof
Overlapping balls extension
The details of the graph bases
...and 4 more sections

Key Result

Theorem 3.1

CBA takes parameters $\eta\in(0,1)$ and $\boldsymbol{w}_1\in\mathbb{R}_+^E$. For any $\boldsymbol{u}\in\mathcal{V}$ the expected cumulative reward of CBA is bounded below by: where the expectations are with respect to the randomization of CBA's strategy. The per-trial time complexity of CBA is in $\mathcal{O}(KE)$.

Figures (6)

Figure 1: Illustrative example of abstention, where we cover the foreground and background classes with metric balls. We consider two clusters (blue and orange) as the foreground and one background class (white), using the shortest-path $d_\infty$ metric. With abstention, we can cover each of the two clusters with a single ball and abstain on the background, which requires no balls (Fig. \ref{fig:example1}). In contrast, treating the background as another class would require significantly more balls to cover it, as shown by the 10 gray balls in Fig. \ref{fig:example2}. When the number of balls needed grows like this, any bound that depends on the number of balls becomes correspondingly worse.
Figure 2: Number of mistakes over time. The four main settings are presented from left to right: the Stochastic Block Model, Gaussian graph, Cora graph and LastFM Asia graph. In this context, D1, D2, and D-INF represent the $p$-norm bases, LVC represents the community detection basis, and INT represents the interval basis. The baselines, EXP3 for each context, Contextual Bandit with similarity, and GABA-II, are denoted as EXP3, CBSim, and GABA, respectively, and are represented with dashed lines. All the figures display the data with 95% confidence intervals over 20 runs, calculated using the standard error multiplied by the $z$-score 1.96.
Figure 3: Stochastic Block Model results. Dotted lines represent different baselines, while solid lines represent various results.
Figure 4: Gaussian graph results. Dotted lines represent different baselines, while solid lines represent various results.
Figure 5: Cora results. Dotted lines represent different baselines, while solid lines represent various results.
...and 1 more figures
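
The confidence intervals described in the Figure 2 caption (standard error multiplied by the $z$-score 1.96) can be computed as in the sketch below. The function name `mean_with_ci95` is hypothetical, and the use of the sample standard deviation (divisor $n-1$) is an assumption, since the caption does not say which estimator was used.

```python
import math
import statistics

def mean_with_ci95(values):
    """Return (mean, lower, upper) for a normal-approximation 95% interval.

    values: per-run measurements at one time step, e.g. the number of
            mistakes in each of the 20 runs. Needs at least two values.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean; stdev uses the n-1 divisor (assumption).
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width
```

Applying this at every time step to the 20 per-run mistake counts would give the kind of band the caption describes around each curve.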

Theorems & Definitions (12)

Theorem 3.1
Corollary 5.1
proof
Proposition 5.2
Theorem 5.3
Lemma A.1
proof
proof: Proof of Theorem \ref{cbath}
Proposition C.1
proof
...and 2 more