Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification

James A. Grant, David S. Leslie

TL;DR

The paper studies online binary classification under partial feedback (apple tasting) with a logistic contextual model, formulating it as a partial monitoring problem with side information. It establishes a near-optimal Bayesian regret bound for Thompson Sampling in this setting and develops practical implementations using Pólya-Gamma augmentation: PG-TS, and PG-IDS, a tunable variant of Information-Directed Sampling. The authors prove that the Bayesian regret of Thompson Sampling over $T$ rounds is $\tilde{O}(\sqrt{dT})$ under suitable assumptions, and demonstrate empirically that PG-TS and tunable PG-IDS outperform UCB-style and baseline methods across multiple scenarios. The work bridges finite and compact parameter spaces, extends information-theoretic analysis to this partial monitoring variant, and offers scalable Bayesian approaches with strong performance in online classification tasks with selective feedback.

Abstract

We consider a variant of online binary classification where a learner sequentially assigns labels ($0$ or $1$) to items with unknown true class. If, but only if, the learner chooses label $1$ they immediately observe the true label of the item. The learner faces a trade-off between short-term classification accuracy and long-term information gain. This problem has previously been studied under the name of the 'apple tasting' problem. We revisit this problem as a partial monitoring problem with side information, and focus on the case where item features are linked to true classes via a logistic regression model. Our principal contribution is a study of the performance of Thompson Sampling (TS) for this problem. Using recently developed information-theoretic tools, we show that TS achieves a Bayesian regret bound of improved order relative to previous approaches. Further, we experimentally verify that efficient approximations to TS and Information Directed Sampling via Pólya-Gamma augmentation have superior empirical performance to existing methods.
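The feedback protocol described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes Gaussian item features, unit cost for both kinds of misclassification, and a simple $\epsilon$-greedy policy backed by ridge-regularised gradient descent in place of the Pólya-Gamma samplers studied in the paper. The names `run_apple_tasting`, `fit_logistic` and `epsilon_greedy` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, d, steps=50, lr=0.5, ridge=1.0):
    """Ridge-regularised logistic fit by gradient descent (an assumed stand-in
    for a posterior approximation; not the Polya-Gamma scheme of the paper)."""
    theta = np.zeros(d)
    if len(y) == 0:
        return theta
    X, y = np.asarray(X), np.asarray(y)
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y) + ridge * theta
        theta -= lr * grad / len(y)
    return theta


def epsilon_greedy(x, X_obs, y_obs, rng, eps=0.1):
    """With probability eps choose label 1 (which reveals the true class);
    otherwise choose the label the current fit considers more likely."""
    if rng.random() < eps:
        return 1
    theta_hat = fit_logistic(X_obs, y_obs, x.shape[0])
    return int(sigmoid(x @ theta_hat) >= 0.5)


def run_apple_tasting(policy, theta_star, T, rng):
    """Simulate T rounds of logistic apple tasting.

    Returns the cumulative expected regret and the (features, labels) the
    learner actually observed, i.e. only rounds where label 1 was chosen.
    """
    d = theta_star.shape[0]
    X_obs, y_obs = [], []
    regret = 0.0
    for _ in range(T):
        x = rng.normal(size=d) / np.sqrt(d)   # assumed feature distribution
        p = sigmoid(x @ theta_star)           # P(true class = 1 | x)
        y = int(rng.random() < p)             # true class, hidden by default
        a = policy(x, X_obs, y_obs, rng)
        # Assumed unit costs: expected loss is p for label 0, 1 - p for label 1.
        loss = (1.0 - p) if a == 1 else p
        regret += loss - min(p, 1.0 - p)
        if a == 1:                            # feedback only under label 1
            X_obs.append(x)
            y_obs.append(y)
    return regret, np.array(X_obs), np.array(y_obs)


rng = np.random.default_rng(0)
theta_star = np.array([1.5, -1.0, 0.5])
regret, X_seen, y_seen = run_apple_tasting(epsilon_greedy, theta_star, 200, rng)
print(f"regret={regret:.2f}, labels observed={len(y_seen)} of 200")
```

The key asymmetry is in the last branch of the loop: a policy that always assigns label 0 never sees a true class and so can never improve its estimate, which is the exploration trade-off the paper analyses.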

Paper Structure

This paper contains 24 sections, 6 theorems, 98 equations, 3 figures, 4 tables, 5 algorithms.

Key Result

Theorem 1

For the contextual logistic apple tasting problem instantiated by $\theta^*\sim\pi_0$, the Bayesian regret of the Thompson Sampling policy, $\varphi^{TS}$, in $T$ rounds satisfies,

Figures (3)

  • Figure 1: Regret of algorithms on Problems (i), (ii) and (iii), over 50 replications. The green lines denote the $\epsilon$-Greedy policy with $\epsilon=0.1$, the yellow lines denote the traditional PG-IDS policy, the red lines denote the tunable PG-IDS policy with $\lambda=0.2$, the magenta lines denote the CBP-SIDE policy, the gray lines denote the SupLogistic policy, and the blue lines denote the PG-TS policy. In each case, 90% empirical confidence regions are plotted around the median trajectory. The boxplots in the right-hand panel show the distribution of the final regret at time $T$ for the four most successful algorithms.
  • Figure 2: Plots showing the effects of problem and algorithm parameters on regret.
  • Figure 3: Boxplots of the distribution of regret at final round for tunable PG-IDS with varying choices of tuning parameter $\lambda$ (in red), plotted alongside equivalent distributions for traditional PG-IDS (yellow) and TS (blue).

Theorems & Definitions (8)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Remark 2