Representative Arm Identification: A fixed confidence approach to identify cluster representatives

Sarvesh Gharat, Aniket Yadav, Nikhil Karamchandani, Jayakrishnan Nair

TL;DR

This work studies Representative Arm Identification (RAI) in stochastic multi-armed bandits in the fixed-confidence setting: arms are partitioned into clusters ordered by mean reward, and the goal is to identify a target number of representatives from each cluster. It derives an instance-dependent lower bound based on the bottleneck gap and develops two confidence-interval-based algorithms, Vanilla Round Robin and Butterscotch Round Robin, with $\delta$-PC guarantees and order-matching upper bounds. The methods are evaluated empirically against an LUCB-type baseline on synthetic and real-world (MovieLens) datasets, showing strong performance, with Butterscotch Round Robin often the best. By unifying several classic MAB problems (best-arm, top-$K$, full and coarse ranking) under the RAI framework, the paper provides principled sample complexity guarantees and practical algorithms for applications such as crowdsourcing and content recommendation.

Abstract

We study the representative arm identification (RAI) problem in the multi-armed bandits (MAB) framework, wherein we have a collection of arms, each associated with an unknown reward distribution. An underlying instance is defined by a partitioning of the arms into clusters of predefined sizes, such that for any $j > i$, all arms in cluster $i$ have a larger mean reward than those in cluster $j$. The goal in RAI is to reliably identify a certain prespecified number of arms from each cluster, while using as few arm pulls as possible. The RAI problem covers as special cases several well-studied MAB problems such as identifying the best arm or any $M$ out of the top $K$, as well as both full and coarse ranking. We start by providing an instance-dependent lower bound on the sample complexity of any feasible algorithm for this setting. We then propose two algorithms, based on the idea of confidence intervals, and provide high probability upper bounds on their sample complexity, which orderwise match the lower bound. Finally, we do an empirical comparison of both algorithms along with an LUCB-type alternative on both synthetic and real-world datasets, and demonstrate the superior performance of our proposed schemes in most cases.
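
To make the special cases named in the abstract concrete, the sketch below encodes each one as a pair of tuples: cluster sizes `c` and representatives per cluster `r` (the notation used in Figure 1). The specific encodings and all function names are illustrative assumptions, not definitions taken from the paper.

```python
from typing import List, Sequence, Tuple

# An instance shape is (c, r): cluster sizes and representatives per cluster.
# Clusters are ordered from highest to lowest mean reward.
Shape = Tuple[List[int], List[int]]


def best_arm(n: int) -> Shape:
    """Assumed encoding: the top arm is its own cluster; pick it."""
    return [1, n - 1], [1, 0]


def m_of_top_k(n: int, k: int, m: int) -> Shape:
    """Assumed encoding: any m arms out of the top k (m = k gives top-K)."""
    return [k, n - k], [m, 0]


def full_ranking(n: int) -> Shape:
    """Assumed encoding: every arm is a singleton cluster and must be placed."""
    return [1] * n, [1] * n


def coarse_ranking(sizes: Sequence[int]) -> Shape:
    """Assumed encoding: every arm of every cluster must be identified."""
    return list(sizes), list(sizes)


def is_valid_shape(c: Sequence[int], r: Sequence[int]) -> bool:
    """A shape is consistent if each cluster is non-empty and 0 <= r_i <= c_i."""
    return len(c) == len(r) and all(
        ci >= 1 and 0 <= ri <= ci for ci, ri in zip(c, r)
    )
```

Under this encoding, a cluster with zero representatives (as in the last cluster of the best-arm case) only serves to mark a boundary between the arms above and below it.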

Paper Structure

This paper contains 11 sections, 7 theorems, 40 equations, 3 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

For a given error threshold $\delta \in (0,1),$ in the space of $\frac{1}{2}$-Gaussian instances (i.e., each arm has a Gaussian reward distribution with standard deviation $\sigma = \frac{1}{2}$), any $\delta$-PC algorithm $\mathcal{A}$ for the RAI problem satisfies

Figures (3)

  • Figure 1: An RAI problem instance with $m=6$ clusters, $c = (4,5,5,3,2,2),$ and $r = (2,3,1,2,0,1).$ The circled arms illustrate one of the correct outputs for this problem.
  • Figure 2: Comparison of sample complexity between Vanilla Round Robin, Butterscotch Round Robin, and an LUCB-style scheme for special cases of the RAI problem, over an instance created from the MovieLens dataset.
  • Figure 3: An illustration of an alternate instance. In the given instance, the arms marked in green represent a set of correct answers. To form an alternate instance, we shift an arm from cluster 1 to cluster 2, placing it just behind the boundary of cluster 2 in the given instance.
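
As a rough illustration of what a correct output means for the Figure 1 instance, the following sketch checks a candidate answer against known arm means. It assumes an output is one set of arms per cluster, with exactly $r_i$ arms drawn from cluster $i$; the mean values, the arm indexing, and the function names are hypothetical and chosen only for the example.

```python
from typing import List, Sequence, Set


def cluster_of_each_arm(means: Sequence[float], c: Sequence[int]) -> List[int]:
    """Rank arms by decreasing mean and cut the ranking into clusters of sizes c."""
    if sum(c) != len(means):
        raise ValueError("cluster sizes must sum to the number of arms")
    order = sorted(range(len(means)), key=lambda a: -means[a])
    cluster = [0] * len(means)
    pos = 0
    for idx, size in enumerate(c):
        for arm in order[pos:pos + size]:
            cluster[arm] = idx
        pos += size
    return cluster


def is_correct_output(
    means: Sequence[float],
    c: Sequence[int],
    r: Sequence[int],
    output: Sequence[Set[int]],
) -> bool:
    """True if output[i] holds exactly r[i] arms, all belonging to cluster i."""
    if len(output) != len(c):
        return False
    cluster = cluster_of_each_arm(means, c)
    for idx, chosen in enumerate(output):
        if len(chosen) != r[idx]:
            return False
        if any(cluster[arm] != idx for arm in chosen):
            return False
    return True


# Shape of the Figure 1 instance: m = 6 clusters over 21 arms.
c = (4, 5, 5, 3, 2, 2)
r = (2, 3, 1, 2, 0, 1)
# Hypothetical means, strictly decreasing in the arm index.
means = [(21 - i) / 21 for i in range(21)]

good = [{0, 1}, {4, 5, 6}, {9}, {14, 15}, set(), {20}]
bad = [{0, 4}, {5, 6, 7}, {9}, {14, 15}, set(), {20}]  # arm 4 is in cluster 2

good_ok = is_correct_output(means, c, r, good)
bad_ok = is_correct_output(means, c, r, bad)
```

Such a checker needs the true means, so it is only useful for evaluating an algorithm on synthetic instances, not as part of the algorithm itself.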

Theorems & Definitions (15)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Claim 1
  • Lemma 2
  • Claim 2
  • Claim 3
  • ...and 5 more