The Role of Contextual Information in Best Arm Identification

Masahiro Kato, Kaito Ariu

TL;DR

This work introduces contextual best-arm identification (BAI) under fixed confidence, where the objective is to identify the arm with the largest marginalized mean μ_a = E_{X∼ζ}[μ_{a,X}] while controlling the error probability δ. The authors derive instance-specific lower bounds for contextual BAI in both continuous and finite-context settings and show that incorporating context can strictly reduce the sample complexity relative to non-contextual bounds. They then propose the Contextual Track-and-Stop (CTS) algorithm, which extends Garivier & Kaufmann’s Track-and-Stop by tracking optimal context-arm allocations and employing a GLRT-based stopping rule; CTS is proven δ-PAC and shown to achieve asymptotically optimal sample complexity, with empirical simulations corroborating faster identification when context is informative. The results connect to causal-inference experimental design by leveraging covariates to improve estimation efficiency and demonstrate practical gains in arm-identification tasks. Overall, the paper provides a principled, theory-backed framework and algorithm for leveraging contextual information to accelerate best-arm identification in stochastic bandits.
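
To make the target quantity concrete, here is a minimal sketch of the marginalized mean μ_a = E_{X∼ζ}[μ_{a,X}] in the finite-context case, where the expectation reduces to a weighted sum over contexts. The function names and the numeric instance are hypothetical illustrations, not taken from the paper, and the sketch assumes the conditional means are known rather than estimated.

```python
def marginalized_means(mu, zeta):
    """Return mu_a = sum_x zeta[x] * mu[a][x] for every arm a.

    mu[a][x] is the conditional mean reward of arm a in context x;
    zeta[x] is the probability of context x (finite contexts assumed).
    """
    if abs(sum(zeta) - 1.0) > 1e-9:
        raise ValueError("context probabilities must sum to 1")
    return [sum(p * m for p, m in zip(zeta, row)) for row in mu]


def best_arm(mu, zeta):
    """Index of the arm with the largest marginalized mean."""
    means = marginalized_means(mu, zeta)
    return max(range(len(means)), key=means.__getitem__)


# Hypothetical instance: two arms, two equally likely contexts.
zeta = [0.5, 0.5]
mu = [
    [1.0, 0.0],  # arm 0: strong in context 0, weak in context 1
    [0.2, 0.6],  # arm 1: moderate in both contexts
]
means = marginalized_means(mu, zeta)
print(means, best_arm(mu, zeta))
```

In this instance arm 1 is better in context 1, yet arm 0 has the larger marginalized mean, which is the quantity the identification problem targets.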

Abstract

We study the best-arm identification problem with fixed confidence when contextual (covariate) information is available in stochastic bandits. Although we can use contextual information in each round, we are interested in the marginalized mean reward over the contextual distribution. Our goal is to identify the best arm with a minimal number of samples at a given error rate. We derive instance-specific lower bounds on the sample complexity of this problem. We then propose a context-aware version of the "Track-and-Stop" strategy, in which the proportion of arm draws tracks the set of optimal allocations, and we prove that the expected number of arm draws matches the lower bound asymptotically. We demonstrate that contextual information can improve the efficiency of identifying the arm with the best marginalized mean reward, compared with the results of Garivier & Kaufmann (2016). We confirm experimentally that contextual information contributes to faster best-arm identification.

Paper Structure

This paper contains 46 sections, 24 theorems, 195 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Theorem 2.1

Let $\delta \in (0, 1/2)$. Assume that for all $x \in \mathbb{R}$, distributions $p_{1,x}, \ldots, p_{K,x}$ are absolutely continuous with respect to the Lebesgue measure. Then, for any $\delta$-PAC strategy, for any $\mathcal{V} = (\boldsymbol{p}, \zeta) \in \Omega$, where

Figures (4)

  • Figure 1: Sample complexity gains through context. The $x$ axis denotes $\rho_{1\mathcal{X}}\in[0,1]$ and the $y$ axis denotes $\rho_{2\mathcal{X}}\in[0,1]$. The contour lines indicate the sample complexity gains: $(1 - \widetilde{\ell}/ \ell)100\%$.
  • Figure 2: Results of $\alpha$-elimination. The left figure displays the results with $\sigma^2_1=1$; the right figure displays the results with $\sigma^2_1=2$.
  • Figure 3: Graph illustrating the maximum GLRT statistic $\max_{a\in[K]}\min_{b\in[K]\backslash\{a\}}Z_{a,b}(t)$. The solid line represents the averaged value over $20$ trials; the light-colored area indicates the values between the first and third quartiles.
  • Figure 4: This graph illustrates the maximum GLRT statistic $\max_{a\in[K]}\min_{b\in[K]\backslash\{a\}}Z_{a,b}(t)$. The solid line represents the averaged value over $20$ trials, and the light-colored area shows the values between the first and third quartiles.
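
The quantity plotted in Figures 3 and 4, $\max_{a\in[K]}\min_{b\in[K]\backslash\{a\}}Z_{a,b}(t)$, can be sketched as follows. This is only an illustration of the max-min reduction: the matrix `Z` below is a hypothetical set of pairwise statistics at a single time step, and how $Z_{a,b}(t)$ itself is computed is not given in this summary. A GLRT-based stopping rule would compare the resulting value with a threshold, which is also not specified here and is therefore omitted.

```python
def max_min_statistic(Z):
    """Return (value, arm) for max over a of min over b != a of Z[a][b].

    Z is a K x K list of lists of pairwise statistics (K >= 2);
    diagonal entries are ignored.
    """
    K = len(Z)
    if K < 2:
        raise ValueError("need at least two arms")
    best_value, best_a = float("-inf"), None
    for a in range(K):
        # Weakest evidence that arm a beats some other arm b.
        worst = min(Z[a][b] for b in range(K) if b != a)
        if worst > best_value:
            best_value, best_a = worst, a
    return best_value, best_a


# Hypothetical pairwise statistics for K = 3 arms at one time step.
Z = [
    [0.0, 3.0, 5.0],
    [-3.0, 0.0, 1.0],
    [-5.0, -1.0, 0.0],
]
value, arm = max_min_statistic(Z)
print(value, arm)
```

The inner minimum picks out each arm's hardest competitor, so the outer maximum is large only when one arm is well separated from every other arm.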

Theorems & Definitions (39)

  • Definition 1.1
  • Theorem 2.1
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Lemma 4.2
  • Lemma 5.1
  • Lemma 5.2
  • Lemma 5.3
  • Lemma 5.4
  • ...and 29 more