Nearly Optimal Best Arm Identification for Semiparametric Bandits

Seok-Jin Kim

Abstract

We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift. Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open. For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features. We then propose a computationally efficient phase-elimination algorithm based on a new $XY$-design for orthogonalized regression. Our analysis yields a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive $d^2$ term, and experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
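The role of orthogonalization can be illustrated on a toy instance. The sketch below is not the paper's $XY$-design or its phase-elimination algorithm. It assumes a two-dimensional instance with four hypothetical arms, a fixed uniform sampling distribution, and a shift sequence chosen independently of the arm draws. It compares least squares on raw features with least squares on features centered at their mean under the sampling distribution. All names (`simulate`, `least_squares`, `theta`, `shifts`) are illustrative assumptions, not taken from the paper.

```python
import math
import random


def simulate(theta, arms, probs, shifts, noise_sd, rng):
    """Draw one arm per round from `probs`; reward = <x, theta> + shift_t + noise."""
    data = []
    for nu in shifts:
        x = rng.choices(arms, weights=probs)[0]
        reward = x[0] * theta[0] + x[1] * theta[1] + nu + rng.gauss(0.0, noise_sd)
        data.append((x, reward))
    return data


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ t = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (a[1][1] * b[0] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def least_squares(data, center):
    """Least squares of reward on (x - center), without an intercept."""
    gram = [[0.0, 0.0], [0.0, 0.0]]
    moment = [0.0, 0.0]
    for x, reward in data:
        z = [x[0] - center[0], x[1] - center[1]]
        for i in range(2):
            moment[i] += z[i] * reward
            for j in range(2):
                gram[i][j] += z[i] * z[j]
    return solve_2x2(gram, moment)


rng = random.Random(0)
theta = (1.0, -0.5)                                    # hypothetical true parameter
arms = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.5)]
probs = [0.25, 0.25, 0.25, 0.25]                       # fixed sampling distribution
# Assumed shift sequence: non-stationary, but fixed before the arms are drawn.
shifts = [2.0 + math.sin(t / 50.0) for t in range(20000)]

data = simulate(theta, arms, probs, shifts, noise_sd=0.1, rng=rng)

# Mean feature vector under the sampling distribution.
mean_x = [sum(p * x[k] for p, x in zip(probs, arms)) for k in range(2)]

theta_naive = least_squares(data, center=[0.0, 0.0])   # raw features
theta_orth = least_squares(data, center=mean_x)        # centered features

err_naive = max(abs(a - b) for a, b in zip(theta_naive, theta))
err_orth = max(abs(a - b) for a, b in zip(theta_orth, theta))
print("naive error:", err_naive)
print("centered error:", err_orth)
```

In this sketch the centered features have mean zero and are drawn independently of the shift, so the cross term between them and $\nu_t$ averages out and the estimate concentrates around `theta`. The raw-feature regression instead absorbs part of the shift into its coefficients.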

Paper Structure

This paper contains 54 sections, 8 theorems, 86 equations, 7 tables, 2 algorithms.

Key Result

Proposition 1

Under the boundedness assumption, consider any algorithm for fixed-confidence transductive BAI in semiparametric bandits with source features $\mathcal{X}$ and target features $\mathcal{Z}$ that is required to be $\delta$-correct for every admissible shift sequence $\{\nu_t\}$. Then there exists an ...

Theorems & Definitions (15)

  • Definition 1
  • Proposition 1: Lower bound for transductive BAI
  • Proposition 2: Compatibility of lower bounds
  • Corollary 1: Lower bound: Non-transductive case
  • Proposition 3: Performance of XOR design
  • Theorem 1: High-probability sample complexity bound
  • Corollary 2: Sample complexity: non-transductive case
  • proof
  • proof
  • proof
  • ...and 5 more