Identifying All ε-Best Arms in (Misspecified) Linear Bandits

Zhekai Li, Tianyi Ma, Cheng Hua, Ruihao Zhu

TL;DR

This work addresses the pure-exploration problem of identifying all ε-best arms in linear bandits. It introduces LinFACT, a δ-PAC, phase-based algorithm that uses $\mathcal{XY}$-optimal design, and it establishes a new information-theoretic lower bound on sample complexity that LinFACT matches up to a logarithmic factor, making the algorithm instance-optimal. The paper extends the analysis to misspecified linear models, via orthogonal parameterization, and to generalized linear models, with corresponding upper bounds. Experiments on synthetic and real drug-discovery data show improvements in identification accuracy, sample efficiency, and computational speed, which matters most for early-stage exploratory tasks where each trial is costly.

Abstract

Motivated by the need to efficiently identify multiple candidates in high trial-and-error cost tasks such as drug discovery, we propose a near-optimal algorithm to identify all ε-best arms (i.e., those at most ε worse than the optimum). Specifically, we introduce LinFACT, an algorithm designed to optimize the identification of all ε-best arms in linear bandits. We establish a novel information-theoretic lower bound on the sample complexity of this problem and demonstrate that LinFACT achieves instance optimality by matching this lower bound up to a logarithmic factor. A key ingredient of our proof is to integrate the lower bound directly into the scaling process for upper bound derivation, determining the termination round and thus the sample complexity. We also extend our analysis to settings with model misspecification and generalized linear models. Numerical experiments, including synthetic and real drug discovery data, demonstrate that LinFACT identifies more promising candidates with reduced sample complexity, offering significant computational efficiency and accelerating early-stage exploratory experiments.
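To make the target set concrete, here is a minimal sketch of what "all ε-best arms" means when the parameter vector is known. It is not LinFACT itself, which must estimate the parameter from noisy samples. The function name `epsilon_best_arms`, the toy arm features, and the value of `theta` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def epsilon_best_arms(
    arms: Sequence[Sequence[float]],
    theta: Sequence[float],
    epsilon: float,
) -> List[int]:
    """Return indices of arms whose mean reward is within epsilon of the best.

    Assumes a linear reward model: the mean reward of an arm with feature
    vector x is the inner product <x, theta>. Here theta is treated as known,
    which is only true in this illustration.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    # Mean reward of every arm under the linear model.
    means = [sum(xi * ti for xi, ti in zip(x, theta)) for x in arms]
    best = max(means)
    # An arm is epsilon-best if it is at most epsilon worse than the optimum.
    return [i for i, m in enumerate(means) if m >= best - epsilon]


# Hypothetical toy instance: four arms in two dimensions.
toy_arms = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
toy_theta = [1.0, 0.0]
good_set = epsilon_best_arms(toy_arms, toy_theta, epsilon=0.2)
```

With these toy values the mean rewards are 1.0, 0.0, 0.9, and 0.5, so the arms at indices 0 and 2 form the ε-best set for ε = 0.2. Setting ε = 0 recovers the best-arm set.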

Paper Structure

This paper contains 55 sections, 46 theorems, 191 equations, 12 figures, 3 tables.

Key Result

Proposition 1

For any fixed sampling policy and any given vector $\boldsymbol{x} \in \mathbb{R}^d$, with probability at least $1-\delta$, the following holds, where the anytime confidence bound $B_{t, \delta}$ is given by $B_{t, \delta} = 2 \sqrt{2 \left( d \log(6) + \log\left( \frac{1}{\delta} \right) \right)}$.
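As a quick numerical illustration, the closed form for $B_{t, \delta}$ can be evaluated directly. The sketch below only computes the constant as written above, which depends on the dimension $d$ and the confidence level $\delta$ alone. The function name and the example values of `d` and `delta` are assumptions chosen for illustration.

```python
import math


def anytime_confidence_bound(d: int, delta: float) -> float:
    """Evaluate B = 2 * sqrt(2 * (d * log(6) + log(1 / delta))).

    d is the feature dimension and delta the allowed failure probability.
    """
    if d < 1:
        raise ValueError("d must be a positive integer")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 2.0 * math.sqrt(2.0 * (d * math.log(6.0) + math.log(1.0 / delta)))


# Hypothetical setting: 5-dimensional features, 95% confidence.
bound = anytime_confidence_bound(d=5, delta=0.05)
print(f"B = {bound:.3f}")
```

The bound grows like $\sqrt{d}$ in the dimension and only like $\sqrt{\log(1/\delta)}$ in the confidence level, so tightening $\delta$ is comparatively cheap.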

Figures (12)

  • Figure 1: Illustration of the Stopping Condition: Best Arm Identification vs. All $\varepsilon$-Best Arms Identification
  • Figure 2: Difference Between Standard OLS and Misspecification-Adjusted Projection Estimates
  • Figure 3: Orthogonal Parameterization and Projection.
  • Figure 4: Illustration of the Synthetic Experiment Settings
  • Figure 5: $F1$ Scores for Different Synthetic Experiments Among Algorithms
  • ...and 7 more figures

Theorems & Definitions (50)

  • Definition 1
  • Definition 2
  • Proposition 1: lattimore2020bandit
  • Proposition 2: qin2025dual
  • Theorem 1: Lower Bound
  • Remark 1: Generality of the Lower Bound
  • Theorem 2: Upper Bounds, G-Optimal Design
  • Theorem 3: Upper Bound, $\mathcal{XY}$-Optimal Design
  • Theorem 4: Upper Bound, Misspecification
  • Theorem 5: Upper Bound, Orthogonal Parameterization
  • ...and 40 more