Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget

Masahiro Kato

TL;DR

This paper studies fixed-budget best-arm identification (BAI) with Gaussian rewards whose variances differ across arms, taking a worst-case perspective in which the means and the identity of the best arm are unknown. It derives a tight worst-case lower bound that depends solely on the reward variances and proposes the Generalized-Neyman-Allocation with Empirical Best Arm (GNA-EBA) strategy, a non-adaptive scheme that matches the lower bound asymptotically when the variances are known. The paper also establishes asymptotic optimality in the small-gap limit and discusses how the results relate to prior work, including Komiyama et al. (2022) and Degenne (2023). Extensions include Hypothesis BAI (HBAI), and simulations show variance-aware allocations outperforming uniform baselines in multi-armed Gaussian bandits. These results clarify how distributional information governs fixed-budget BAI design and yield implementable, variance-aware allocation rules with provable optimality guarantees.

Abstract

This study investigates the experimental design problem of identifying the arm with the highest expected outcome, referred to as best arm identification (BAI). In our experiments, the number of treatment-allocation rounds is fixed. In each round, a decision-maker allocates an arm and observes a corresponding outcome, which follows a Gaussian distribution whose variance can differ among the arms. At the end of the experiment, the decision-maker recommends one of the arms as an estimate of the best arm. To design an experiment, we first discuss lower bounds for the probability of misidentification. Our analysis highlights that the available information on the outcome distribution, such as the means (expected outcomes), the variances, and the choice of the best arm, significantly influences the lower bounds. Because available information is limited in actual experiments, we develop a lower bound that is valid when the means and the choice of the best arm are unknown, which we refer to as the worst-case lower bound. We demonstrate that the worst-case lower bound depends solely on the variances of the outcomes. Then, under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, an extension of the Neyman allocation proposed by Neyman (1934). We show that the GNA-EBA strategy is asymptotically optimal in the sense that its probability of misidentification aligns with the lower bounds as the sample size goes to infinity and the differences between the expected outcomes of the best arm and the suboptimal arms converge to the same value across arms. We refer to such strategies as asymptotically worst-case optimal.
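
The setting described in the abstract (a fixed number of rounds, Gaussian outcomes with arm-specific variances, and a final recommendation of the arm with the highest sample mean) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's GNA-EBA implementation: the function names, the instance (`means`, `sigmas`), and the budget are hypothetical, and the allocation shown is the classical two-armed Neyman allocation (pulls proportional to standard deviations), not the generalized rule for more than two arms.

```python
import random


def run_fixed_budget(means, sigmas, weights, budget, rng):
    """Run one non-adaptive experiment and return the recommended arm.

    Each arm a is pulled roughly budget * weights[a] times (rounded down,
    at least once), then the arm with the highest sample mean is recommended.
    """
    k = len(means)
    counts = [max(1, int(budget * w)) for w in weights]
    sample_means = []
    for a in range(k):
        draws = [rng.gauss(means[a], sigmas[a]) for _ in range(counts[a])]
        sample_means.append(sum(draws) / len(draws))
    return max(range(k), key=lambda a: sample_means[a])


def misidentification_rate(means, sigmas, weights, budget, trials, seed=0):
    """Monte Carlo estimate of the probability of recommending a non-best arm."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda a: means[a])
    errors = sum(
        run_fixed_budget(means, sigmas, weights, budget, rng) != best
        for _ in range(trials)
    )
    return errors / trials


# Hypothetical two-armed instance (values chosen for illustration only).
means = [0.5, 0.0]
sigmas = [1.0, 3.0]

# Classical Neyman allocation for two arms: proportional to standard deviations.
neyman = [s / sum(sigmas) for s in sigmas]
uniform = [0.5, 0.5]

if __name__ == "__main__":
    for name, w in [("uniform", uniform), ("neyman", neyman)]:
        rate = misidentification_rate(means, sigmas, w, budget=200, trials=2000)
        print(name, w, rate)
```

Because the allocation is fixed before any outcome is observed, the only data-dependent step is the final recommendation, which is what makes the strategy non-adaptive.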

Paper Structure

This paper contains 50 sections, 9 theorems, 77 equations, 26 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

For any $0 < \underline{\Delta} \leq \overline{\Delta} < \infty$, any consistent (Definition 3.1) strategy $\pi\in\Pi^{\mathrm{cons}}\cap \Pi^{\mathrm{inv}}$ satisfies a worst-case lower bound on the probability of misidentification (the displayed inequality is missing here), where $\Omega^{a^*, a}(w) = \frac{(\sigma^{a^*})^2}{w(a^*)} + \frac{(\sigma^a)^2}{w(a)}$.
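
To make the quantity $\Omega^{a^*, a}(w)$ concrete, the following sketch evaluates it for a hypothetical two-armed instance. It assumes that $w(a)$ is the fraction of the budget allocated to arm $a$, which the excerpt above does not state explicitly; under that reading, $\Omega^{a^*, a}(w)/T$ is the variance of the difference between the two sample means after $T$ rounds. The helper names and the numbers are illustrative only. For two arms, a grid search recovers the classical fact that $\Omega$ is minimized when the allocation is proportional to the standard deviations, where it equals $(\sigma^1 + \sigma^2)^2$.

```python
def omega(sigmas, w, best, other):
    """Omega^{best, other}(w) = sigma_best^2 / w[best] + sigma_other^2 / w[other]."""
    return sigmas[best] ** 2 / w[best] + sigmas[other] ** 2 / w[other]


def two_arm_grid_minimizer(sigmas, steps=1000):
    """Grid search over w[0] in (0, 1) for the two-armed case; returns the best w[0]."""
    candidates = [i / steps for i in range(1, steps)]
    return min(candidates, key=lambda w0: omega(sigmas, [w0, 1.0 - w0], 0, 1))


# Hypothetical standard deviations for two arms.
sigmas = [1.0, 3.0]

omega_uniform = omega(sigmas, [0.5, 0.5], 0, 1)    # 1/0.5 + 9/0.5
omega_neyman = omega(sigmas, [0.25, 0.75], 0, 1)   # 1/0.25 + 9/0.75
w0_star = two_arm_grid_minimizer(sigmas)

if __name__ == "__main__":
    print(omega_uniform, omega_neyman, w0_star)
```

A smaller $\Omega$ means the two arms are easier to tell apart at a given budget, which is why an allocation that accounts for the variances can do better than a uniform split.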

Figures (26)

  • Figure 1: An idea in the derivation of the lower bounds. To lower bound the probability of misidentification (that is, to upper bound $- \frac{1}{T}\log\mathbb{P}_{ P }(\widehat{a}^\pi_T \neq a^*(P))$), it is sufficient to consider the case shown in the right figure.
  • Figure 2: The region where there exist matching lower and upper bounds.
  • Figure: $K = 2$
  • Figure: $K = 5$
  • Figure: $K = 20$
  • ...and 21 more figures

Theorems & Definitions (23)

  • Definition 3.1: Consistent strategy
  • Definition 3.2: Asymptotically invariant strategy
  • Theorem 4.1: Best-arm-worst-case lower bound
  • Theorem 4.2: Upper Bound of the GNA-EBA strategy
  • Theorem 4.3: Worst-case upper bound of the GNA-EBA strategy
  • proof
  • Theorem 4.4
  • proof
  • Corollary 4.5
  • Lemma 5.1: Lower bound given known distributions
  • ...and 13 more