Optimal Multi-Fidelity Best-Arm Identification
Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli
TL;DR
This work addresses fixed-confidence best-arm identification in a multi-fidelity bandit setting, where observations can be obtained at varying costs and fidelities. It introduces a tight instance-dependent lower bound on the cost complexity, based on a KL-divergence information-theoretic functional $F(\boldsymbol{\omega}, \boldsymbol{\mu})$ and an associated optimal cost proportion $C^{*}(\boldsymbol{\mu})$, then proposes MF-GRAD, a gradient-based algorithm that asymptotically achieves a matching upper bound. MF-GRAD runs sub-gradient ascent on cost proportions, converts them to pull proportions, and uses a GLR stopping rule to guarantee $\delta$-correct identification of the best arm at the highest fidelity, with per-iteration complexity $O(K^{2}M^{2})$. Theoretical guarantees show that MF-GRAD is asymptotically optimal in the high-confidence regime, and experiments demonstrate better empirical cost efficiency than prior MF-BAI approaches, along with insights into the sparsity of optimal fidelities in small multi-fidelity settings. A key finding is a sparsity pattern: in many $2\times M$ problems, each arm's optimal cost allocation concentrates on a single fidelity, which can inform the design of multi-fidelity sampling strategies.
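The conversion from cost proportions to pull proportions mentioned above can be illustrated with a minimal sketch. It assumes that $\omega_{a,m}$ denotes the fraction of the total cost spent on arm $a$ at fidelity $m$ and that each pull at fidelity $m$ has a fixed cost $\lambda_m > 0$; under these assumptions the number of pulls is proportional to $\omega_{a,m}/\lambda_m$. The function name, argument names, and array layout are hypothetical and not taken from the paper.

```python
import numpy as np

def cost_to_pull_proportions(omega, costs):
    """Convert cost proportions into pull proportions (illustrative sketch).

    omega: (K, M) array, assumed to be the share of total cost spent on
           each (arm, fidelity) pair; entries are non-negative and sum to 1.
    costs: (M,) array of positive per-pull costs, one per fidelity (assumed).
    """
    omega = np.asarray(omega, dtype=float)
    costs = np.asarray(costs, dtype=float)
    # Spending a cost share omega[a, m] at cost costs[m] per pull buys
    # omega[a, m] / costs[m] pulls per unit of total cost.
    pulls_per_unit_cost = omega / costs
    # Normalize so that the pull proportions sum to 1.
    return pulls_per_unit_cost / pulls_per_unit_cost.sum()

# Two arms, two fidelities; the high fidelity costs four times as much.
omega = [[0.25, 0.25],
         [0.25, 0.25]]
costs = [1.0, 4.0]
pi = cost_to_pull_proportions(omega, costs)
print(pi)
```

With equal cost shares, the cheaper fidelity receives proportionally more pulls, which is why the two kinds of proportions have to be distinguished when a sampling rule is expressed in terms of cost.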
Abstract
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm. Our first contribution is a tight, instance-dependent lower bound on the cost complexity. The study of the optimization problem featured in the lower bound provides new insights to devise computationally efficient algorithms, and leads us to propose a gradient-based approach with asymptotically optimal cost complexity. We demonstrate the benefits of the new algorithm compared to existing methods in experiments. Our theoretical and empirical findings also shed light on an intriguing concept of optimal fidelity for each arm.
