EVaR-Optimal Arm Identification in Bandits

Mehrasa Ahmadipour, Aurélien Garivier

TL;DR

This paper advances risk-averse best-arm identification by studying EVaR-based BAI in nonparametric bandits with rewards in [0,1]. It introduces a Track-and-Stop algorithm that is δ-correct and achieves an asymptotically optimal sample complexity, matching a derived EVaR-informed information lower bound. Central to the approach are two KL projection functionals, KL_inf^U and KL_inf^L, derived from EVaR dual representations, which drive both the lower bound and the sampling/stopping rules. The results enable principled, distribution-free risk-averse decision-making in sequential settings and pave the way for a unified framework across coherent risk measures in bandit problems.

Abstract

We study the fixed-confidence best arm identification (BAI) problem within the multi-armed bandit (MAB) framework under the Entropic Value-at-Risk (EVaR) criterion. Our analysis considers a nonparametric setting, allowing for general reward distributions bounded in [0,1]. This formulation addresses the critical need for risk-averse decision-making in high-stakes environments, such as finance, moving beyond simple expected value optimization. We propose a $\delta$-correct, Track-and-Stop based algorithm and derive a corresponding lower bound on the expected sample complexity, which we prove is asymptotically matched. The implementation of our algorithm and the characterization of the lower bound both require solving a complex convex optimization problem and a related, simpler non-convex one.

Paper Structure

This paper contains 29 sections, 21 theorems, 132 equations, 1 figure.

Key Result

proposition 1

Let $\{P_\theta\}_{\theta\in(a,b)}$ be a one-parameter canonical exponential family with strictly convex log-partition function $\psi$. Then, for every $\alpha\in(0,1)$, the map $\theta\mapsto \mathrm{EVaR}_\alpha(P_\theta)$ is strictly increasing. Consequently, since the mean $\mu(\theta)=\psi'(\theta)$ is also strictly increasing in $\theta$, the arm minimizing $\mathrm{EVaR}_\alpha$ coincides with the arm minimizing the mean.
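As a rough numerical illustration of this monotonicity, the sketch below uses the Gaussian family with fixed variance, which is a canonical exponential family with $\psi(\theta)=\sigma^2\theta^2/2$ and mean $\mu=\sigma^2\theta$. The excerpt does not reproduce the paper's EVaR formula, so the code assumes the standard dual form $\inf_{z>0} z^{-1}\left(\log \mathbb{E}[e^{zX}] + \log(1/\text{tail})\right)$, where `tail` is a hypothetical name for the right-tail mass; the paper's parameterization in terms of $\alpha$ may differ. The helper names, the grid, and the arm means are all illustrative choices, not taken from the paper.

```python
import math

def evar_from_log_mgf(log_mgf, tail, z_grid):
    """Grid approximation of inf_{z>0} (log_mgf(z) + log(1/tail)) / z.

    Assumption: this is the standard dual form of EVaR with right-tail
    mass `tail`; the paper's own parameterization is not shown here.
    """
    penalty = math.log(1.0 / tail)
    return min((log_mgf(z) + penalty) / z for z in z_grid)

def gaussian_log_mgf(mu, sigma):
    """Log moment generating function of N(mu, sigma^2)."""
    return lambda z: mu * z + 0.5 * (sigma * z) ** 2

def gaussian_evar_closed_form(mu, sigma, tail):
    """Closed-form minimizer of the dual form for a Gaussian."""
    return mu + sigma * math.sqrt(2.0 * math.log(1.0 / tail))

# Log-spaced grid for the dual variable z (illustrative resolution).
Z_GRID = [10.0 ** (-3.0 + 6.0 * i / 20000) for i in range(20001)]

TAIL = 0.05             # hypothetical right-tail mass
SIGMA = 1.0             # shared, known standard deviation
MEANS = [0.2, 0.5, 0.8]  # hypothetical arm means

EVARS = [
    evar_from_log_mgf(gaussian_log_mgf(m, SIGMA), TAIL, Z_GRID)
    for m in MEANS
]

# Within this family, ranking arms by EVaR and by mean gives the same answer.
argmin_evar = min(range(len(MEANS)), key=EVARS.__getitem__)
argmin_mean = min(range(len(MEANS)), key=MEANS.__getitem__)
```

In this family the dual objective is $\mu + \sigma^2 z/2 + \log(1/\text{tail})/z$, so the EVaR is the mean plus a constant that does not depend on $\mu$, which makes the agreement between the two rankings easy to see. In the nonparametric setting studied in the paper no such reduction is available, which is why the EVaR criterion needs its own analysis.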

Figures (1)

  • Figure 1: Schematic density on $[0,1]$ with right tail mass $1-\alpha$ shaded. Markers show $\mathbb{E}[X]$, $\mathrm{VaR}_\alpha(X)$, $\mathrm{CVaR}_\alpha(X)$, and $\mathrm{EVaR}_\alpha(X)$.

Theorems & Definitions (43)

  • proposition 1
  • proof
  • remark 1
  • lemma 1
  • corollary 1
  • proof
  • lemma 2
  • definition 1: Lower semicontinuity
  • remark 2
  • lemma 3
  • ...and 33 more