Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

Shunta Nonaga, Koji Tabata, Yuta Mizuno, Tamiki Komatsuzaki

TL;DR

This work tackles risk-aware best-arm set identification under the mean-variance criterion in both the fixed-budget and fixed-confidence regimes. It introduces RAMGapE, a unified gap-based exploration framework that identifies an $\epsilon$-Pareto set by trading off mean performance against the scaled risk $\xi_i=\alpha(\sigma_i^2-\rho\mu_i)$ and by linking each arm's uncertainty to Pareto dominance. Theoretical guarantees establish finite-time stopping and accuracy bounds, while extensive experiments show better sample efficiency and more frontier-focused exploration than strong baselines. The approach covers both regimes with one framework, does not require the size of the Pareto set to be known, and offers practical gains for risk-sensitive decision-making in uncertain environments.
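
To make the scaled risk and the Pareto set concrete, here is a minimal sketch that computes $\xi_i=\alpha(\sigma_i^2-\rho\mu_i)$ from known means and variances and returns the exact Pareto set, where a higher mean and a lower $\xi$ are both better. The function names and arm values are hypothetical; the $\epsilon$-relaxation and the bandit sampling itself are not modeled.

```python
from typing import List, Sequence


def scaled_risk(mu: Sequence[float], var: Sequence[float],
                alpha: float, rho: float) -> List[float]:
    """xi_i = alpha * (sigma_i^2 - rho * mu_i) for every arm."""
    return [alpha * (v - rho * m) for m, v in zip(mu, var)]


def dominates(a: int, b: int, mu: Sequence[float], xi: Sequence[float]) -> bool:
    """Arm a dominates arm b: mean at least as high and risk at least as low,
    with a strict improvement in at least one of the two."""
    no_worse = mu[a] >= mu[b] and xi[a] <= xi[b]
    better = mu[a] > mu[b] or xi[a] < xi[b]
    return no_worse and better


def pareto_set(mu: Sequence[float], xi: Sequence[float]) -> List[int]:
    """Indices of arms that no other arm dominates (exact set, no epsilon)."""
    arms = range(len(mu))
    return [b for b in arms
            if not any(dominates(a, b, mu, xi) for a in arms if a != b)]


if __name__ == "__main__":
    mu = [0.25, 0.5, 0.75, 1.0]      # hypothetical true means
    var = [0.125, 0.5, 0.375, 1.0]   # hypothetical true variances
    xi = scaled_risk(mu, var, alpha=2.0, rho=0.5)
    print(xi)                  # [0.0, 0.5, 0.0, 1.0]
    print(pareto_set(mu, xi))  # [2, 3]
```

Here arm 2 dominates arms 0 and 1, while arm 3 stays on the frontier because it has the highest mean despite the highest risk.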

Abstract

Maximizing expected reward while minimizing its risk is a ubiquitous decision-making problem under uncertainty across many fields. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing the associated uncertainty, quantified via the mean-variance (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms, i.e., the arms that strike the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework that operates under both the fixed-confidence and fixed-budget regimes: the same sampling strategy is used in both, and only the confidence intervals are adapted to each scenario. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations on synthetic benchmarks, which show that our approach outperforms existing methods in both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making in uncertain environments.
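
The idea of keeping one sampling strategy and swapping only the confidence intervals can be sketched as below. The widths are Hoeffding-style forms of the kind used by gap-based exploration algorithms (a budget-derived parameter $a$ for fixed budget, a confidence level $\delta$ for fixed confidence); their constants are illustrative assumptions, not the paper's RAMGapE intervals, and all names are hypothetical.

```python
import math
from typing import Callable, Tuple

# (round t, number of pulls T of the arm) -> half-width of its confidence interval
Width = Callable[[int, int], float]


def fixed_budget_width(a: float) -> Width:
    """Width sqrt(a / T): driven by the parameter a, which is fixed in advance
    from the budget, and independent of the round t (assumed form)."""
    return lambda t, T: math.sqrt(a / T)


def fixed_confidence_width(delta: float, n_arms: int) -> Width:
    """Width sqrt(log(4 K t^3 / delta) / (2 T)); a union-bound style form whose
    constants are illustrative assumptions."""
    return lambda t, T: math.sqrt(math.log(4 * n_arms * t ** 3 / delta) / (2 * T))


def interval(mean: float, t: int, T: int, width: Width) -> Tuple[float, float]:
    """Confidence interval around an empirical estimate. A sampling rule that
    consumes these intervals stays the same in both regimes; only `width` changes."""
    b = width(t, T)
    return mean - b, mean + b


if __name__ == "__main__":
    fb = fixed_budget_width(a=1.0)
    fc = fixed_confidence_width(delta=0.05, n_arms=10)
    for T in (4, 16, 64):
        print(T, interval(0.0, 100, T, fb), interval(0.0, 100, T, fc))
```

In both regimes the interval narrows as an arm is pulled more often, which is why under-sampled arms show wider intervals; the fixed-confidence width additionally grows slowly with the round $t$ to keep the overall error probability below $\delta$.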

Paper Structure

This paper contains 20 sections, 12 theorems, 53 equations, 9 figures, 2 tables, 9 algorithms.

Key Result

theorem 1

(Error Bound for Fixed-Budget) If RAMGapEb is run for $n$ rounds on a total of $K$ arms with allowance rate $\epsilon$ and parameter $0<a\leq\frac{n-2K}{16K}\epsilon^2$ (which requires $n>2K$), its simple regret $r_{\widehat{D}^+_n}$ satisfies …, and, in particular, this probability bound is minimized at $a=\frac{n-2K}{16K}\epsilon^2$.
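
The admissible range for $a$ in Theorem 1 is simple arithmetic in $n$, $K$, and $\epsilon$. The sketch below computes its upper end, the value at which the theorem says the bound is minimized; the function name and example budget are hypothetical.

```python
def best_exploration_parameter(n: int, K: int, eps: float) -> float:
    """Largest admissible a = (n - 2K) * eps^2 / (16K) from Theorem 1."""
    if n <= 2 * K:
        raise ValueError("Theorem 1 needs n > 2K for a positive a")
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (n - 2 * K) / (16 * K) * eps ** 2


if __name__ == "__main__":
    # Hypothetical budget: 10,000 rounds, 10 arms, allowance rate 0.5.
    print(best_exploration_parameter(10_000, 10, 0.5))  # 15.59375
```

The admissible $a$ grows linearly with the budget $n$ and quadratically with $\epsilon$, and shrinks roughly as $1/K$, so a larger allowance or budget permits a larger $a$.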

Figures (9)

  • Figure 1: Stopping Time Comparison of Experiment 1 with $\epsilon=0.1$. The blue dashed line corresponds to the identity line, i.e., the set of points where both methods terminate at the same time, indicating comparable performance. Points located below this line signify that the proposed method stops earlier than the baseline.
  • Figure 2: Visualization of confidence intervals at stopping time (Experiment 2.1). Each panel shows the empirical mean (horizontal axis) and scaled risk (vertical axis: $\xi = \alpha(\sigma^2 - \rho \mu)$) of each arm at the termination round for each algorithm. Each cross shows an arm's confidence intervals on both axes; the longer its arms, the fewer samples were allocated to that arm. Red crosses indicate arms included in the returned set $\widehat{D}^+_t$, and blue crosses indicate excluded arms. RAMGapEc and RA-LUCB not only avoid over-sampling non-Pareto arms (blue) but also limit sampling of some arms in $\widehat{D}^+_t$, particularly those on the far right of the plot (i.e., arms with a high mean that have less influence on the boundary of the Pareto set). These arms have wider confidence intervals, reflecting fewer samples. This behavior shows that the algorithms allocate samples efficiently, gathering just enough information for confident identification without unnecessary exploration. The total sample counts in these examples are: RAMGapEc: 9,697,292; RA-LUCB: 9,728,010; DE Round-Robin: 15,283,296; Round-Robin: 43,548,822.
  • Figure 3: Visualization of confidence intervals at stopping time (Experiment 2.2). The crosses and colors have the same meaning as in Fig. 2. The total numbers of samples used in these examples are: RAMGapEc: 28,261,200; RA-LUCB: 28,486,332; DE Round-Robin: 30,041,447; Round-Robin: 46,905,293.
  • Figure 4: Comparison of average simple regret (middle 50%) over total number of samples.
  • Figure 5: Geometric illustration of gap quantities in the mean-risk space. Each point represents an arm, plotted by its expected reward ($\mu$, horizontal axis) and scaled mean-variance risk ($\xi := \alpha(\sigma^2 - \rho\mu)$, vertical axis). Suppose that arms $i$ and $j$ are Pareto optimal while arm $k$ is suboptimal (non-Pareto). In both panels, arm $i$ has lower risk but a smaller mean than arm $k$; the opposite case (arm $i$ has a higher mean but a greater risk than arm $k$) can be treated similarly. If arm $i$ moves upward by more than $M(i,k)$ (or if $k$ moves downward by more than that amount), arm $i$ becomes dominated by arm $k$. In these illustrations, the identity $M(i,k) = M(i,j) + (\xi_k - \xi_j)$ holds. Suppose that, even when other Pareto and non-Pareto arms are present, the gap of arm $i$ is $\Delta_i = \min \left( \min(M(i,j), M(j,i)), M(k,i)^+ + \Delta_k \right)$. Panel (a): When $M(k,i) < M(i,j)$, we obtain $M(i,k) = M(i,j) + (\xi_k - \xi_j) > M(k,i) + (\xi_k - \xi_j) = M(k,i)^+ + \Delta_k$. The smaller $M(i,k)$ is, the more samples from arms $i$ and $k$ are needed to distinguish them in practice. Choosing $M(k,i)^+ + \Delta_k$, a lower bound on $M(i,k)$, as the gap $\Delta_i$ gives a "conservative" estimate that reflects not only the suboptimal arm $k$ but also the other Pareto arms $j$ through the definition of the gap. Panel (b): When $M(k,i) \geq M(i,j)$, the term $\min(M(i,j), M(j,i))$ attains the minimum in $\Delta_i$, indicating that distinguishing $i$ from another Pareto-optimal arm $j$ is harder than distinguishing it from the suboptimal arm $k$. Thus, $k$ does not contribute to $\Delta_i$ in this case.
  • ...and 4 more figures
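
The relations in the Figure 5 caption can be checked numerically. The caption does not give the general definition of $M$, so the sketch below assumes the usual convention from Pareto-front identification in bandits, with the objectives "maximize $\mu$" and "minimize $\xi$": $M(a,b)=\max(0,\ \mu_a-\mu_b,\ \xi_b-\xi_a)$. For an arm $i$ with a smaller mean than $k$, this reduces to $\xi_k-\xi_i$, the upward shift described in the caption. The sketch also assumes that the gap takes a minimum over all other Pareto arms $j$ and all non-Pareto arms $k$, and it takes $\Delta_k$ as an input rather than computing it. The arm coordinates and function names are hypothetical.

```python
from typing import Dict, Sequence


def margin(a: int, b: int, mu: Sequence[float], xi: Sequence[float]) -> float:
    """Assumed M(a, b): the smallest amount by which arm a must be worsened in
    both objectives (mean down, risk up) before arm b dominates it."""
    return max(0.0, mu[a] - mu[b], xi[b] - xi[a])


def pareto_arm_gap(i: int, pareto: Sequence[int], non_pareto: Sequence[int],
                   mu: Sequence[float], xi: Sequence[float],
                   delta: Dict[int, float]) -> float:
    """Gap of a Pareto arm i: minimum over other Pareto arms j of
    min(M(i,j), M(j,i)) and over non-Pareto arms k of M(k,i)^+ + Delta_k."""
    terms = [min(margin(i, j, mu, xi), margin(j, i, mu, xi))
             for j in pareto if j != i]
    # margin() is already non-negative, so the positive part ^+ is implicit.
    terms += [margin(k, i, mu, xi) + delta[k] for k in non_pareto]
    return min(terms)


def figure5_example(mu_k: float) -> Dict[str, float]:
    """Hypothetical arms: 0 = i and 1 = j are Pareto optimal, 2 = k is
    dominated by j. Only the mean of k differs between the two panels."""
    i, j, k = 0, 1, 2
    mu = [0.2, 0.8, mu_k]
    xi = [0.1, 0.6, 0.65]
    delta = {k: xi[k] - xi[j]}  # Delta_k = xi_k - xi_j, as implied by Panel (a)
    return {
        "M(i,j)": margin(i, j, mu, xi),
        "M(i,k)": margin(i, k, mu, xi),
        "M(k,i)": margin(k, i, mu, xi),
        "Delta_i": pareto_arm_gap(i, [i, j], [k], mu, xi, delta),
    }


if __name__ == "__main__":
    for panel, mu_k in (("(a)", 0.3), ("(b)", 0.75)):
        values = figure5_example(mu_k)
        print(panel, {name: round(v, 3) for name, v in values.items()})
```

With $\mu_k = 0.3$ (Panel (a)), the sketch gives $M(k,i)\approx 0.1 < M(i,j)\approx 0.5$ and $\Delta_i \approx 0.15 = M(k,i)^+ + \Delta_k$. With $\mu_k = 0.75$ (Panel (b)), $M(k,i)\approx 0.55 \geq M(i,j)$ and $\Delta_i = M(i,j) \approx 0.5$. In both cases $M(i,k) \approx 0.55 = M(i,j) + (\xi_k - \xi_j)$, matching the identity in the caption.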

Theorems & Definitions (23)

  • definition 1: Empirical Dominance Relation
  • theorem 1
  • theorem 2
  • lemma 1
  • proof
  • lemma 2
  • proof
  • lemma 3
  • proof
  • theorem 2
  • ...and 13 more