Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms

Meltem Tatlı, Arpan Mukherjee, Prashanth L. A., Karthikeyan Shanmugam, Ali Tajer

TL;DR

The paper extends stochastic bandits to risk-sensitive objectives defined by distortion riskmetrics (DRs) and shows that the optimal strategy can require mixing multiple arms rather than selecting a single best arm. It formalizes a DR-centric framework under explicit assumptions, including Hölder continuity conditions, and develops two algorithmic families, RS-ETC-M and RS-UCB-M, plus a computationally efficient variant, CE-UCB-M, to learn and track optimal arm mixtures. The study provides ε-dependent and ε-independent regret guarantees across a broad class of distortion riskmetrics, including non-monotone measures such as the Gini deviation, and reports empirical benefits of mixture-based policies over solitary-arm approaches. The results show how risk considerations fundamentally alter bandit design. Potential applications include portfolio selection, risk budgeting, and other domains where tail behavior and distributional risk are critical.

Abstract

This paper introduces a general framework for risk-sensitive bandits that incorporates risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The framework subsumes various existing risk-sensitive models. An important and hitherto unknown observation is that, for a wide range of riskmetrics, the optimal bandit policy involves selecting a mixture of arms. This is in sharp contrast to the convention in multi-armed bandit algorithms that there is generally a solitary arm that maximizes the utility, whether purely reward-centric or risk-sensitive. This creates a major departure from the principles for designing bandit algorithms, since there are uncountably many possible mixtures. The contributions of the paper are as follows: (i) it formalizes a general framework for risk-sensitive bandits, (ii) it identifies standard risk-sensitive bandit models for which solitary arm selection is not optimal, and (iii) it designs regret-efficient algorithms whose sampling strategies can accurately track optimal arm mixtures (when a mixture is optimal) or solitary arms (when a solitary arm is optimal). The algorithms are shown to achieve a regret that scales as $O((\log T/T)^{\nu})$, where $T$ is the horizon and $\nu>0$ is a riskmetric-specific constant.

Paper Structure

This paper contains 60 sections, 22 theorems, 183 equations, 2 figures, 3 tables, 3 algorithms.

Key Result

Lemma 1

Consider a two-arm Bernoulli bandit model. For a given $p\in[0,1]$, the arms' distributions are ${\rm Bern}(p)$ and ${\rm Bern}(1-p)$. For distortion function $h(u)=u(1-u)$, we have
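
The setting of Lemma 1 is small enough to check numerically. The sketch below is an illustration, not the paper's derivation. It assumes the common definition of a distortion riskmetric for non-negative rewards, $U_h(F)=\int_0^\infty h(1-F(x))\,dx$, and assumes the riskmetric is the utility to be maximized, as the abstract's wording suggests. Under these assumptions the riskmetric of ${\rm Bern}(q)$ reduces to $h(q)$, and sampling ${\rm Bern}(p)$ with probability $\alpha$ and ${\rm Bern}(1-p)$ otherwise yields ${\rm Bern}(\alpha p+(1-\alpha)(1-p))$. The function names and the grid search are hypothetical helpers introduced only for this example.

```python
def distortion(u: float) -> float:
    """Distortion function h(u) = u(1 - u) from Lemma 1."""
    return u * (1.0 - u)


def bernoulli_riskmetric(q: float) -> float:
    """Riskmetric of Bern(q), ASSUMING U_h(F) = int_0^inf h(1 - F(x)) dx.

    The survival function of Bern(q) equals q on [0, 1) and 0 afterwards,
    so the integral reduces to h(q).
    """
    return distortion(q)


def mixture_riskmetric(alpha: float, p: float) -> float:
    """Riskmetric when Bern(p) is drawn w.p. alpha and Bern(1 - p) otherwise."""
    q = alpha * p + (1.0 - alpha) * (1.0 - p)  # the mixture is again Bernoulli
    return bernoulli_riskmetric(q)


def best_mixture(p: float, grid: int = 1001) -> float:
    """Grid search (hypothetical helper) for the maximizing mixing weight."""
    alphas = [i / (grid - 1) for i in range(grid)]
    return max(alphas, key=lambda a: mixture_riskmetric(a, p))


if __name__ == "__main__":
    p = 0.2
    print("arm Bern(p) alone:  ", mixture_riskmetric(1.0, p))
    print("arm Bern(1-p) alone:", mixture_riskmetric(0.0, p))
    print("best mixing weight: ", best_mixture(p))
    print("value at the best:  ", mixture_riskmetric(best_mixture(p), p))
```

For any $p\neq 1/2$, each solitary arm attains $p(1-p)<1/4$, while the equal mixture attains $1/4$, so under the stated assumptions no solitary arm is optimal in this example.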

Figures (2)

  • Figure 1: Regret of the algorithms for different parameters
  • Figure 2: Regrets of algorithms for different settings.

Theorems & Definitions (39)

  • Lemma 1: Gini Deviation
  • Definition 1: Hölder continuity
  • Definition 2: Mixture Hölder continuity
  • Lemma 2
  • proof
  • Theorem 1: RS-ETC-M -- $\varepsilon$-dependent
  • Theorem 2: RS-ETC-M -- $\varepsilon$-independent
  • Theorem 3: RS/CE-UCB-M -- $\varepsilon$-dependent
  • Theorem 4: RS/CE-UCB-M -- $\varepsilon$-independent
  • Theorem 5: villani2009optimal
  • ...and 29 more