Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits

Sabrina Khurshid; Mohammed Shahid Abdulla; Gourab Ghatak

TL;DR

This work addresses online optimization of risk-adjusted performance by introducing the Regularized Square Sharpe Ratio (RSSR) as a tractable surrogate for the Sharpe Ratio (SR) in multi-armed bandits. It develops UCB-RSSR for regret minimization (RM) and three fixed-budget Best Arm Identification (BAI) algorithms (SHVV, SHSR, SuRSR) to robustly identify the best RSSR/SR arms, supported by novel path-dependent concentration bounds for RSSR and SR-like statistics. Theoretical results show logarithmic regret for RM under RSSR and finite-sample error-probability guarantees for BAI across multiple distributions, and empirical results demonstrate advantages over existing SR-oriented baselines such as U-UCB and benchmarks derived from MVTS and GRA-UCB. The findings have practical implications for risk-aware portfolio management and other domains where return and volatility must be controlled simultaneously.
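To make the fixed-budget BAI setting concrete, here is a minimal Python sketch of the classical Successive Rejects scheme (Audibert and Bubeck, 2010), on which SuRSR appears to build, judging by its name. It is not the authors' SuRSR. The ranking statistic `empirical_score` (sample mean divided by the square root of the unbiased sample variance plus a small regularizer `reg`) is a hypothetical Sharpe-like stand-in, because the exact RSSR definition is not given here. The names `Arm`, `empirical_score`, and `successive_rejects`, and the example arms, are illustrative assumptions.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence

Arm = Callable[[random.Random], float]  # hypothetical: draws one reward sample


def empirical_score(samples: Sequence[float], reg: float = 1e-6) -> float:
    """Hypothetical Sharpe-like score: mean / sqrt(unbiased variance + reg).

    `reg` is an assumed regularizer that keeps the score finite when the
    sample variance is zero; it is NOT the paper's RSSR definition.
    """
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)  # unbiased (n - 1) estimator
    return mean / math.sqrt(var + reg)


def successive_rejects(arms: List[Arm], budget: int, seed: int = 0) -> int:
    """Successive Rejects (Audibert & Bubeck, 2010) ranked by empirical_score."""
    k = len(arms)
    rng = random.Random(seed)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    samples: Dict[int, List[float]] = {a: [] for a in range(k)}
    active = list(range(k))
    for phase in range(1, k):
        # Every surviving arm is topped up to n_phase samples in total.
        n_phase = math.ceil((budget - k) / (log_bar * (k + 1 - phase)))
        n_phase = max(n_phase, 2)  # variance needs >= 2 draws (may exceed tiny budgets)
        for a in active:
            while len(samples[a]) < n_phase:
                samples[a].append(arms[a](rng))
        # Reject the surviving arm with the lowest empirical score.
        worst = min(active, key=lambda a: empirical_score(samples[a]))
        active.remove(worst)
    return active[0]


# Illustrative arms: uniform rewards with different mean/spread trade-offs.
arms: List[Arm] = [
    lambda r: r.uniform(0.0, 1.0),  # mean 0.5, wide spread
    lambda r: r.uniform(0.4, 0.6),  # mean 0.5, narrow spread: highest score
    lambda r: r.uniform(0.0, 0.2),  # mean 0.1, narrow spread
    lambda r: r.uniform(0.2, 0.4),  # mean 0.3, narrow spread
]
best = successive_rejects(arms, budget=2000)
print(best)
```

The phase lengths follow the standard Successive Rejects schedule, which keeps the total number of pulls within the budget; swapping in a different statistic only requires replacing `empirical_score`.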

Abstract

The Sharpe Ratio (SR) is a critical parameter in characterizing financial time series, as it jointly considers the reward of a stock or portfolio and its volatility, measured through the variance. Deriving online algorithms that optimize the SR is particularly challenging, since even offline policies experience constant regret with respect to the best expert (Even-Dar et al., 2006). Therefore, instead of optimizing the usual definition of the SR, we optimize the regularized square SR (RSSR). We consider two settings for the RSSR: Regret Minimization (RM) and Best Arm Identification (BAI). For RM, we propose a novel multi-armed bandit (MAB) algorithm, UCB-RSSR, for RSSR maximization. We derive a path-dependent concentration bound for the estimate of the RSSR and, based on it, derive regret guarantees for UCB-RSSR, showing that its regret grows as $O(\log n)$ for the two-armed bandit played over a horizon $n$. For BAI, we consider the fixed-budget setting of two well-known algorithms, sequential halving and successive rejects, and propose the SHVV, SHSR, and SuRSR algorithms. We derive an upper bound on the error probability of each proposed BAI algorithm. We demonstrate that UCB-RSSR outperforms U-UCB (Cassel et al., 2023), the only other known SR-optimizing bandit algorithm, and we establish its efficacy with respect to other benchmarks derived from the GRA-UCB and MVTS algorithms. We further demonstrate the performance of the proposed BAI algorithms in multiple setups. Our research highlights that the proposed algorithms will find extensive applications in risk-aware portfolio management problems.
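Sequential halving, one of the two fixed-budget BAI schemes named in the abstract, can be sketched generically as follows, after Karnin et al. (2013). The budget is split evenly over $\lceil \log_2 K \rceil$ rounds. In each round, every surviving arm is sampled equally using fresh draws, and the better-scoring half is kept. The score is a pluggable function. The example `neg_variance`, which prefers the smallest sample variance, is an illustrative assumption about what "best variance" means and is not the paper's SHVV or SHSR criterion. All names here are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Arm = Callable[[random.Random], float]  # hypothetical: draws one reward sample
Score = Callable[[Sequence[float]], float]  # higher is better


def sequential_halving(arms: List[Arm], budget: int, score: Score, seed: int = 0) -> int:
    """Sequential Halving (Karnin et al., 2013) with a pluggable empirical score."""
    rng = random.Random(seed)
    survivors = list(range(len(arms)))
    rounds = math.ceil(math.log2(len(arms)))
    for _ in range(rounds):
        # Split the budget evenly across rounds and surviving arms.
        pulls = budget // (len(survivors) * rounds)
        if pulls < 2:
            raise ValueError("budget too small: each arm needs >= 2 samples per round")
        # Only samples drawn in the current round are used to rank the arms.
        scores = {a: score([arms[a](rng) for _ in range(pulls)]) for a in survivors}
        survivors.sort(key=lambda a: scores[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def neg_variance(samples: Sequence[float]) -> float:
    """Illustrative score: prefer the arm with the smallest sample variance."""
    return -statistics.variance(samples)


arms: List[Arm] = [
    lambda r: r.uniform(0.0, 1.0),    # variance 1/12
    lambda r: r.uniform(0.3, 0.7),    # variance 0.4**2 / 12
    lambda r: r.uniform(0.45, 0.55),  # variance 0.1**2 / 12 (smallest)
    lambda r: r.uniform(0.0, 0.5),    # variance 0.5**2 / 12
]
best_low_var = sequential_halving(arms, budget=1000, score=neg_variance)
print(best_low_var)
```

Passing a Sharpe-like statistic instead of `neg_variance` turns the same loop into a ratio-based identifier. The halving structure itself does not depend on the score.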

Paper Structure (35 sections, 11 theorems, 73 equations, 3 figures, 6 algorithms)

Introduction
Main Technical Challenges
Related Work
Contributions and organisation
Problem Formulation
Regret and Probability of error
Benchmarks
Variance Estimate and UCB-VV
Confidence Bound on the Estimation of Variance
Maximizing SR: UCB-SR-like
UCB-SR-like Algorithm
Maximizing SR: UCB-RSSR
UCB-RSSR Algorithm
Best Arm Identification
Sequential Halving for best variance identification
...and 20 more sections

Key Result

Lemma 1

Let $X_1, X_2, \dots, X_n$ be a sequence of i.i.d. random variables bounded in $[0,u]$ with variance $\sigma^2$. Let $\bar{V}(n){\stackrel{\Delta}{=}} \frac{1}{n-1}\sum^{n}_{i=1}\left(X_i-\frac{1}{n}\sum_{j = 1}^nX_j\right)^2$ be the unbiased estimator of $\sigma^2$. Then,
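As a concrete illustration of the estimator $\bar{V}(n)$ in Lemma 1, the following Python sketch computes it directly from its definition. It then runs a small Monte Carlo check that its average matches $\sigma^2 = u^2/12$ for $X \sim \mathrm{Uniform}[0,u]$, which is bounded in $[0,u]$ as the lemma requires. It illustrates only unbiasedness and does not reproduce the lemma's concentration bound. The function name `unbiased_variance` and the simulation parameters are assumptions chosen for illustration.

```python
import random
from typing import Sequence


def unbiased_variance(xs: Sequence[float]) -> float:
    """V-bar(n) = 1/(n-1) * sum_i (X_i - sample mean)^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Monte Carlo check: for X ~ Uniform[0, u], sigma^2 = u^2 / 12.
rng = random.Random(1)
u, n, trials = 2.0, 5, 20000
avg = sum(
    unbiased_variance([rng.uniform(0.0, u) for _ in range(n)]) for _ in range(trials)
) / trials
print(avg, u * u / 12)
```

Dividing by $n-1$ rather than $n$ is what makes the average over trials land on $\sigma^2$ even for a sample size as small as $n = 5$.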

Figures (3)

Figure 1: Comparison of $\texttt{UCB-RSSR}$ with $\texttt{U-UCB}$.
Figure 2: Expected number of sub-optimal plays vs. time steps for (a) uniform distribution, (b) truncated Gaussian, (c) truncated gamma, and (d) Gaussian with $[l,u]$
Figure 3: (a) Error probability $e_n$ of $\texttt{SHVV}$ for the 5 experiments in the SHVV experimental setup; (b-f) error probability $e_n$ of SHSR, SuRSR, and uniform sampling algorithms for the 5 experimental setups defined in the corresponding experiments subsection

Theorems & Definitions (31)

Remark 1
Lemma 1
Proof
Theorem 1: Regret
Proof
Theorem 2: Bound
Proof
Theorem 3: Regret
Proof
Theorem 4: Bound
...and 21 more
