Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

Jingwei Ji; Renyuan Xu; Ruihao Zhu

TL;DR

The paper tackles risk-aware decision making in linear bandits by minimizing regret under the mean-variance (MV) criterion for large action spaces, with smart order routing (SOR) as the motivating application. It proposes two algorithms, RISE and RISE++, that use a variance-minimizing G-optimal design to explore efficiently and then exploit, achieving near-optimal regret guarantees. A novel temporal regret decomposition and phased elimination underpin the theory, decoupling the dependence on the horizon from the number of actions and yielding both instance-independent and instance-dependent guarantees. Experiments on synthetic data and NASDAQ ITCH data support the linear mean-variance model and show substantial regret improvements in SOR tasks.

Abstract

Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action space, we consider risk-aware bandit optimization with applications in smart order routing (SOR). Specifically, based on preliminary observations of linear price impacts made from the NASDAQ ITCH dataset, we initiate the study of risk-aware linear bandits. In this setting, we aim at minimizing regret, which measures our performance deficit compared to the optimum's, under the mean-variance metric when facing a set of actions whose rewards are linear functions of (initially) unknown parameters. Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm. Then, we rigorously analyze their near-optimal regret upper bounds to show that, by leveraging the linear structure, our algorithms can dramatically reduce the regret when compared to existing methods. Finally, we demonstrate the performance of the algorithms by conducting extensive numerical experiments in the SOR setup using both synthetic datasets and the NASDAQ ITCH dataset. Our results reveal that 1) the linear structure assumption is indeed well supported by the NASDAQ ITCH dataset; and, more importantly, 2) both RISE and RISE++ can significantly outperform the competing methods in terms of regret, especially in complex decision-making scenarios.
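The G-optimal design mentioned in the abstract can be illustrated with a minimal sketch. It uses a standard Frank-Wolfe iteration to approximate a design over a finite action set. It is not the paper's implementation: the function name `g_optimal_design`, the tolerance, and the stopping rule are assumptions made for illustration, and the action vectors are assumed to span the full d-dimensional space.

```python
import numpy as np

def g_optimal_design(actions, tol=1e-2, max_iter=10_000):
    """Approximate a G-optimal design over the rows of `actions` (K x d).

    Hypothetical helper: assumes the rows span R^d so the design matrix
    is invertible. Returns a probability vector over the K actions.
    """
    K, d = actions.shape
    pi = np.full(K, 1.0 / K)  # start from the uniform design
    for _ in range(max_iter):
        # Design matrix V(pi) = sum_a pi_a * a a^T
        V = actions.T @ (pi[:, None] * actions)
        # g[a] = a^T V^{-1} a, the scaled prediction variance at action a
        g = np.einsum("ij,jk,ik->i", actions, np.linalg.inv(V), actions)
        k = int(np.argmax(g))
        # Kiefer-Wolfowitz: the optimal design has max_a g[a] = d
        if g[k] <= (1.0 + tol) * d:
            break
        # Closed-form Frank-Wolfe step toward the worst-covered action
        gamma = (g[k] / d - 1.0) / (g[k] - 1.0)
        pi = (1.0 - gamma) * pi
        pi[k] += gamma
    return pi
```

Sampling actions in proportion to the returned weights keeps the worst-case prediction variance across all actions close to its minimum, which is why an exploration budget built this way need not scale with the number of actions.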

Paper Structure (23 sections, 9 theorems, 66 equations, 5 figures, 1 table, 3 algorithms)

Introduction
Related Literature
Organization
Problem Formulation: Linear Bandit
Risk-Aware Explore-then-Commit (RISE) Algorithm for Linear Bandit
The Algorithm
Regret Analysis
Risk-Aware Successive Elimination (RISE++) Algorithm
The Algorithm
Regret Analysis
Application: Smart Order Routing
Modeling SOR in the Framework of Linear Bandits
A Synthetic Example
A Numerical Study Based on the NASDAQ ITCH Dataset
Conclusion
...and 8 more sections

Key Result

Proposition 1

For any fixed $n$ and $\varepsilon,\delta > 0$, the algorithm RISE ensures

Figures (5)

Figure 1: The estimated means and variances of PnL for Amazon on the NASDAQ exchange, for various test quantities to liquidate at each time point. This is an empirical justification for our linear form approximation.
Figure 2: Intermediate regrets of different algorithms over $T$. RISE and RISE++ outperform the benchmark algorithms.
Figure 3: Visualization of the sampling procedure. We sample one snapshot of the LOB from each bucket uniformly over time. Then we obtain the empirical mean and variance of PnL (cf. Eq. \ref{eq:estimation_p}) over 60 buckets.
Figure 4: Intermediate regret. RISE and RISE++ exhibit sublinear regret in the experiment based on the NASDAQ ITCH dataset.
Figure EC.1: The estimated means and variances of PnL for Tesla stock at different venues, as $Q^j$ varies. A clear linear relationship holds for the mean, and a roughly linear relationship holds for the variance. This is an empirical justification for our linear form approximation Eq. \ref{eq:sor_mean_model} and Eq. \ref{eq:sor_var_model}.
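
The bucket sampling described in the Figure 3 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the function name `bucket_sample_stats` is hypothetical, the PnL of each limit order book snapshot is assumed to be precomputed, buckets are assumed to be equal-width time intervals, and "uniformly" is read here as a uniform draw among the snapshots that fall in a bucket.

```python
import numpy as np

def bucket_sample_stats(timestamps, pnl, n_buckets=60, rng=None):
    """Draw one PnL observation per time bucket, then return the
    empirical mean and (unbiased) variance of the drawn samples."""
    rng = np.random.default_rng() if rng is None else rng
    timestamps = np.asarray(timestamps, dtype=float)
    pnl = np.asarray(pnl, dtype=float)
    # Equal-width bucket edges spanning the observed time range
    edges = np.linspace(timestamps.min(), timestamps.max(), n_buckets + 1)
    # Using only interior edges maps each timestamp to 0..n_buckets-1
    bucket_id = np.digitize(timestamps, edges[1:-1])
    samples = []
    for b in range(n_buckets):
        idx = np.flatnonzero(bucket_id == b)
        if idx.size:  # skip buckets with no snapshot
            samples.append(pnl[rng.choice(idx)])
    samples = np.asarray(samples)
    return samples.mean(), samples.var(ddof=1)
```

Taking a single draw per bucket spreads the samples across the trading period, so the mean and variance estimates are not dominated by stretches of time in which snapshots are dense.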

Theorems & Definitions (15)

Remark 1: Importance of the G-Optimal Design
Proposition 1
Proposition 2: Temporal Decomposition of Mean-Variance Regret
Theorem 1
Remark 2: Technical Difference Compared to Existing Works
Remark 3: Near-Optimal Regret Upper Bound and the Advantages
Remark 4: Comparisons with the Design of RISE
Proposition 3: Lemma 2 in Vakili and Zhao (2016)
Proposition 4
Proposition 5
...and 5 more
