Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing
Jingwei Ji, Renyuan Xu, Ruihao Zhu
TL;DR
The paper tackles risk-aware decision making in linear bandits by introducing mean-variance (MV) regret minimization for large action spaces, with smart order routing (SOR) as a motivating application. It proposes two algorithms, RISE and RISE++, that use a variance-minimizing G-optimal design to explore efficiently and then exploit, achieving near-optimal regret guarantees. A novel temporal regret decomposition and phased elimination underpin the theory, decoupling the horizon dependence from the number of actions and yielding both instance-independent and instance-dependent guarantees. Empirical validation on synthetic data and NASDAQ ITCH data supports the linear mean-variance model and shows substantial regret improvements in SOR tasks.
Abstract
Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action spaces, we consider risk-aware bandit optimization with applications in smart order routing (SOR). Specifically, based on preliminary observations of linear price impacts made from the NASDAQ ITCH dataset, we initiate the study of risk-aware linear bandits. In this setting, we aim to minimize regret, which measures our performance deficit relative to the optimum, under the mean-variance metric when facing a set of actions whose rewards are linear functions of (initially) unknown parameters. Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm. Then, we rigorously analyze their near-optimal regret upper bounds to show that, by leveraging the linear structure, our algorithms can dramatically reduce the regret when compared to existing methods. Finally, we demonstrate the performance of the algorithms by conducting extensive numerical experiments in the SOR setup using both synthetic datasets and the NASDAQ ITCH dataset. Our results reveal that 1) the linear structure assumption is indeed well supported by the NASDAQ ITCH dataset; and, more importantly, 2) both RISE and RISE++ can significantly outperform the competing methods in terms of regret, especially in complex decision-making scenarios.
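As a rough illustration of the explore-then-commit idea described above, the following Python sketch explores a finite action set, fits linear models by least squares, and commits to the action with the best estimated mean-variance score. It is not the authors' RISE algorithm. It substitutes plain round-robin exploration for the G-optimal design, assumes the score convention rho * mean - variance (higher is better), and assumes the reward variance is also linear in the action features. The names mean_variance, explore_then_commit, pull, theta, and phi, and all numeric values, are hypothetical.

```python
import numpy as np


def mean_variance(mean, var, rho):
    """Assumed MV convention: rho * mean - variance (higher is better)."""
    return rho * mean - var


def explore_then_commit(actions, pull, n_explore, rho):
    """Explore, fit linear mean/variance models, commit to the best action.

    actions   : (K, d) array of action feature vectors
    pull      : callable(index) -> observed reward (hypothetical interface)
    n_explore : number of exploration rounds
    rho       : risk-tolerance weight on the mean
    """
    K, _ = actions.shape
    # Round-robin exploration: a simple stand-in for a G-optimal design.
    idx = np.arange(n_explore) % K
    X = actions[idx]
    y = np.array([pull(i) for i in idx])

    # Least-squares estimate of the mean parameter.
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Assumption: variance is linear in the features, so regress the
    # squared residuals on the same features.
    resid_sq = (y - X @ theta_hat) ** 2
    phi_hat, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)

    # Commit to the action with the highest estimated MV score.
    scores = mean_variance(actions @ theta_hat, actions @ phi_hat, rho)
    return int(np.argmax(scores)), theta_hat, phi_hat


# Toy usage with made-up parameters.
rng = np.random.default_rng(0)
actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
theta = np.array([1.0, 0.5])   # hypothetical mean parameter
phi = np.array([0.1, 2.0])     # hypothetical variance parameter


def pull(i):
    a = actions[i]
    return a @ theta + np.sqrt(a @ phi) * rng.standard_normal()


best, theta_hat, phi_hat = explore_then_commit(actions, pull, 3000, rho=1.0)
print("committed action index:", best)
```

In this toy setup the first action has both the highest mean and the lowest variance, so any reasonable estimate should commit to it. A design-based exploration scheme would matter more when the action set is large and the features are unevenly spread.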
