Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution
Tomas Espana, Yadh Hafsi, Fabrizio Lillo, Edoardo Vittori
TL;DR
This work tackles the optimal execution problem for large meta-orders by minimizing execution costs in a microstructure-realistic setting. It introduces a model-free reinforcement learning approach that uses a Queue-Reactive Model to supply counterfactual feedback in a market with endogenous price impact and liquidity replenishment. A Double Deep Q-Network learns a policy that is both strategic and tactical, adapting to real-time liquidity and price dynamics and outperforming standard baselines such as TWAP and strategies that trade a fixed percentage of posted volume. The proposed framework demonstrates that model-free RL can yield robust, adaptive execution strategies and offers a path toward extensions to multi-asset and signal-integrated execution pipelines.
Abstract
We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to incrementally execute large orders while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that capture transient price impact as well as nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
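For readers unfamiliar with the Double Deep Q-Network update mentioned above, the following is a minimal sketch of the standard Double DQN bootstrap target: the online network selects the next action and the target network evaluates it. The function name, argument names, and the discount factor value are illustrative assumptions and are not taken from the paper; the Q-value arrays stand in for network outputs.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Standard Double DQN targets for a batch of transitions.

    q_online_next, q_target_next: arrays of shape (batch, n_actions) holding
    next-state Q-values from the online and target networks (hypothetical
    placeholders for real network outputs).
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # Action selection: greedy action under the online network.
    best_actions = np.argmax(q_online_next, axis=1)
    # Action evaluation: value of that action under the target network.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # No bootstrapping past terminal transitions (dones == 1).
    return rewards + gamma * (1.0 - dones) * next_values

# Toy batch of two transitions with three actions (assumed values).
targets = double_dqn_targets(
    rewards=[1.0, 2.0],
    dones=[0, 1],
    q_online_next=[[1.0, 5.0, 2.0], [3.0, 0.0, 1.0]],
    q_target_next=[[10.0, 20.0, 30.0], [7.0, 8.0, 9.0]],
    gamma=0.5,
)
```

Decoupling selection from evaluation in this way is what distinguishes Double DQN from a plain DQN target, which would take the maximum over the target network's own Q-values.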
