Optimal Execution with Reinforcement Learning
Yadh Hafsi, Edoardo Vittori
TL;DR
This work tackles the optimal execution of large trades within a finite horizon by learning execution policies through reinforcement learning inside ABIDES, a multi-agent limit order book (LOB) simulator. It formulates the problem as a finite-horizon MDP and trains a Deep Q-Network (DQN) to learn strategies that balance implementation shortfall and market impact over a 30-minute window, with one decision per second. The approach integrates strategic scheduling with tactical order placement by using rich LOB features and endogenously generated market impact, and it outperforms the TWAP, Passive, and Random baselines while showing lower variance. The findings suggest a practical, RL-based framework for real-world high-frequency execution that adapts to dynamic liquidity conditions and agent interactions; future work could explore more tractable simulators to broaden benchmarking.
Abstract
This study investigates the development of an optimal execution strategy through reinforcement learning, aiming to determine the most effective way for a trader to buy or sell inventory within a finite time horizon. Our proposed model uses input features derived from the current state of the limit order book and operates at a high frequency to maximize control. To simulate this environment and overcome the limitations of relying on historical data, we use the multi-agent market simulator ABIDES, which provides a diverse range of depth levels within the limit order book. We present a custom MDP formulation, followed by the results of our methodology, and benchmark its performance against standard execution strategies. Results show that the reinforcement learning agent outperforms standard strategies and offers a practical foundation for real-world trading applications.
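To make the benchmark setting concrete, the following is a minimal Python sketch of a TWAP schedule and an implementation shortfall calculation under the 30-minute, one-decision-per-second setup described above. It is an illustration only, not the authors' code: the function names, the parent order size of 20,000 shares, and the sign convention for shortfall are assumptions, and the paper's exact definitions may differ.

```python
from typing import List, Tuple

# Setup taken from the summary: 30-minute window, one decision per second.
HORIZON_SECONDS = 30 * 60
STEP_SECONDS = 1
N_STEPS = HORIZON_SECONDS // STEP_SECONDS  # 1800 decision points


def twap_schedule(total_shares: int, n_steps: int) -> List[int]:
    """Split a parent order as evenly as possible across n_steps child orders.

    Hypothetical helper: the first `remainder` steps carry one extra share
    so that the child orders sum exactly to total_shares.
    """
    base, remainder = divmod(total_shares, n_steps)
    return [base + 1 if i < remainder else base for i in range(n_steps)]


def implementation_shortfall(
    arrival_price: float,
    fills: List[Tuple[float, int]],
    side: str = "sell",
) -> float:
    """Cost of the executed fills relative to the arrival price.

    Assumed convention: a positive value is a cost. For a sell order the
    trader loses when fills are below the arrival price; for a buy order
    the trader loses when fills are above it.
    """
    executed = sum(qty for _, qty in fills)
    traded_value = sum(price * qty for price, qty in fills)
    benchmark_value = arrival_price * executed
    if side == "sell":
        return benchmark_value - traded_value
    return traded_value - benchmark_value


# Hypothetical parent order of 20,000 shares, used only for illustration.
schedule = twap_schedule(20_000, N_STEPS)
shortfall = implementation_shortfall(100.0, [(99.5, 10), (99.0, 10)], side="sell")
```

An RL agent in this setting would replace the fixed `schedule` with a quantity chosen at each step from the observed order book state, and would be evaluated on the same shortfall measure as the baseline.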
