Reinforcement Learning for Trade Execution with Market and Limit Orders
Patrick Cheridito, Moritz Weiss
TL;DR
The paper introduces a logistic-normal actor-critic reinforcement learning framework for optimal trade execution in limit order books, framing order placement as a dynamic allocation across market orders and multiple limit order levels. Actions are modeled on the simplex with a logistic-normal distribution, so every sampled allocation is feasible, and the approach can handle high-dimensional state and action spaces while capturing both direct and indirect market impact through interacting market participants. Empirical results in simulated markets with noise, tactical, and strategic traders show that the logistic-normal policy often outperforms heuristic strategies and a Dirichlet-based RL baseline, with robust performance across horizons and position sizes. The contribution offers a scalable method for sophisticated execution tasks and suggests applicability to other dynamic allocation problems beyond trading.
Abstract
In this paper, we introduce a novel reinforcement learning framework for optimal trade execution in a limit order book. We formulate the trade execution problem as a dynamic allocation task whose objective is the optimal placement of market and limit orders to maximize expected revenue. By modeling market and limit order allocations with multivariate logistic-normal distributions, the framework enables efficient training of the reinforcement learning algorithm. Numerical experiments show that the proposed method outperforms traditional benchmark strategies in simulated limit order book environments featuring noise traders submitting random orders, tactical traders responding to order book imbalances, and a strategic trader seeking to acquire or liquidate an asset position.
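To illustrate how a logistic-normal distribution yields feasible allocations, the following minimal sketch draws a Gaussian vector and maps it to the simplex with a softmax that uses a fixed zero reference coordinate. This is not the authors' implementation: the function name `sample_logistic_normal`, the choice of four allocation components, and the mean and covariance values are all assumptions made for illustration.

```python
import numpy as np


def sample_logistic_normal(mu, cov, rng):
    """Draw one point on the simplex from a logistic-normal distribution.

    A Gaussian vector of dimension K-1 is extended with a fixed zero
    coordinate and passed through a softmax, giving K positive weights
    that sum to one.
    """
    x = rng.multivariate_normal(mu, cov)   # unconstrained Gaussian sample
    z = np.append(x, 0.0)                  # reference coordinate fixed at 0
    z = z - z.max()                        # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()


# Hypothetical example: split an order across 4 components
# (e.g. a market order and several limit order levels).
rng = np.random.default_rng(0)
mu = np.zeros(3)            # assumed policy mean (would come from the actor)
cov = 0.5 * np.eye(3)       # assumed policy covariance
allocation = sample_logistic_normal(mu, cov, rng)
print(allocation, allocation.sum())
```

Because the softmax output is strictly positive and sums to one, any sample can be read directly as the fractions of the order assigned to each component, with no projection or clipping step.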
