Reinforcement Learning for Trade Execution with Market and Limit Orders

Patrick Cheridito, Moritz Weiss

TL;DR

The paper introduces a logistic-normal actor-critic reinforcement learning framework for optimal trade execution in limit order books, framing order placement as a dynamic allocation across market orders and multiple limit order levels. By modeling actions on the simplex with a logistic-normal distribution, which guarantees feasible allocations by construction, the approach can handle high-dimensional state and action spaces while capturing both direct and indirect market impact via interacting market participants. Empirical results in simulated markets with noise, tactical, and strategic traders show the logistic-normal policy often outperforms heuristic strategies and a Dirichlet-based RL baseline, with robust performance across horizons and position sizes. The contribution offers a scalable method for sophisticated execution tasks and suggests applicability to other dynamic allocation problems beyond trading.

Abstract

In this paper, we introduce a novel reinforcement learning framework for optimal trade execution in a limit order book. We formulate the trade execution problem as a dynamic allocation task whose objective is the optimal placement of market and limit orders to maximize expected revenue. By modeling market and limit order allocations with multivariate logistic-normal distributions, the framework enables efficient training of the reinforcement learning algorithm. Numerical experiments show that the proposed method outperforms traditional benchmark strategies in simulated limit order book environments featuring noise traders submitting random orders, tactical traders responding to order book imbalances, and a strategic trader seeking to acquire or liquidate an asset position.
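
As a rough illustration of how a logistic-normal distribution yields feasible allocations, the sketch below draws a Gaussian vector and maps it onto the simplex with the standard additive logistic transform. It is not the authors' implementation: the function name `logistic_normal_sample`, the diagonal covariance, and the reading of the components as fractions assigned to a market order and to individual limit order levels are all assumptions made for the example.

```python
import math
import random


def logistic_normal_sample(mean, std, rng):
    """Draw one allocation on the simplex from a logistic-normal distribution.

    mean, std: lists of length K-1 with the Gaussian parameters
               (hypothetical names; diagonal covariance is an assumption).
    Returns a list of K positive fractions that sum to 1.
    """
    # Sample an unconstrained Gaussian vector z in R^(K-1).
    z = [rng.gauss(m, s) for m, s in zip(mean, std)]
    # Additive logistic transform with the last component as reference:
    #   x_i = exp(z_i) / (1 + sum_j exp(z_j)),  x_K = 1 / (1 + sum_j exp(z_j)).
    # Shifting by the maximum keeps the exponentials numerically stable.
    shift = max(0.0, max(z))
    exps = [math.exp(v - shift) for v in z] + [math.exp(-shift)]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Four components, e.g. one market order bucket and three limit levels (assumed).
allocation = logistic_normal_sample([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng)
```

Because every sample is strictly positive and sums to one, any draw can be read directly as a split of the remaining inventory, so no clipping or renormalization step is needed after sampling.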

Paper Structure

This paper contains 54 sections, 32 equations, 13 figures, 11 tables, 1 algorithm.

Figures (13)

  • Figure 1: Limit order book with the algorithm's orders in orange. Limit buy and sell orders are colored in blue and red, respectively. The algorithm takes action $a(t).$ The left panel shows the order book before the action. The right panel shows the order book after the action. The crosses indicate order cancellations (for the orange orders) or market order fills (for the blue orders).
  • Figure 2: The black solid lines are the best bid and ask prices. The triangles indicate buy and sell market orders. The size of the triangles corresponds to the size of the market orders. The colors indicate volumes. Darker colors correspond to larger volumes. Buy limit orders have a blue color, and sell limit orders have a red color.
  • Figure 3: Mid price evolutions after a market or limit order of size 10, 20, or 60 lots is placed at time $t=0.$ The upper panel shows mid price evolutions after a market order was placed. The lower panel shows mid price evolutions after a limit order was placed. The legend indicates the order sizes in lots. Deeper color tones correspond to larger order sizes.
  • Figure 4: The different panels show reward distributions of the submit and leave algorithm (SL), the time-weighted average price algorithm (TWAP), the logistic-normal algorithm (LN) and the Dirichlet algorithm (DR) for three simulated markets. The first row corresponds to 20 lots, and the second to 60 lots.
  • Figure 5: Average episode return per batch during training for different environments, initial position sizes, and algorithms. The x-axis shows the number of gradient steps. The top row shows the average returns for 20 lots. The bottom row shows the average returns for 60 lots. The green line corresponds to the LN algorithm, and the red line corresponds to the DR algorithm.
  • ...and 8 more figures

Theorems & Definitions (4)

  • Remark 3.1
  • Remark 4.1
  • Remark 4.2
  • Remark 4.3