
Minimal Shortfall Strategies for Liquidation of a Basket of Stocks using Reinforcement Learning

Moustapha Pemy, Na Zhang

TL;DR

The paper tackles the problem of liquidating a basket of $n$ highly correlated stocks within a finite horizon by minimizing the overall execution shortfall $F_k$. The problem is modeled as a discrete-time stochastic control problem with VWAP-based rewards. The paper introduces a reinforcement-learning-driven minimal shortfall algorithm that simultaneously trains two neural networks to capture inter-stock dependencies under a linear market-impact assumption $f_i(x)=\lambda_i x$. Training uses a policy gradient approach with a Gibbs policy $\pi(s,\iota,a,\theta)$ and a linear action-value approximation $\hat{Q}^\pi_\omega$, and the paper proves convergence of the average-reward gradient, $\lim_{k\to\infty} \nabla_\theta \rho(\pi_k)=0$. The approach is instantiated as an actor-critic architecture with $Q$-function approximation. Its training protocol stores transitions $\epsilon_k=(x_k,a_k,F_k,x_{k+1})$ in a replay buffer and updates from randomly sampled data. Empirical evaluation on intraday data for six stocks (AAPL, GOOG, IBM, T, VZ, XOM) shows the method driving tracking error and shortfall toward zero while respecting action bounds, which supports practical deployment in high-dimensional liquidations. Overall, the work contributes a scalable, theory-backed RL framework for multidimensional liquidation that leverages correlation, provides convergence guarantees, and demonstrates real-market applicability.
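The replay-buffer protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The names `Transition` and `ReplayBuffer`, the fixed capacity with oldest-first eviction, and uniform sampling without replacement are all hypothetical choices. Reading the state $x_k$ as per-stock remaining inventory is also an assumption.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass(frozen=True)
class Transition:
    """One stored transition epsilon_k = (x_k, a_k, F_k, x_{k+1}) (hypothetical type)."""
    state: Sequence[float]       # x_k (assumed: remaining inventory per stock)
    action: Sequence[float]      # a_k: shares sold per stock at step k
    shortfall: float             # F_k: shortfall incurred at step k
    next_state: Sequence[float]  # x_{k+1}


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)  # seeded for reproducible minibatches

    def push(self, t: Transition) -> None:
        """Store a transition; deque(maxlen=...) drops the oldest when full."""
        self._data.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniform random minibatch without replacement (assumed sampling scheme)."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)
```

Uniform sampling is the simplest way to decorrelate consecutive intraday steps before a network update. Prioritized schemes are a common alternative, but the summary does not say which scheme the paper uses.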

Abstract

This paper studies the ubiquitous problem of liquidating large quantities of highly correlated stocks, a task frequently encountered by institutional investors and proprietary trading firms. Traditional methods in this setting suffer from the curse of dimensionality, making them impractical for high-dimensional problems. In this work, we propose a novel method based on stochastic optimal control to optimally tackle this complex multidimensional problem. The proposed method minimizes the overall execution shortfall of highly correlated stocks using a reinforcement learning approach. We rigorously establish the convergence of our optimal trading strategy and present an implementation of our algorithm using intra-day market data.
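To make the quantity being minimized concrete, here is a hypothetical one-step basket shortfall. It combines the VWAP benchmark with the linear market-impact assumption $f_i(x)=\lambda_i x$ mentioned in the TL;DR. It rests on several assumptions: impact is treated as a per-share price concession on the current sale, and the benchmark is a VWAP supplied per stock. The function names `vwap`, `linear_impact`, and `basket_step_shortfall` are illustrative. The paper's exact definition of $F_k$ may differ.

```python
from typing import Sequence


def vwap(prices: Sequence[float], volumes: Sequence[float]) -> float:
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def linear_impact(shares: float, lam: float) -> float:
    """Linear market impact f_i(x) = lambda_i * x (assumed: price concession per share)."""
    return lam * shares


def basket_step_shortfall(
    shares: Sequence[float],
    prices: Sequence[float],
    benchmarks: Sequence[float],
    lambdas: Sequence[float],
) -> float:
    """Hypothetical one-step shortfall for a basket of n stocks.

    For stock i, selling x_i shares at pre-trade price P_i is assumed to fill
    at P_i - f_i(x_i). The shortfall is the revenue lost relative to selling
    the same shares at the VWAP benchmark B_i: x_i * (B_i - (P_i - f_i(x_i))).
    """
    total = 0.0
    for x, p, b, lam in zip(shares, prices, benchmarks, lambdas):
        fill_price = p - linear_impact(x, lam)
        total += x * (b - fill_price)
    return total
```

Consider selling 100 shares of one stock at 50.0 with $\lambda = 0.01$ and a benchmark of 50.0. The fill is 49.0 per share, so that stock contributes 100.0 to the shortfall. Larger orders are penalized quadratically, because the impact per share grows linearly with $x$.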


Paper Structure

This paper contains 8 sections, 1 theorem, 32 equations, 3 figures, 1 algorithm.

Key Result

Theorem 3.1

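Given any initial parameter vector $\theta_0$, the sequence of average rewards $(\rho(\pi_k))_{k \in \mathbb{N}}$, with $\pi_k=\pi(\cdot, \cdot, \theta_k)$ and with the critic weights $\omega_k$ and the updated policy parameters $\theta_{k+1}$ defined by the actor-critic update rules, is such that $$\lim_{k\to\infty} \nabla_\theta \rho(\pi_k)=0.$$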
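The objects in Theorem 3.1 can be sketched with the generic Gibbs (softmax) policy used in the policy-gradient-with-function-approximation literature. Over a finite action set it takes the form $\pi(a\mid s,\theta)\propto\exp(\theta^\top\phi(s,a))$. It is paired with a linear critic $\hat{Q}_\omega(s,a)=\omega^\top\psi(s,a)$, where the compatible features are $\psi=\nabla_\theta\log\pi$. This is a minimal single-state sketch under stated assumptions. The feature matrix and step size `alpha` are hypothetical, and the paper's policy $\pi(s,\iota,a,\theta)$ carries an extra argument $\iota$ that is omitted here. The actor step shown is the textbook update, not necessarily the paper's exact rule.

```python
import numpy as np


def gibbs_policy(theta: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Gibbs (softmax) policy over a finite action set.

    features[j] is phi(s, a_j) for the current state s; returns
    pi(a_j | s, theta) proportional to exp(theta . phi(s, a_j)).
    """
    logits = features @ theta
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def score(theta: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Compatible features psi(s, a_j) = grad_theta log pi(a_j | s, theta)
    = phi(s, a_j) - sum_b pi(b | s) phi(s, b), one row per action."""
    probs = gibbs_policy(theta, features)
    return features - probs @ features


def actor_step(theta: np.ndarray, omega: np.ndarray,
               features: np.ndarray, alpha: float) -> np.ndarray:
    """One policy-gradient step at a single state with linear critic
    Q_hat(s, a) = omega . psi(s, a):
    theta <- theta + alpha * sum_a grad pi(a | s) * Q_hat(s, a)."""
    probs = gibbs_policy(theta, features)
    psi = score(theta, features)
    q_hat = psi @ omega
    # grad_theta pi(a | s) = pi(a | s) * psi(s, a)
    grad = (probs[:, None] * psi * q_hat[:, None]).sum(axis=0)
    return theta + alpha * grad
```

With one-hot features and $\theta_0=0$, the policy is uniform. A critic favoring the first action moves $\theta$ toward it. In a full algorithm, this step would be averaged over states visited under $\pi_k$. Note that $\sum_a \pi(a\mid s)\,\psi(s,a)=0$, which is why compatible linear critics leave the policy gradient unbiased.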

Figures (3)

  • Figure 1: The Tracking Errors for Apple, Google, IBM, AT&T, Verizon, and Exxon Mobil on September 8, 2017
  • Figure 2: The Expected Shortfalls for Apple, Google, IBM, AT&T, Verizon, and Exxon Mobil on September 8, 2017
  • Figure 3: The Tracking Error for Apple, Google, IBM, AT&T, Verizon, and Exxon Mobil at a certain action on September 8, 2017

Theorems & Definitions (3)

  • Definition 2.1
  • Definition 3.1
  • Theorem 3.1