Minimal Shortfall Strategies for Liquidation of a Basket of Stocks using Reinforcement Learning
Moustapha Pemy, Na Zhang
TL;DR
The paper addresses the liquidation of a basket of $n$ highly correlated stocks within a finite horizon, formulated as a discrete-time stochastic control problem that minimizes the overall execution shortfall $F_k$ with VWAP-based rewards. It introduces a reinforcement-learning minimal shortfall algorithm that trains two neural networks simultaneously to capture inter-stock dependencies under a linear market-impact assumption $f_i(x)=\lambda_i x$. The algorithm uses a policy gradient approach with a Gibbs policy $\pi(s,\iota,a,\theta)$ and a linear approximation $\hat{Q}^\pi_\omega$, and the authors prove that the gradient of the average reward vanishes in the limit, $\lim_{k\to\infty} \nabla_\theta \rho(\pi_k)=0$. The method is implemented as an actor-critic architecture with $Q$-function approximation; training stores transitions $\epsilon_k=(x_k,a_k,F_k,x_{k+1})$ in a replay buffer and updates from randomly sampled transitions. An empirical evaluation on intraday data for six stocks (AAPL, GOOG, IBM, T, VZ, XOM) shows tracking error and shortfall driven toward zero while action bounds are respected. Overall, the work contributes an RL framework for multidimensional liquidation that exploits correlation between stocks, comes with a convergence guarantee, and is tested on real market data.
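To make two of the ingredients named above concrete, the following is a minimal sketch of a Gibbs (softmax) policy over a finite action set and of a replay buffer holding transitions $(x_k,a_k,F_k,x_{k+1})$. It is an illustration under assumptions, not the authors' implementation: the feature vectors `features`, the linear preference $\theta^\top\phi(s,a)$, the names `gibbs_policy` and `ReplayBuffer`, and the omission of the argument $\iota$ are all hypothetical choices made here.

```python
import math
import random
from collections import deque


def gibbs_policy(theta, features):
    """Return action probabilities proportional to exp(theta . phi(s, a)).

    `features` holds one (assumed) feature vector phi(s, a) per candidate action.
    """
    prefs = [sum(t * f for t, f in zip(theta, phi)) for phi in features]
    m = max(prefs)  # subtract the max preference for numerical stability
    weights = [math.exp(p - m) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]


class ReplayBuffer:
    """Fixed-capacity store of transitions (x_k, a_k, F_k, x_next)."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.data = deque(maxlen=capacity)

    def push(self, x, a, shortfall, x_next):
        self.data.append((x, a, shortfall, x_next))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)
```

With all parameters equal to zero the policy is uniform over the candidate actions, which is a convenient sanity check before training. Sampling stored transitions at random, rather than replaying them in order, reduces the correlation between consecutive updates.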
Abstract
This paper studies the problem of liquidating large quantities of highly correlated stocks, a task frequently encountered by institutional investors and proprietary trading firms. Traditional methods in this setting suffer from the curse of dimensionality, which makes them impractical for high-dimensional problems. In this work, we propose a novel method based on stochastic optimal control to solve this multidimensional problem. The proposed method minimizes the overall execution shortfall of highly correlated stocks using a reinforcement learning approach. We rigorously establish the convergence of our optimal trading strategy and present an implementation of our algorithm using intraday market data.
