Entropy-Regularized Mean-Variance Portfolio Optimization with Jumps
Christian Bender, Nguyen Tran Thuan
TL;DR
This paper addresses risk-aware portfolio optimization under jumps by introducing entropy-regularized exploratory controls. It derives a continuous-time exploratory SDE with Lévy jumps as the limit of a discrete-time randomized-control scheme and proves that the optimal distributional control is Gaussian, so that the optimal wealth process solves a linear SDE with an explicit representation. The analysis combines a dynamic-programming (HJB-PIDE) framework with a quadratic ansatz to obtain explicit forms for the optimal control and the Lagrange multiplier, and it covers multidimensional jump-diffusion settings with several risky assets. A key technical contribution is the weak convergence of the discrete-time integrators to the drivers of the limiting SDE, which gives a rigorous basis for RL-inspired exploration in continuous time and yields formulas for implementing exploration-regularized mean-variance strategies in jump settings.
Abstract
Motivated by the trade-off between exploitation and exploration in reinforcement learning, we study a continuous-time entropy-regularized mean-variance portfolio selection problem in the presence of jumps. We propose an exploratory SDE for the wealth process associated with multiple risky assets that exhibit Lévy jumps. In contrast to the existing literature, we derive the continuous-time dynamics by studying the limiting behavior of the natural discrete-time formulation of the wealth process associated with a randomized control. We then show that an optimal distributional control of the continuous-time entropy-regularized exploratory mean-variance problem is Gaussian. The respective optimal wealth process solves a linear SDE whose representation is explicitly obtained.
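
To make the discrete-time formulation concrete, here is a minimal Python sketch of a wealth recursion in which the invested amount is redrawn from a Gaussian distribution at every time step. It is an illustration under stated assumptions, not the paper's model or its optimal policy: the single risky asset, the compound Poisson jumps with normally distributed sizes, the constant Gaussian control law, and every parameter name and value (`mean_u`, `std_u`, `mu`, `sigma`, `jump_rate`, `jump_mean`, `jump_std`) are hypothetical choices made here.

```python
import numpy as np


def simulate_wealth(x0=1.0, T=1.0, n_steps=252, mean_u=0.5, std_u=0.2,
                    mu=0.05, sigma=0.2, jump_rate=1.0, jump_mean=-0.02,
                    jump_std=0.05, seed=0):
    """Simulate one discounted wealth path under a randomized control.

    Assumed recursion (illustrative only):
        X_{k+1} = X_k + u_k * (mu*dt + sigma*dW_k + J_k),
    where u_k ~ N(mean_u, std_u^2) is drawn independently at each step,
    dW_k is a Brownian increment, and J_k is the sum of the jumps of a
    compound Poisson process over the step.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Exploration: sample the invested amount instead of fixing it.
        u = rng.normal(mean_u, std_u)
        # Diffusive part of the excess return over one step.
        dw = rng.normal(0.0, np.sqrt(dt))
        # Jump part: Poisson number of jumps, each with a normal size.
        n_jumps = rng.poisson(jump_rate * dt)
        jump = rng.normal(jump_mean, jump_std, size=n_jumps).sum()
        x[k + 1] = x[k] + u * (mu * dt + sigma * dw + jump)
    return x


if __name__ == "__main__":
    path = simulate_wealth()
    print("terminal wealth:", path[-1])
```

Refining the grid (larger `n_steps`) is the regime in which the abstract's limiting argument applies. Setting `std_u=0` removes the exploration and gives the usual non-randomized recursion for comparison.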
