Exploratory Randomization for Discrete-Time Risk-Sensitive Benchmarked Investment Management with Reinforcement Learning

Sebastien Lleo; Wolfgang Runggaldier

TL;DR

This work provides a principled parametric family for policy-gradient implementations, guides the design of RL methods, and induces intuitive bounds on exploration based on risk sensitivity, asset covariance, and rebalancing frequency.

Abstract

This paper bridges reinforcement learning (RL) and risk-sensitive stochastic control by introducing a tractable exploration mechanism for policy search in risk-sensitive portfolio management, with known and unknown model parameters, that yields an endogenous relative-entropy regularization. We construct a discrete-time risk-sensitive benchmarked investment model. This model combines a factor-based asset universe with periodic portfolio rebalancing. Exploration is incorporated through user-specified Gaussian perturbations to baseline (exploitative) controls. The risk-sensitive stochastic control problem is solved analytically using the Free Energy-Entropy Duality. The Duality recasts the control problem as a linear-quadratic-Gaussian game and introduces a natural penalty for exploration. This approach yields simple sufficiency conditions for optimality. It also induces intuitive bounds on exploration based on risk sensitivity, asset covariance, and rebalancing frequency. Additionally, the optimal investment strategy can be interpreted through the lens of fractional Kelly strategies. By connecting risk-sensitive control theory and RL, this work provides a principled parametric family for policy-gradient implementations, guiding the design of RL methods.
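
The exploration mechanism described above, Gaussian perturbations applied to a baseline (exploitative) control, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the function `exploratory_control`, the three-asset baseline weights, and the choice of independent noise per asset (the paper's actual covariance structure is not given in this summary).

```python
import random


def exploratory_control(baseline, noise_std, rng):
    """Return the baseline control plus one independent Gaussian draw per asset.

    Assumption: the perturbation is diagonal (independent across assets).
    """
    return [h + rng.gauss(0.0, s) for h, s in zip(baseline, noise_std)]


rng = random.Random(0)            # seeded for reproducibility
baseline = [0.6, 0.3, 0.1]        # hypothetical exploitative portfolio weights
noise_std = [0.05, 0.05, 0.02]    # hypothetical user-specified exploration scales

# Draw many randomized controls and average them asset by asset.
draws = [exploratory_control(baseline, noise_std, rng) for _ in range(10000)]
mean_control = [
    sum(d[i] for d in draws) / len(draws) for i in range(len(baseline))
]
```

Because the noise has zero mean, the average of the randomized controls stays close to the baseline; the exploration scales only control how widely individual draws scatter around it.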

Paper Structure

This paper contains 21 sections, 12 theorems, 129 equations, and 1 table.

Introduction
Setting Up the Risk-Sensitive Benchmarked Asset Management Problem
Model for the Financial Market
Risk-Sensitive Control Problem
Free Energy, Relative Entropy, and the Energy-Entropy Duality
Solution Via The Free Energy-Entropy Duality
Exploratory Controls and the Log-Relative Return Process
Free Energy-Entropy Duality for the Randomized Risk-Sensitive Investment Management Problem
Saddle Point Representation and Optimal Controls for the Penalized Stochastic Game
Main Result
Limiting Case: The Kelly Criterion
Interpreting the Risk-Sensitive Investment Management Model
Interpreting Assumptions \ref{as:sigma:posdef} and \ref{as:saddlepoint:cond}
Optimality Conditions Implied by Proposition \ref{prop:Controls:Alt}
Optimal Investment Strategies as Kelly Strategies
...and 6 more sections

Key Result

Proposition 2.5

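Let $\psi$ be a measurable random variable such that $\mathbf{E}^{\mathbb{P}}[e^\psi] < \infty$. Then
$$\log \mathbf{E}^{\mathbb{P}}\left[e^{\psi}\right] = \sup_{\mathbb{P}^\gamma \ll \mathbb{P}} \left\{ \mathbf{E}^{\mathbb{P}^\gamma}[\psi] - D_{\mathrm{KL}}\left(\mathbb{P}^\gamma \,\|\, \mathbb{P}\right) \right\},$$
where the supremum is taken over all probability measures $\mathbb{P}^\gamma$ absolutely continuous with respect to $\mathbb{P}$, and $D_{\mathrm{KL}}(\mathbb{P}^\gamma \,\|\, \mathbb{P})$ denotes the relative entropy (Kullback-Leibler divergence) of Definition 2.4.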
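
In its standard form, the duality of Proposition 2.5 equates the free energy $\log \mathbf{E}^{\mathbb{P}}[e^\psi]$ with the supremum, over measures $\mathbb{Q}$ absolutely continuous with respect to $\mathbb{P}$, of $\mathbf{E}^{\mathbb{Q}}[\psi]$ minus the relative entropy of $\mathbb{Q}$ with respect to $\mathbb{P}$. The sketch below checks this numerically on a three-point sample space. The function names and the numbers are illustrative assumptions, not the paper's; the maximizer used is the standard exponentially tilted (Gibbs) measure with weights proportional to $\mathbb{P}\, e^{\psi}$.

```python
import math


def free_energy(p, psi):
    """log E^P[exp(psi)] on a finite sample space."""
    return math.log(sum(pi * math.exp(x) for pi, x in zip(p, psi)))


def relative_entropy(q, p):
    """KL divergence D(q || p); assumes q is absolutely continuous w.r.t. p."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def dual_objective(q, p, psi):
    """E^Q[psi] - D(q || p), the quantity maximized in the duality."""
    return sum(qi * x for qi, x in zip(q, psi)) - relative_entropy(q, p)


def gibbs_measure(p, psi):
    """Exponentially tilted measure with weights proportional to p * exp(psi)."""
    weights = [pi * math.exp(x) for pi, x in zip(p, psi)]
    z = sum(weights)
    return [w / z for w in weights]


p = [0.2, 0.5, 0.3]      # hypothetical reference measure P
psi = [1.0, -0.5, 0.3]   # hypothetical values of the random variable psi

q_star = gibbs_measure(p, psi)
lhs = free_energy(p, psi)              # free energy under P
rhs = dual_objective(q_star, p, psi)   # dual objective at the tilted measure
other = dual_objective([0.1, 0.1, 0.8], p, psi)  # any other measure does no better
```

Here `lhs` and `rhs` agree up to floating-point error, while `other` is smaller, which is consistent with the tilted measure attaining the supremum.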

Theorems & Definitions (38)

Definition 2.2: Randomized Exploratory Control
Remark 1
Definition 2.3: Free Energy
Definition 2.4: Relative Entropy / Kullback-Leibler Divergence
Proposition 2.5: Free Energy-Entropy Duality (Dai Pra et al., 1996)
Remark 2
Definition 3.1: Class of admissible exploratory strategies $\mathcal{A}^{H}_{\mathrm{expl}}$
Remark 3: Interpretation of Exploration
Remark 4: Piecewise-constant process $\bar{\gamma}$
Definition 3.2: Admissible duality strategies $\mathcal{A}^{\bar{\Gamma}}$
...and 28 more
