Exploratory Randomization for Discrete-Time Risk-Sensitive Benchmarked Investment Management with Reinforcement Learning

Sebastien Lleo, Wolfgang Runggaldier

TL;DR

This work provides a principled parametric family for policy-gradient implementations, which guides the design of RL methods, and it derives intuitive bounds on exploration in terms of risk sensitivity, asset covariance, and rebalancing frequency.

Abstract

This paper bridges reinforcement learning (RL) and risk-sensitive stochastic control by introducing a tractable exploration mechanism for policy search in risk-sensitive portfolio management, with known and unknown model parameters, that yields an endogenous relative-entropy regularization. We construct a discrete-time risk-sensitive benchmarked investment model. This model combines a factor-based asset universe with periodic portfolio rebalancing. Exploration is incorporated through user-specified Gaussian perturbations to baseline (exploitative) controls. The risk-sensitive stochastic control problem is solved analytically using the Free Energy-Entropy Duality. The Duality recasts the control problem as a linear-quadratic-Gaussian game and introduces a natural penalty for exploration. This approach yields simple sufficiency conditions for optimality. It also induces intuitive bounds on exploration based on risk sensitivity, asset covariance, and rebalancing frequency. Additionally, the optimal investment strategy can be interpreted through the lens of fractional Kelly strategies. By connecting risk-sensitive control theory and RL, this work provides a principled parametric family for policy-gradient implementations, guiding the design of RL methods.
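As a rough illustration of the exploration mechanism described in the abstract (Gaussian perturbations added to a baseline, exploitative control), the sketch below draws a randomized portfolio allocation around a fixed baseline. It is not the paper's model: the names `baseline`, `exploration_cov`, and `sample_exploratory_control`, as well as the numerical values, are hypothetical and chosen only to show the idea of a user-specified exploration covariance.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_exploratory_control(baseline, cov_chol, rng):
    """Return baseline + L z with z standard normal, i.e. a draw from N(baseline, L L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in baseline]
    return [
        b + sum(cov_chol[i][j] * z[j] for j in range(i + 1))
        for i, b in enumerate(baseline)
    ]


# Hypothetical two-asset example (values are assumptions, not from the paper).
baseline = [0.6, 0.4]                            # exploitative allocation
exploration_cov = [[0.04, 0.01], [0.01, 0.09]]   # user-specified exploration covariance
chol = cholesky(exploration_cov)

rng = random.Random(42)
draws = [sample_exploratory_control(baseline, chol, rng) for _ in range(20000)]
mean = [sum(d[i] for d in draws) / len(draws) for i in range(2)]
```

Averaged over many draws, the sampled controls centre on the baseline, so the perturbation adds dispersion without shifting the exploitative allocation on average.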

Paper Structure (21 sections, 12 theorems, 129 equations, 1 table)

This paper contains 21 sections, 12 theorems, 129 equations, 1 table.

Key Result

Proposition 2.5

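Let $\psi$ be a measurable random variable such that $\mathbf{E}^{\mathbb{P}}[e^\psi] < \infty$. Then $\log \mathbf{E}^{\mathbb{P}}\left[e^{\psi}\right] = \sup_{\mathbb{P}^\gamma \ll \mathbb{P}} \left\{ \mathbf{E}^{\mathbb{P}^\gamma}[\psi] - D_{\mathrm{KL}}\left(\mathbb{P}^\gamma \,\|\, \mathbb{P}\right) \right\}$, where the supremum is taken over all probability measures $\mathbb{P}^\gamma$ absolutely continuous with respect to $\mathbb{P}$ and $D_{\mathrm{KL}}$ denotes the relative entropy of Definition 2.4.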
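In its standard form, the Free Energy-Entropy Duality states that the free energy $\log \mathbf{E}^{\mathbb{P}}[e^\psi]$ equals the supremum, over measures absolutely continuous with respect to $\mathbb{P}$, of the expected value of $\psi$ under the new measure minus its relative entropy with respect to $\mathbb{P}$. The sketch below checks this numerically on a six-point sample space. The finite setting, the function names, and the random test data are assumptions made for illustration; the maximizer used is the usual Gibbs (exponentially tilted) measure $q_i \propto p_i e^{\psi_i}$.

```python
import math
import random


def free_energy(psi, p):
    """log E^P[exp(psi)] on a finite sample space, stabilized by subtracting max(psi)."""
    m = max(psi)
    return m + math.log(sum(pi * math.exp(x - m) for pi, x in zip(p, psi)))


def relative_entropy(q, p):
    """Kullback-Leibler divergence of q from p (requires q << p)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def dual_objective(q, psi, p):
    """E^Q[psi] minus the relative entropy of q with respect to p."""
    return sum(qi * x for qi, x in zip(q, psi)) - relative_entropy(q, p)


def gibbs_measure(psi, p):
    """Exponentially tilted measure q_i proportional to p_i * exp(psi_i)."""
    m = max(psi)
    w = [pi * math.exp(x - m) for pi, x in zip(p, psi)]
    z = sum(w)
    return [wi / z for wi in w]


def random_measure(n, rng):
    """A strictly positive random probability vector of length n."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]


rng = random.Random(0)
p = random_measure(6, rng)                      # reference measure P
psi = [rng.gauss(0.0, 1.0) for _ in range(6)]   # values of the random variable psi

lhs = free_energy(psi, p)
rhs = dual_objective(gibbs_measure(psi, p), psi, p)

# No other candidate measure should exceed the free energy.
best_random = max(
    dual_objective(random_measure(6, rng), psi, p) for _ in range(1000)
)
```

Here `lhs` and `rhs` agree up to floating-point error, while `best_random` stays below `lhs`, which is consistent with the supremum being attained at the tilted measure.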

Theorems & Definitions (38)

  • Definition 2.2: Randomized Exploratory Control
  • Remark 1
  • Definition 2.3: Free Energy
  • Definition 2.4: Relative Entropy / Kullback-Leibler Divergence
  • Proposition 2.5: Free Energy-Entropy Duality (Dai Pra et al., 1996)
  • Remark 2
  • Definition 3.1: Class of admissible exploratory strategies $\mathcal{A}^{H}_{\mathrm{expl}}$
  • Remark 3: Interpretation of Exploration
  • Remark 4: Piecewise-constant process $\bar{\gamma}$
  • Definition 3.2: Admissible duality strategies $\mathcal{A}^{\bar{\Gamma}}$
  • ...and 28 more