Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Gang Hu, Ming Gu

TL;DR

The paper presents a hybrid framework, Knowledge Distillation DDPG (KDD), which combines Markowitz mean-variance portfolio theory with reinforcement learning through a two-stage training process: supervised pretraining to imitate Markowitz allocations, followed by reinforcement learning with DDPG in a market environment. By transferring knowledge from a teacher model to a student DDPG agent, KDD aims to achieve high returns with controlled risk and improved sample efficiency. Empirical results on Dow Jones 30 data (2009–2018) show KDD delivering the best total and annualized returns, the highest Sharpe ratio (2.03), and favorable values on other metrics (e.g., Calmar, Alpha) relative to baselines that include the DJI, Markowitz, and standard DDPG. The work points to a promising direction for AI-driven portfolio management that balances theoretical soundness with empirical performance, while noting limitations related to market regime shifts and the gap between backtesting and live deployment.

Abstract

Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: a supervised learning stage and a reinforcement learning stage. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and a Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios.
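
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how the Sharpe ratio, maximum drawdown, and Calmar ratio are commonly computed from a series of daily returns. It is not the paper's evaluation code: the function names, the 252-day annualization factor, and the zero risk-free rate are assumptions made for illustration.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor, not taken from the paper


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / sd * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded value, as a fraction."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar_ratio(daily_returns):
    """Annualized compounded return divided by maximum drawdown."""
    total = math.prod(1.0 + r for r in daily_returns)
    years = len(daily_returns) / TRADING_DAYS
    annualized = total ** (1.0 / years) - 1.0
    mdd = max_drawdown(daily_returns)
    return annualized / mdd if mdd > 0 else float("inf")
```

A higher Sharpe ratio means more return per unit of volatility, while the Calmar ratio penalizes strategies whose value falls sharply from a previous peak, so the two can rank strategies differently.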

Paper Structure (12 sections, 15 equations, 2 figures, 1 table)

This paper contains 12 sections, 15 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Portfolio Value Comparison Over Time
  • Figure 2: Risk vs Return of Investment Strategies