Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Jinghai He, Cheng Hua, Chunyang Zhou, Zeyu Zheng

TL;DR

This work tackles portfolio allocation in high-dimensional, non-stationary financial markets by introducing Dynamic Embedding Reinforcement Learning (DERL), an end-to-end framework that combines reinforcement learning with dynamic market embeddings learned via generative autoencoders and online meta-learning. The approach jointly learns a low-dimensional state representation and a trading policy, enabling daily allocation decisions that account for transaction costs and time volatility. Empirical results on 30 years of data for the top 500 U.S. stocks show that DERL outperforms predict-then-optimize baselines and standard benchmarks, particularly under market stress, and earns alpha that traditional factor models do not fully explain. The framework offers a scalable, data-driven way to adapt to shifting market dynamics while improving risk management through embedding denoising, state clustering, and volatility-aware exposure control.

Abstract

We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space into a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management.

Paper Structure

This paper contains 32 sections, 6 theorems, 58 equations, 4 figures, 7 tables, and 3 algorithms.

Key Result

Proposition EC.1

Let $\Pi$ denote the set of all non-stationary and randomized policies, and assume an infinite time horizon. Define: There exists a stationary policy $\pi^\star$ such that for all $s \in \mathcal{S}$ and $a \in \mathcal{A}$, The optimal value function satisfies
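For intuition, the textbook discounted-MDP form of Bellman optimality says that $V^\star(s) = \max_{a} \big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a) V^\star(s') \big]$ and that a stationary policy acting greedily with respect to $Q^\star$ attains $V^\star$. The sketch below checks both statements numerically on a toy two-state, two-action MDP using value iteration. The discount factor `gamma`, the arrays `P` and `R`, and the function names are assumptions made for illustration; they are not the paper's notation or its training procedure.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate V*, Q*, and a greedy stationary policy.

    P has shape (A, S, S): P[a, s, t] is the probability of moving s -> t under a.
    R has shape (S, A): R[s, a] is the expected one-step reward.
    """
    n_states, _ = R.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * E[V(s')]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    policy = Q.argmax(axis=1)  # deterministic, stationary
    return V, Q, policy


def evaluate_policy(P, R, policy, gamma=0.9):
    """Exact value of a deterministic stationary policy: solve (I - gamma P_pi) V = R_pi."""
    states = np.arange(R.shape[0])
    P_pi = P[policy, states, :]  # row s is the transition law under policy[s]
    R_pi = R[states, policy]
    return np.linalg.solve(np.eye(len(states)) - gamma * P_pi, R_pi)


# Hypothetical toy MDP (values chosen arbitrarily for the illustration).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.6, 0.4]],  # transitions under action 1
])
R = np.array([
    [1.0, 0.0],  # rewards in state 0 for actions 0 and 1
    [0.0, 2.0],  # rewards in state 1 for actions 0 and 1
])

V_star, Q_star, pi_star = value_iteration(P, R)
V_pi = evaluate_policy(P, R, pi_star)
```

Here `V_pi` is the exact value of the greedy stationary policy, so its agreement with `V_star` (up to the iteration tolerance) is the numerical counterpart of the existence claim in the proposition. Value iteration is used only because it is the simplest fixed-point method for a tabular MDP; it does not scale to the high-dimensional setting the paper targets.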

Figures (4)

  • Figure 1: State Embedding with Generative Autoencoders.
  • Figure 2: Diagram of the FOML Framework for Dynamic Embedding Updates.
  • Figure 3: The DERL Framework.
  • Figure 4: Rolling-window backtesting.

Theorems & Definitions (6)

  • Proposition EC.1: Bellman Optimality
  • Lemma EC.1
  • Lemma EC.2
  • Proposition EC.2
  • Proposition EC.3
  • Corollary EC.1