Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information
Jinghai He, Cheng Hua, Chunyang Zhou, Zeyu Zheng
TL;DR
This work tackles portfolio allocation in high-dimensional, non-stationary financial markets by introducing Dynamic Embedding Reinforcement Learning (DERL), an end-to-end framework that combines reinforcement learning with dynamic market embeddings learned via generative autoencoders and online meta-learning. The approach jointly learns a low-dimensional state representation and a trading policy, enabling daily allocation decisions with transaction-cost awareness and volatility timing. Empirical results on 30 years of data for the top 500 U.S. stocks show that DERL outperforms predict-then-optimize (PTO) baselines and standard benchmarks, with alpha that remains significant after accounting for traditional factor models, particularly under market stress. The framework offers a scalable, data-driven way to adapt to shifting market dynamics while improving risk management through embedding denoising, state clustering, and volatility-aware exposure control.
Abstract
We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space to a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management.
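To make the idea of a transaction-cost-aware daily allocation concrete, the following is a minimal sketch of a one-day reward an RL agent might receive. It assumes a proportional cost charged on turnover (the L1 distance between the new and previous weights), which is a common textbook formulation and not necessarily the reward used in this work. The function name `net_portfolio_return` and the parameter `cost_rate` are hypothetical.

```python
from typing import Sequence


def net_portfolio_return(
    weights: Sequence[float],
    prev_weights: Sequence[float],
    asset_returns: Sequence[float],
    cost_rate: float = 0.001,  # hypothetical proportional cost per unit of turnover
) -> float:
    """One-day portfolio return net of proportional transaction costs.

    Assumption (not taken from the paper): cost = cost_rate * sum_i |w_i - w_prev_i|.
    """
    if not (len(weights) == len(prev_weights) == len(asset_returns)):
        raise ValueError("all inputs must have the same length")

    # Gross return: weighted sum of the individual asset returns.
    gross = sum(w * r for w, r in zip(weights, asset_returns))

    # Turnover: total absolute change in weights caused by rebalancing.
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))

    return gross - cost_rate * turnover


# Example: move from holding only asset 0 to an equal-weight two-asset portfolio.
reward = net_portfolio_return(
    weights=[0.5, 0.5],
    prev_weights=[1.0, 0.0],
    asset_returns=[0.02, -0.01],
)
```

Under this assumed reward, a policy that rebalances aggressively pays for it through the turnover term, so the agent is pushed toward allocations whose expected gain exceeds the cost of trading into them.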
