Attention-Enhanced Reinforcement Learning for Dynamic Portfolio Optimization

Pei Xue, Yuanchun Ye

TL;DR

The paper tackles dynamic portfolio optimization under uncertainty by introducing a reinforcement learning framework that enforces feasibility with a Dirichlet policy on the simplex and captures cross-asset dependencies via attention-based encoders. The reward combines portfolio growth with transaction costs and a covariance-based risk penalty, tying learning to mean–variance trade-offs; evaluation uses purged walk-forward backtests to ensure realism. Empirically, attention-enhanced Dirichlet policies outperform equal-weight and standard RL baselines in terminal wealth and risk-adjusted metrics while maintaining realistic turnover, underscoring improved stability and interpretability of RL for portfolio management. The findings suggest that principled action parameterization plus cross-sectional representations can yield incremental yet robust gains, with future work exploring explicit risk constraints and broader markets.
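As a rough illustration of the reward described above, the following is a minimal sketch under assumptions, not the authors' exact formulation. It assumes a one-period log-growth term, a proportional cost on turnover, and a quadratic penalty on portfolio variance. The names `step_reward`, `cost_rate`, and `risk_aversion`, and the numeric values, are hypothetical.

```python
import numpy as np

def step_reward(w_prev, w_new, asset_returns, cov, cost_rate=0.001, risk_aversion=1.0):
    """Hypothetical one-period reward: net log growth minus a variance penalty."""
    # Turnover is the L1 distance between old and new weights (assumed cost model).
    turnover = np.abs(w_new - w_prev).sum()
    cost = cost_rate * turnover
    # Gross portfolio growth over the period, then net of proportional costs.
    gross = 1.0 + w_new @ asset_returns
    growth = np.log(gross * (1.0 - cost))
    # Covariance-based risk penalty: w' Sigma w.
    risk = w_new @ cov @ w_new
    return growth - risk_aversion * risk

# Toy two-asset example with made-up numbers.
w_prev = np.array([0.5, 0.5])
w_new = np.array([0.6, 0.4])
returns = np.array([0.01, -0.02])
cov = np.array([[0.04, 0.0], [0.0, 0.09]])
reward = step_reward(w_prev, w_new, returns, cov)
```

Raising `risk_aversion` shifts the trade-off toward lower-variance allocations, which is the sense in which such a reward mirrors a mean–variance objective.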

Abstract

We develop a deep reinforcement learning framework for dynamic portfolio optimization that combines a Dirichlet policy with cross-sectional attention mechanisms. The Dirichlet formulation ensures that portfolio weights are always feasible, handles tradability constraints naturally, and provides a stable way to explore the allocation space. The model integrates per-asset temporal encoders with a global attention layer, allowing it to capture sector relationships, factor spillovers, and other cross-asset dependencies. The reward function includes transaction costs and portfolio variance penalties, linking the learning objective to traditional mean–variance trade-offs. The results show that attention-based Dirichlet policies outperform equal-weight and standard reinforcement learning benchmarks in terms of terminal wealth and Sharpe ratio, while maintaining realistic turnover and drawdown levels. Overall, the study shows that combining principled action design with attention-based representations improves both the stability and interpretability of reinforcement learning for portfolio management.
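To make the feasibility property concrete, here is a minimal sketch of sampling weights from a Dirichlet distribution. It assumes the policy network emits one real-valued score per asset, that a softplus maps scores to positive concentrations, and that non-tradable assets are handled by restricting the Dirichlet to the tradable subset. These choices, and the names `sample_weights`, `logits`, `tradable`, and `eps`, are illustrative assumptions rather than details confirmed by the paper.

```python
import numpy as np

def sample_weights(logits, tradable, rng, eps=1e-3):
    """Sample long-only weights on the simplex over tradable assets (sketch)."""
    logits = np.asarray(logits, dtype=float)
    tradable = np.asarray(tradable, dtype=bool)
    # Softplus keeps concentrations strictly positive; eps avoids values near zero.
    alpha = np.logaddexp(0.0, logits[tradable]) + eps
    weights = np.zeros_like(logits)
    # A Dirichlet draw is nonnegative and sums to one by construction.
    weights[tradable] = rng.dirichlet(alpha)
    return weights

rng = np.random.default_rng(0)
# Hypothetical scores for four assets; the third is marked non-tradable.
weights = sample_weights([0.2, 1.5, -0.7, 0.9], [True, True, False, True], rng)
```

Because every draw already lies on the simplex, no projection or renormalization step is needed after sampling, and larger concentrations give tighter, less exploratory allocations.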

Paper Structure

This paper contains 49 sections, 31 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Backtest cumulative wealth (2020–2025). PPO, A2C, and REINFORCE track aggregate market cycles but sustain higher cumulative returns than buy-and-hold, with faster recovery from drawdowns.