Risk-Aware Deep Reinforcement Learning for Dynamic Portfolio Optimization
Emmanuel Lwele, Sabuni Emmanuel, Sitali Gabriel Sitali
TL;DR
This work tackles dynamic portfolio optimization under market uncertainty by integrating a Sharpe ratio-based reward with explicit risk controls within a PPO-based DRL framework. A neural policy maps market states to asset allocations under long-only constraints and a transaction-cost-aware reward, with the aim of maximizing risk-adjusted performance. Empirical results show that training sharply reduces risk but also substantially lowers both absolute and risk-adjusted returns, highlighting challenges in reward shaping, the exploration–exploitation balance, and non-stationarity. The study underscores the need for hybrid risk-aware strategies and robust validation to translate promising pre-training results into practical, stable deployment of DRL-driven portfolio management.
Abstract
This paper presents a deep reinforcement learning (DRL) framework for dynamic portfolio optimization under market uncertainty and risk. The proposed model integrates a Sharpe ratio-based reward function with direct risk control mechanisms, including maximum drawdown and volatility constraints. Proximal Policy Optimization (PPO) is employed to learn adaptive asset allocation strategies from historical financial time series. Model performance is benchmarked against mean-variance and equal-weight portfolio strategies through backtesting on high-performing equities. Results indicate that the DRL agent successfully stabilizes volatility but delivers degraded risk-adjusted returns because the policy converges to an over-conservative allocation, highlighting the difficulty of balancing exploration, return maximization, and risk mitigation. The study underscores the need for improved reward shaping and hybrid risk-aware strategies to make DRL-based portfolio allocation models practical to deploy.
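
To make the reward design described above concrete, the following is a minimal Python sketch of a Sharpe ratio-based reward with soft penalties for volatility and maximum drawdown, together with a long-only weight mapping and a transaction-cost-adjusted return. It is an illustration under stated assumptions, not the authors' implementation: the function names, the softmax mapping, the proportional cost model, the zero risk-free rate, and all thresholds and penalty coefficients (`cost_rate`, `vol_cap`, `dd_cap`, `vol_penalty`, `dd_penalty`) are hypothetical choices made for this example.

```python
import math
from statistics import mean, pstdev


def long_only_weights(scores):
    """Softmax: map raw policy outputs to non-negative weights summing to 1.

    Assumption: the abstract does not say how long-only weights are produced.
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded wealth curve."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


def net_return(weights, prev_weights, asset_returns, cost_rate=0.001):
    """One-step portfolio return net of proportional transaction costs.

    Assumption: costs are proportional to turnover; cost_rate is illustrative.
    """
    gross = sum(w * r for w, r in zip(weights, asset_returns))
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    return gross - cost_rate * turnover


def risk_aware_reward(window, vol_cap=0.02, dd_cap=0.10,
                      vol_penalty=1.0, dd_penalty=1.0, eps=1e-8):
    """Sharpe ratio of a window of net returns minus soft constraint penalties.

    Assumptions: zero risk-free rate, per-period (non-annualized) Sharpe,
    and hinge penalties that apply only when a cap is exceeded.
    """
    vol = pstdev(window)
    sharpe = mean(window) / (vol + eps)
    excess_vol = max(0.0, vol - vol_cap)
    excess_dd = max(0.0, max_drawdown(window) - dd_cap)
    return sharpe - vol_penalty * excess_vol - dd_penalty * excess_dd
```

In a sketch like this, large penalty coefficients relative to the Sharpe term can push the policy toward very low-risk allocations, which is one plausible way the over-conservative convergence reported above could arise.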
