Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization

Jun Kevin, Pujianto Yugopuspito

TL;DR

The paper tackles dynamic portfolio optimization in non-stationary markets by marrying LSTM-based return forecasting with Proximal Policy Optimization (PPO) in a sparse, Top-$K$ allocation scheme. The hybrid model leverages predictive foresight to guide adaptive policy updates, evaluated on weekly data from 2018–2024 across four asset classes, and matched against equal-weight, index, and single-model baselines. Empirical results show that the Hybrid LSTM+PPO configuration achieves strong cumulative growth, notably with Top-$5$ yielding an annualized return of $\mu_{\mathrm{ann}}=0.2538$ while balancing drawdown and volatility, though in some setups it trades off Sharpe for higher returns. The study demonstrates the value of combining forecasting priors with reinforcement-learning-based allocation to improve resilience under regime shifts and transaction costs, offering a scalable, modular framework with potential extensions in multi-frequency forecasting and risk-aware objectives.
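The sparse Top-$K$ idea can be illustrated with a minimal sketch: keep only the $K$ assets with the highest scores and spread the whole budget across them. The summary does not say how the selected assets are weighted, so the softmax normalization, the function name `top_k_weights`, and the example scores below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def top_k_weights(scores, k):
    """Keep the k highest-scoring assets and turn their scores into
    long-only weights that sum to 1 (softmax is an assumed choice)."""
    scores = np.asarray(scores, dtype=float)
    if not 1 <= k <= scores.size:
        raise ValueError("k must be between 1 and the number of assets")
    keep = np.argsort(scores)[-k:]               # indices of the k largest scores
    shifted = scores[keep] - scores[keep].max()  # stabilize the exponentials
    weights = np.zeros_like(scores)              # assets outside the top k get zero
    weights[keep] = np.exp(shifted) / np.exp(shifted).sum()
    return weights

# Hypothetical per-asset scores (e.g. forecast returns) for five assets.
scores = np.array([0.02, -0.01, 0.05, 0.00, 0.03])
weights = top_k_weights(scores, k=2)
```

With `k=2`, only the two best-scored assets receive a nonzero weight; a larger `k` spreads the budget over more names, which matches the broader diversification the paper reports for Top-10 and Top-30.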

Abstract

This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning strategy. The proposed system leverages the predictive power of deep recurrent networks to capture temporal dependencies, while the PPO agent adaptively refines portfolio allocations in continuous action spaces, allowing the system to anticipate trends while adjusting dynamically to market shifts. Using multi-asset datasets covering U.S. and Indonesian equities, U.S. Treasuries, and major cryptocurrencies from January 2018 to December 2024, the model is evaluated against equal-weight, index-style, and single-model baselines (LSTM-only and PPO-only) on annualized return, volatility, Sharpe ratio, and maximum drawdown, each adjusted for transaction costs. The results indicate that the hybrid architecture delivers higher returns and stronger resilience under non-stationary market regimes, suggesting its promise as a robust, AI-driven framework for dynamic portfolio optimization.
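The four evaluation metrics named in the abstract can be sketched as follows. The conventions here are assumptions, since the abstract does not define them: returns are weekly and annualized with 52 periods, the annualized return is the arithmetic mean scaled by 52, the Sharpe ratio uses a zero risk-free rate, and transaction costs are a proportional rate applied to per-period turnover. The names `net_returns`, `performance`, and the sample numbers are hypothetical.

```python
import numpy as np

WEEKS_PER_YEAR = 52  # assumption: weekly data, 52 periods per year

def net_returns(gross, turnover, cost_rate):
    """Subtract proportional transaction costs (cost_rate * turnover)
    from gross per-period returns. The cost model is an assumption."""
    return np.asarray(gross, dtype=float) - cost_rate * np.asarray(turnover, dtype=float)

def performance(returns, periods=WEEKS_PER_YEAR):
    """Annualized return, volatility, Sharpe ratio (zero risk-free rate),
    and maximum drawdown of a series of per-period simple returns."""
    r = np.asarray(returns, dtype=float)
    ann_return = r.mean() * periods
    ann_vol = r.std(ddof=1) * np.sqrt(periods)
    sharpe = ann_return / ann_vol if ann_vol > 0 else float("nan")
    # Equity curve starting at 1.0 so a loss in the first period counts.
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    peaks = np.maximum.accumulate(equity)
    max_drawdown = float((equity / peaks - 1.0).min())
    return {
        "ann_return": float(ann_return),
        "ann_vol": float(ann_vol),
        "sharpe": float(sharpe),
        "max_drawdown": max_drawdown,
    }

# Hypothetical weekly gross returns and turnover, with 10 bps cost per unit turnover.
gross = [0.02, -0.01, 0.03, -0.04]
turnover = [1.0, 0.2, 0.1, 0.3]
net = net_returns(gross, turnover, cost_rate=0.001)
stats = performance(net)
```

Maximum drawdown is reported as a non-positive fraction of the running peak, so a value of `-0.5` means the portfolio lost half its peak value at its worst point.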

Paper Structure

This paper contains 18 sections, 11 equations, 8 figures, 2 tables, 2 algorithms.

Figures (8)

  • Figure 1: LSTM cell architecture illustrating the input, forget, and output gates, which regulate information flow between the cell state ($C_t$) and hidden state ($H_t$).
  • Figure 2: Research pipeline integrating data preparation, hybrid modeling (LSTM + PPO), and evaluation.
  • Figure 4: Portfolio compositions of Hybrid LSTM+PPO portfolios across two consecutive evaluation weeks (March 24 and March 31, 2024). From top to bottom: Top-5, Top-10, and Top-30 configurations. Increasing $K$ leads to broader diversification and smoother allocation distributions across sectors and asset classes.
  • Figure 5: Cumulative equity curves of Hybrid LSTM+PPO portfolios compared to single-model baselines and traditional benchmarks on weekly data (2024).
  • Figure : Top-5
  • ...and 3 more figures