Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management

Marc Velay; Bich-Liên Doan; Arpad Rimmel; Fabrice Popineau; Fabrice Daniel

TL;DR

This paper tackles the robustness and generalization of Deep Reinforcement Learning (DRL) methods for Online Portfolio Selection (OLPS). It formalizes OLPS as a DRL environment whose actions $a_t \in \mathbb{R}^{N+1}$, covering $N$ assets plus cash, satisfy $\sum_{i=0}^{N} a_{t,i} = 1$, and it evaluates four reward schemes across multiple market representations. It implements a standardized training and evaluation pipeline on public data with open-source DRL implementations, tests four algorithms (DDPG, PPO, SAC, A2C), and assesses robustness with CVaR, the Information Ratio, IR Trend, and Maximum Drawdown when backtesting on 2021–2022 after training on 2010–2022. The findings indicate that although returns can be competitive, many agents overfit and fail to generalize to out-of-distribution market conditions, underscoring the need for more robust training strategies and reproducible benchmarking in OLPS.
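The action constraint described above can be illustrated with a minimal sketch. It assumes a softmax mapping from an unconstrained policy output to long-only weights, places cash at index 0, and ignores transaction costs. None of these choices is stated here, so `to_weights` and `step_return` are hypothetical helpers, not the paper's implementation.

```python
import math
from typing import List, Sequence


def to_weights(raw_action: Sequence[float]) -> List[float]:
    """Map an unconstrained action in R^{N+1} onto the simplex via softmax.

    Assumption of this sketch: index 0 is cash, indices 1..N are risky assets.
    The result is non-negative and sums to 1 (a long-only allocation).
    """
    m = max(raw_action)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in raw_action]
    total = sum(exps)
    return [e / total for e in exps]


def step_return(
    weights: Sequence[float],
    prices_prev: Sequence[float],
    prices_now: Sequence[float],
) -> float:
    """One-period simple portfolio return, ignoring transaction costs.

    prices_* cover the N risky assets only; cash has a price relative of 1,
    so the cash weight neither gains nor loses value over the period.
    """
    if len(weights) != len(prices_now) + 1:
        raise ValueError("expected one weight per asset plus one for cash")
    relatives = [1.0] + [p1 / p0 for p0, p1 in zip(prices_prev, prices_now)]
    growth = sum(w * r for w, r in zip(weights, relatives))
    return growth - 1.0
```

For example, `to_weights([0.0, 0.0, 0.0])` gives equal weights of 1/3 across cash and two assets; if the first asset rises from 100 to 110 while the second stays at 50, `step_return` gives about 0.0333. Softmax is only one common way to satisfy the sum-to-one constraint, and it rules out short positions.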

Abstract

Deep Reinforcement Learning approaches to Online Portfolio Selection have grown in popularity in recent years. The sensitive nature of training Reinforcement Learning agents implies a need for extensive efforts in market representation, behavior objectives, and training processes, which have often been lacking in previous works. We propose a training and evaluation process to assess the performance of classical DRL algorithms for portfolio management. We found that most Deep Reinforcement Learning algorithms were not robust, with strategies generalizing poorly and degrading quickly during backtesting.
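Three of the robustness metrics named in the TL;DR can be sketched on a series of per-period returns. This is an illustrative sketch under stated assumptions: historical CVaR is taken as the mean of the worst `alpha` fraction of returns, the Information Ratio as the unannualized mean over the sample standard deviation of returns in excess of a benchmark, and Maximum Drawdown as the largest peak-to-trough fall of compounded wealth. The function names and conventions are hypothetical; the paper's exact definitions, and its IR Trend metric, are not given here.

```python
import math
import statistics
from typing import Sequence


def cvar(returns: Sequence[float], alpha: float = 0.05) -> float:
    """Historical CVaR at level alpha: mean of the worst ceil(alpha * n) returns.

    Reported as a return (typically negative), not as a positive loss.
    """
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]  # the lowest returns form the left tail
    return sum(worst) / k


def information_ratio(returns: Sequence[float], benchmark: Sequence[float]) -> float:
    """Mean active return over its sample standard deviation (not annualized).

    Needs at least two periods; returns NaN when active returns are constant.
    """
    active = [r - b for r, b in zip(returns, benchmark)]
    sd = statistics.stdev(active)
    return statistics.mean(active) / sd if sd > 0 else float("nan")


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest fractional fall of compounded wealth from its running peak."""
    wealth = peak = 1.0
    mdd = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        mdd = max(mdd, (peak - wealth) / peak)
    return mdd
```

For example, with returns `[0.10, -0.20, 0.05, -0.10, 0.15]`, wealth peaks at 1.1 and later falls to 0.8316, so `max_drawdown` is about 0.244.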

Paper Structure

This paper contains 13 sections, 7 equations, 1 figure, and 5 tables.

Introduction
Related Works
Learning Algorithms for OLPS
Market Representations
Management Rewards
Limitations
Deep Reinforcement Learning for Online Portfolio Selection
Experiments
Data Processing
Training agents
Backtesting Evaluation
Results
Conclusion

Figures (1)

Figure 1: Rate of Return
