Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management
Marc Velay, Bich-Liên Doan, Arpad Rimmel, Fabrice Popineau, Fabrice Daniel
TL;DR
This paper examines the robustness and generalization of Deep Reinforcement Learning (DRL) methods for Online Portfolio Selection (OLPS). It formalizes OLPS as a DRL environment in which the action at each step is a weight vector $a_t \in \mathbb{R}^{N+1}$ over $N$ assets plus cash, subject to $\sum_{i} a_{t,i} = 1$, and it evaluates four reward schemes across multiple market representations. A standardized training and evaluation pipeline is built from public data and open-source DRL implementations. It tests four algorithms (DDPG, PPO, SAC, A2C) and assesses robustness with CVaR, Information Ratio, IR Trend, and Maximum Drawdown during backtesting on 2021–2022 after training on 2010–2022. The findings indicate that although returns can be competitive, many agents overfit and fail to generalize to out-of-distribution market conditions, underscoring the need for more robust training strategies and reproducible benchmarking in OLPS.
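To make the action constraint and three of the robustness metrics named above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a softmax to project raw network outputs onto weights that sum to 1 (this also forces long-only weights, which the summary does not state), a historical estimator for CVaR at an assumed 5% level, and simple per-period returns. All function names are hypothetical.

```python
import numpy as np


def to_portfolio_weights(raw_action):
    """Map an unconstrained vector in R^(N+1) to weights summing to 1.

    Assumption: softmax projection; the source only states the sum constraint.
    """
    raw_action = np.asarray(raw_action, dtype=float)
    exp = np.exp(raw_action - raw_action.max())  # shift for numerical stability
    return exp / exp.sum()


def cvar(returns, alpha=0.05):
    """Historical CVaR: mean of the worst `alpha` fraction of returns."""
    ordered = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(ordered))))
    return ordered[:k].mean()


def information_ratio(returns, benchmark_returns):
    """Mean active return divided by its sample standard deviation."""
    active = np.asarray(returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)
    std = active.std(ddof=1)
    return active.mean() / std if std > 0 else 0.0


def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a negative fraction."""
    # Prepend the initial wealth of 1 so a loss in the first period is counted.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    return ((wealth - running_peak) / running_peak).min()
```

In a backtest, `to_portfolio_weights` would be applied to each raw action before rebalancing, and the three metric functions would be evaluated on the resulting series of portfolio returns, with `information_ratio` additionally requiring a benchmark return series of the same length.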
Abstract
Deep Reinforcement Learning approaches to Online Portfolio Selection have grown in popularity in recent years. The sensitive nature of training Reinforcement Learning agents implies a need for extensive efforts in market representation, behavior objectives, and training processes, which have often been lacking in previous works. We propose a training and evaluation process to assess the performance of classical DRL algorithms for portfolio management. We found that most Deep Reinforcement Learning algorithms were not robust, with strategies generalizing poorly and degrading quickly during backtesting.
