A Deep Reinforcement Learning Framework For Financial Portfolio Management
Jinyang Li
TL;DR
The paper tackles continuous multi-asset portfolio optimization with a model-free deep reinforcement learning framework. It introduces the Ensemble of Identical Independent Evaluators (EIIE) architecture, augmented by a Portfolio-Vector Memory (PVM) and an Online Stochastic Batch Learning (OSBL) scheme, and evaluates CNN, RNN, and LSTM policy networks under realistic transaction-cost constraints. Across cryptocurrency data, the approach yields superior returns and robust risk metrics, while stock-market application shows results near equal-weight baselines, highlighting market-specific effectiveness. The work advances online, data-efficient reinforcement learning for portfolio management and demonstrates practical deployment considerations such as online updating and cost-aware trading.
Abstract
In this paper, we examine "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" [arXiv:1706.10059], which applies deep learning techniques to the portfolio management problem. The original paper proposes a financial-model-free reinforcement learning framework consisting of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. The framework is realized in three instances: a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). Its performance is examined by comparison with a number of recently reviewed or published portfolio-selection strategies. We have replicated the original implementations and evaluations, and we further apply the framework to the stock market rather than the cryptocurrency market used in the original paper. Our experiments in the cryptocurrency market are consistent with the original paper and achieve superior returns, but the framework does not perform as well when applied to the stock market.
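To make the "explicit reward function" concrete, the following is a minimal sketch of an average logarithmic return with proportional transaction costs, the kind of objective such a framework maximizes. It is an illustration under stated assumptions, not the original implementation: the function name `average_log_return`, the argument layout (cash as asset 0), the linear cost approximation, and the default `cost_rate` of 0.25% are all choices made here for the example.

```python
import math


def average_log_return(price_relatives, weights, cost_rate=0.0025):
    """Average per-period log return of a periodically rebalanced portfolio.

    price_relatives[t][i]: price of asset i at the end of period t divided by
        its price at the start of period t (asset 0 is cash, always 1.0).
    weights[t][i]: target weight of asset i held during period t; each row
        sums to 1.
    cost_rate: assumed proportional commission charged on traded volume.
    """
    n_assets = len(weights[0])
    # Assumption: the portfolio starts fully in cash.
    drifted = [1.0] + [0.0] * (n_assets - 1)
    total_log_return = 0.0
    for y, w in zip(price_relatives, weights):
        # Simplified cost model: the portfolio shrinks by the commission on
        # the non-cash weight that has to be traded to reach the target.
        turnover = sum(abs(w[i] - drifted[i]) for i in range(1, n_assets))
        shrink = 1.0 - cost_rate * turnover
        # Growth of the portfolio over the period at the target weights.
        growth = sum(yi * wi for yi, wi in zip(y, w))
        total_log_return += math.log(shrink * growth)
        # Weights drift with prices before the next rebalancing step.
        drifted = [yi * wi / growth for yi, wi in zip(y, w)]
    return total_log_return / len(price_relatives)


# Hypothetical two-period example: cash plus one risky asset, held 50/50.
relatives = [[1.0, 1.10], [1.0, 0.95]]
targets = [[0.5, 0.5], [0.5, 0.5]]
print(average_log_return(relatives, targets))
```

Because every period contributes to the sum and the cost term depends on the previous weights, a policy trained on this quantity is penalized for excessive rebalancing, which is why a memory of past portfolio vectors is useful during training.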
