Dueling Deep Reinforcement Learning for Financial Time Series

Bruno Giorgio

TL;DR

This research applies Double DQN (DDQN) and Dueling Network Architectures to financial trading tasks using historical S&P 500 index data, and confirms that RL agents, even when trained on limited datasets, can outperform random strategies by leveraging these architectures.

Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for solving decision-making problems in dynamic environments. In this research, we explore the application of Double DQN (DDQN) and Dueling Network Architectures to financial trading tasks using historical S&P 500 index data. Our focus is on training agents capable of optimizing trading strategies while accounting for practical constraints such as transaction costs. The study evaluates model performance across scenarios with and without commissions, highlighting the impact of cost-sensitive environments on reward dynamics. Despite computational limitations and the inherent complexity of financial time series data, the agent successfully learned meaningful trading policies. The findings confirm that RL agents, even when trained on limited datasets, can outperform random strategies by leveraging advanced architectures such as DDQN and Dueling Networks. However, significant challenges persist: in particular, the learned policy remains sub-optimal, which we attribute to the complexity of the data source.
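For readers unfamiliar with the two architectures named in the abstract, the following is a minimal sketch of their standard textbook forms, not the paper's implementation. It shows the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A), the Double DQN target (the online network selects the next action, the target network evaluates it), and a commission-adjusted reward. All function names, and the flat per-trade commission model in particular, are assumptions made for illustration.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a), subtracting the mean advantage."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    done: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
) -> float:
    """Double DQN target: select the action with the online net, evaluate with the target net."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def net_reward(gross_return_pct: float, commission_pct: float, traded: bool) -> float:
    """Hypothetical cost model (assumption): a flat commission charged only when a trade occurs."""
    return gross_return_pct - (commission_pct if traded else 0.0)


if __name__ == "__main__":
    # Three illustrative actions; the paper's actual action set is not stated here.
    print(dueling_q_values(1.0, [0.5, -0.5, 0.0]))
    print(double_dqn_target(1.0, 0.9, False, [0.2, 0.8, 0.1], [5.0, 2.0, 9.0]))
    print(net_reward(0.5, 0.1, traded=True))
```

Note that in the second call the target network's own maximum (9.0, action 2) is ignored: the online network picks action 1, so the bootstrap value is 2.0. This decoupling of selection from evaluation is what reduces the overestimation bias of plain DQN.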

Paper Structure

This paper contains 7 sections, 2 equations, 13 figures.

Figures (13)

  • Figure 1: Implementation of Dueling Architecture into the Q-Network/Target Network
  • Figure 2: Reward % (y-axis) over Training Episodes (x-axis). FFDQN model with batch size 32 (black line) and batch size 128 (light blue line).
  • Figure 3: Reward % (y-axis) over Training Episodes (x-axis). CNN model with batch size 32 (purple line) and batch size 128 (yellow line).
  • Figure 4: Logic schema of the training of the DDQN architecture.
  • Figure 5: Testing Cumulative Reward over Training Episodes on the FFDQN model with batch size 32, without commission costs.
  • ...and 8 more figures