
CNN-DRL for Scalable Actions in Finance

Sina Montazeri, Akram Mirzaeinia, Haseebullah Jumakhan, Amir Mirzaeinia

TL;DR

Problem: DRL with MLPs struggles to learn as the action scale grows in stock trading. Approach: implement a CNN-based DRL agent that uses a 90-day matrix input and trains with PPO and A2C in FinRL. Findings: the CNN achieves stable learning and higher cumulative rewards and Sharpe ratios when the action scale reaches 1000 shares, unlike the MLP baseline. Impact: demonstrates that CNN architectures can enable scalable, data-driven trading strategies under large continuous action spaces.

Abstract

Published MLP-based DRL agents in finance have difficulty learning the dynamics of the environment as the action scale increases. When the number of shares bought or sold per action rises to one thousand, the MLP agent cannot adapt effectively to the environment. To address this, we designed a CNN agent that concatenates the daily feature vectors from the last ninety days to form the CNN input matrix. Our extensive experiments demonstrate that the MLP-based agent experiences a loss corresponding to the initial environment setup, while our CNN remains stable, learns the environment effectively, and yields increasing rewards.
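The sliding-window construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_cnn_inputs`, the (days, features) array layout, and the choice of 8 features in the demo are assumptions made for the example. It assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np


def make_cnn_inputs(features: np.ndarray, window: int = 90) -> np.ndarray:
    """Stack the last `window` daily feature vectors into one matrix per day.

    features: array of shape (num_days, num_features), oldest day first
              (hypothetical layout chosen for this sketch).
    Returns an array of shape (num_days - window + 1, window, num_features),
    where entry i covers days i .. i + window - 1.
    """
    if features.ndim != 2:
        raise ValueError("features must be 2-D: (num_days, num_features)")
    if features.shape[0] < window:
        raise ValueError("need at least `window` days of data")
    # sliding_window_view appends the window axis last:
    # (num_windows, num_features, window)
    views = np.lib.stride_tricks.sliding_window_view(features, window, axis=0)
    # Reorder to (num_windows, window, num_features) and copy so the result
    # is a writable, contiguous array.
    return np.ascontiguousarray(views.transpose(0, 2, 1))


# Demo with synthetic data: 120 days, 8 made-up features per day.
daily = np.random.default_rng(0).normal(size=(120, 8))
states = make_cnn_inputs(daily)
print(states.shape)  # (31, 90, 8)
```

Each `states[i]` is one 90-row matrix that could serve as a single input channel for a convolutional policy network; how the paper arranges channels and features is not specified in this summary.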

Paper Structure (9 sections, 1 equation, 5 figures, 2 tables)


Figures (5)

  • Figure 1: Sliding window to create CNN input matrix channel.
  • Figure 2: Our CNN architecture.
  • Figure 3: Our CNN vs. MLP reward graph.
  • Figure 4: Cost of trading.
  • Figure 5: Sharpe ratio.
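
For readers unfamiliar with the Sharpe ratio reported in the findings, a common way to compute it from daily returns is sketched below. This is an illustration under stated assumptions, not the paper's evaluation code: the function name `annualized_sharpe`, the 252 trading days per year, and the zero default risk-free rate are all choices made for this example.

```python
import numpy as np


def annualized_sharpe(daily_returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio from a sequence of per-period returns.

    risk_free_rate is an annual rate; it is spread evenly across periods
    (an assumption of this sketch).
    """
    r = np.asarray(daily_returns, dtype=float)
    excess = r - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)  # sample standard deviation
    if std == 0:
        return float("nan")  # undefined when returns never vary
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


# Example with made-up daily returns.
example_returns = [0.01, -0.01, 0.02, 0.0]
sharpe = annualized_sharpe(example_returns)
```

A higher value means more return per unit of volatility, which is the sense in which the CNN agent is compared with the MLP baseline.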