CNN-DRL for Scalable Actions in Finance

Sina Montazeri; Akram Mirzaeinia; Haseebullah Jumakhan; Amir Mirzaeinia

TL;DR

Problem: DRL with MLPs struggles to learn when the action scale expands in stock trading. Approach: implement a CNN-based DRL agent that takes a $90$-day matrix as input and is trained with PPO and A2C in FinRL. Findings: the CNN achieves stable learning and higher cumulative rewards and Sharpe ratios when the action size reaches $1000$ shares, unlike the MLP baseline. Impact: demonstrates that CNN architectures can enable scalable, data-driven trading strategies under large continuous action spaces.

Abstract

Published MLP-based DRL agents in finance have difficulty learning the dynamics of the environment as the action scale increases. When the buy and sell sizes increase to one thousand shares, the MLP agent cannot adapt effectively to the environment. To address this, we designed a CNN agent that concatenates the daily feature vectors of the last ninety days to create the CNN input matrix. Our extensive experiments demonstrate that the MLP-based agent experiences a loss corresponding to the initial environment setup, while our CNN remains stable, effectively learns the environment, and leads to an increase in rewards.
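The construction of the input matrix described above can be sketched as a sliding window over the daily feature vectors. This is a minimal illustration, not the authors' implementation: the function name `make_cnn_inputs`, the feature count of 8, and the 120-day sample are assumptions made for the example; only the ninety-day window comes from the abstract.

```python
import numpy as np

WINDOW = 90  # days per input matrix, as stated in the abstract


def make_cnn_inputs(daily_features: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Stack each run of `window` consecutive daily feature vectors into a matrix.

    daily_features has shape (num_days, num_features).
    The result has shape (num_days - window + 1, window, num_features).
    """
    num_days, _ = daily_features.shape
    if num_days < window:
        raise ValueError("need at least `window` days of data")
    # sliding_window_view appends the window axis last:
    # (num_windows, num_features, window)
    views = np.lib.stride_tricks.sliding_window_view(daily_features, window, axis=0)
    # Reorder so that each matrix is (window, num_features): one row per day.
    return views.transpose(0, 2, 1)


# Hypothetical data: 120 trading days, 8 features per day (both assumed).
features = np.arange(120 * 8, dtype=np.float32).reshape(120, 8)
inputs = make_cnn_inputs(features)
```

Each `inputs[t]` holds the ninety most recent daily vectors ending at day `t + 89`, so the agent's observation at every step is a matrix rather than a single day's vector. How the matrix is arranged into CNN channels is specified in the paper's architecture section, not here.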

Paper Structure (9 sections, 1 equation, 5 figures, 2 tables)

INTRODUCTION
RELATED WORKS
PROBLEM DESCRIPTION
MDP Model for Stock Trading
Environment
Feature vector
CNN ARCHITECTURE
Evaluation
Conclusion

Figures (5)

Figure 1: Sliding window to create CNN input matrix channel.
Figure 2: Our CNN architecture.
Figure 3: Our CNN vs MLP reward graph.
Figure 4: Cost of trading.
Figure 5: Sharpe ratio.
