CNN-DRL for Scalable Actions in Finance
Sina Montazeri, Akram Mirzaeinia, Haseebullah Jumakhan, Amir Mirzaeinia
TL;DR
Problem: DRL with MLPs struggles to learn when the action scale expands in stock trading. Approach: implement a CNN-based DRL agent that takes a $90$-day matrix as input and is trained with PPO and A2C in FinRL. Findings: the CNN agent learns stably and achieves higher cumulative rewards and Sharpe ratios when the action size reaches $1000$ shares, unlike the MLP baseline. Impact: demonstrates that CNN architectures can enable scalable, data-driven trading strategies under large continuous action spaces.
Abstract
Published MLP-based DRL agents in finance have difficulty learning the dynamics of the environment as the action scale increases. When the number of shares bought or sold per action grows to one thousand, the MLP agent can no longer adapt effectively to the environment. To address this, we design a CNN agent whose input matrix is formed by concatenating the daily feature vectors from the last ninety days. Our extensive experiments demonstrate that the MLP-based agent experiences a loss corresponding to the initial environment setup, while our CNN agent remains stable, learns the environment effectively, and achieves increasing rewards.
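As an illustration of the input construction described above, the following is a minimal sketch of stacking the last ninety daily feature vectors into a single matrix. It is not the authors' implementation: the function name `build_cnn_input`, the feature dimension of 16, the number of days, and the random data are all assumptions made for the example; only the 90-day window comes from the text.

```python
import numpy as np

WINDOW = 90  # look-back length in days, as stated in the abstract


def build_cnn_input(daily_features: np.ndarray, t: int, window: int = WINDOW) -> np.ndarray:
    """Stack the `window` most recent daily feature vectors ending at day `t`.

    daily_features: array of shape (num_days, num_features), one row per day.
    Returns an array of shape (window, num_features), oldest day first.
    (Hypothetical helper; the name and layout are assumptions.)
    """
    if t + 1 < window:
        raise ValueError(f"need at least {window} days of history, got {t + 1}")
    # Rows t-window+1 .. t inclusive form the look-back matrix.
    return daily_features[t - window + 1 : t + 1]


# Hypothetical data: 250 trading days, 16 features per day.
rng = np.random.default_rng(0)
features = rng.normal(size=(250, 16))

obs = build_cnn_input(features, t=120)
print(obs.shape)
```

The resulting matrix can be treated as a single-channel image (days by features) and passed to a convolutional policy network, whereas an MLP baseline would typically see only the current day's feature vector.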
