Gradient Reduction Convolutional Neural Network Policy for Financial Deep Reinforcement Learning

Sina Montazeri; Haseebullah Jumakhan; Sonia Abrasiabian; Amir Mirzaeinia

TL;DR

The paper tackles the non-stationarity and scale disparities inherent in financial deep reinforcement learning (DRL) by introducing two architectural enhancements to CNN-based DRL: column-wise input normalization and a gradient reduction CNN with wider early layers and narrower later layers. Implemented within a PPO framework in the FinRL-Meta environment, the approach optimizes the MDP objective $\mathbb{E}\left[\sum_{t=0}^T \gamma^t R(s_t,a_t,s_{t+1})\right]$ with the reward function $R(s_t,a_t,s_{t+1}) = v_{t+1}-v_t$. Experimental results show that the gradient reduction CNN outperforms a baseline MLP and the original CNN, delivering higher cumulative rewards (e.g., $47$, $120$, and $181$ respectively) and greater stability across market regimes (2015–2023, including volatility events). The findings suggest that deeper, more nuanced network architectures, when paired with a robust DRL optimizer such as PPO, can meaningfully improve financial decision-making and generalization across diverse market conditions.
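As an illustration of the objective and reward above, the following minimal sketch computes per-step rewards $v_{t+1}-v_t$ and the discounted sum $\sum_t \gamma^t R_t$ for a single episode. It assumes $v_t$ is a scalar portfolio value at step $t$; the function names and the sample values are hypothetical and not taken from the paper.

```python
def step_rewards(values):
    """Per-step rewards R_t = v_{t+1} - v_t from a sequence of values v_0..v_T."""
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


def discounted_return(rewards, gamma):
    """Discounted sum of rewards: sum over t of gamma**t * R_t for one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical values v_0..v_3 (assumed here to be portfolio values).
values = [100.0, 102.0, 101.0, 105.0]
rewards = step_rewards(values)

undiscounted = discounted_return(rewards, gamma=1.0)
discounted = discounted_return(rewards, gamma=0.5)
```

With $\gamma = 1$ the sum telescopes to $v_T - v_0$, so the undiscounted return is simply the total change in value over the episode; a smaller $\gamma$ weights early gains and losses more heavily.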

Abstract

Building on our prior explorations of convolutional neural networks (CNNs) for financial data processing, this paper introduces two significant enhancements to refine our CNN model's predictive performance and robustness for financial tabular data. Firstly, we integrate a normalization layer at the input stage to ensure consistent feature scaling, addressing the issue of disparate feature magnitudes that can skew the learning process. This modification is hypothesized to aid in stabilizing the training dynamics and improving the model's generalization across diverse financial datasets. Secondly, we employ a Gradient Reduction Architecture, where earlier layers are wider and subsequent layers are progressively narrower. This enhancement is designed to enable the model to capture more complex and subtle patterns within the data, a crucial factor in accurately predicting financial outcomes. These advancements directly respond to the limitations identified in previous studies, where simpler models struggled with the complexity and variability inherent in financial applications. Initial tests confirm that these changes improve accuracy and model stability, suggesting that deeper and more nuanced network architectures can significantly benefit financial predictive tasks. This paper details the implementation of these enhancements and evaluates their impact on the model's performance in a controlled experimental setting.
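The two enhancements described in the abstract can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the authors' implementation: it assumes the normalization is a per-column z-score over the input window (the exact formula is not given in this summary), and it assumes a simple halving schedule for layer widths. The names `normalize_columns` and `narrowing_widths`, the starting width of 64, and the sample window are all hypothetical.

```python
from statistics import mean, pstdev


def normalize_columns(window, eps=1e-8):
    """Z-score each column (feature) of a rows-by-features window.

    Assumption: per-column z-scoring; the paper may use a different scheme.
    """
    columns = list(zip(*window))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(x - m) / (s + eps) for x, (m, s) in zip(row, stats)]
        for row in window
    ]


def narrowing_widths(first, n_layers, factor=2):
    """Layer widths that start wide and shrink by `factor` per layer.

    Assumption: a geometric schedule; widths never drop below 1.
    """
    return [max(1, first // factor ** i) for i in range(n_layers)]


# Hypothetical window: column 0 is a price, column 1 a volume-like feature
# that is several orders of magnitude larger.
window = [
    [100.0, 1_000_000.0],
    [102.0, 1_200_000.0],
    [104.0, 1_400_000.0],
]
normalized = normalize_columns(window)
widths = narrowing_widths(64, 3)
```

After normalization both columns occupy the same numeric range even though their raw magnitudes differ by about four orders of magnitude, which is the scale-disparity issue the normalization layer is meant to address. The width list gives channel counts that could be passed to successive convolutional layers, wide first and narrower afterward.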

Paper Structure (18 sections, 6 equations, 4 figures, 2 tables)

Introduction
Problem Description
Market Environment
State Space
Action Space
Reward Function
Hypothesis
Column-wise Normalization of Input Signals
Gradient Reduction Architecture
Methodology
Proximal Policy Optimization (PPO)
Implementation in Stable-Baselines3
Experimental Validation
Network Architecture
Model Training
...and 3 more sections

Figures (4)

Figure 1: The Sliding Window to Create the Input Matrix for Our CNN
Figure 2: Original CNN Architecture
Figure 3: New Proposed Architecture
Figure 4: Cumulative Rewards Comparison
