Deep Reinforcement Learning Strategies in Finance: Insights into Asset Holding, Trading Behavior, and Purchase Diversity

Alireza Mohammadshafie, Akram Mirzaeinia, Haseebullah Jumakhan, Amir Mirzaeinia

TL;DR

This paper investigates how deep reinforcement learning (DRL) agents behave in financial trading, focusing on whether they hold or trade assets and how diversified their purchases are. Using Yahoo Finance hourly data for 30 Dow Jones stocks, a FinRL environment, 100,000 time steps, a 301-dimensional state space, and five DRL algorithms (DDPG, PPO, TD3, SAC, A2C), it benchmarks performance and trading patterns. The findings show that A2C achieves the highest cumulative rewards, while PPO and SAC tend to trade more aggressively on a small subset of stocks, with DDPG and TD3 balancing holding and diversification. These insights inform algorithm selection for finance applications and underscore the need for deeper study of decision-making processes and risk management in live markets.
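The 301-dimensional state space is consistent with the layout commonly used in FinRL stock-trading environments: one cash balance entry, then a price and a share count for each stock, then a block of technical indicators per stock. The sketch below checks that arithmetic. The summary does not state the number of indicators, so the value of 8 per stock is an assumption, and `state_dim` is a hypothetical helper, not a FinRL function.

```python
# Sketch under stated assumptions: a FinRL-style state layout of
# [cash] + [price per stock] + [shares held per stock] + [indicators per stock].

def state_dim(n_stocks: int, n_indicators: int) -> int:
    """Return the state size for the assumed layout (hypothetical helper)."""
    cash = 1
    prices = n_stocks
    shares = n_stocks
    indicators = n_indicators * n_stocks
    return cash + prices + shares + indicators

N_STOCKS = 30      # Dow Jones constituents, as stated in the summary
N_INDICATORS = 8   # assumption: not stated in the summary

print(state_dim(N_STOCKS, N_INDICATORS))  # 1 + 30 + 30 + 240 = 301
```

Under this assumed layout, the action space would have one entry per stock, that is, 30 dimensions.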

Abstract

Recent deep reinforcement learning (DRL) methods in finance show promising outcomes. However, there is limited research examining the behavior of these DRL algorithms. This paper aims to investigate their tendencies towards holding or trading financial assets as well as purchase diversity. By analyzing their trading behaviors, we provide insights into the decision-making processes of DRL models in finance applications. Our findings reveal that each DRL algorithm exhibits unique trading patterns and strategies, with A2C emerging as the top performer in terms of cumulative rewards. While PPO and SAC engage in significant trades with a limited number of stocks, DDPG and TD3 adopt a more balanced approach. Furthermore, SAC and PPO tend to hold positions for shorter durations, whereas DDPG, A2C, and TD3 display a propensity to remain stationary for extended periods.
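One way to quantify the behaviors described above is to work from a per-step holdings record (one row per time step, one column per stock). The following is a minimal sketch, not the authors' method: `hold_fraction`, `shares_bought`, and `effective_stocks` are hypothetical helpers, and the inverse Herfindahl index is used here as an assumed stand-in for purchase diversity.

```python
from typing import List

Holdings = List[List[float]]  # holdings[t][i] = shares of stock i held at step t

def hold_fraction(holdings: Holdings) -> float:
    """Fraction of steps in which no position changed from the previous step."""
    steps = len(holdings) - 1
    if steps <= 0:
        return 0.0
    unchanged = sum(1 for prev, cur in zip(holdings, holdings[1:]) if prev == cur)
    return unchanged / steps

def shares_bought(holdings: Holdings) -> List[float]:
    """Total shares purchased per stock (sum of positive position changes)."""
    bought = [0.0] * len(holdings[0])
    for prev, cur in zip(holdings, holdings[1:]):
        for i, (p, c) in enumerate(zip(prev, cur)):
            if c > p:
                bought[i] += c - p
    return bought

def effective_stocks(bought: List[float]) -> float:
    """Inverse Herfindahl index: ranges from 1 (one stock) to N (even spread)."""
    total = sum(bought)
    if total == 0:
        return 0.0
    return 1.0 / sum((b / total) ** 2 for b in bought)

# Toy record: 5 time steps, 3 stocks (made-up numbers for illustration only).
toy = [
    [0, 0, 0],
    [10, 0, 0],
    [10, 0, 0],
    [10, 5, 0],
    [10, 5, 0],
]
print(hold_fraction(toy))                    # share of steps with no trade
print(shares_bought(toy))                    # purchases per stock
print(effective_stocks(shares_bought(toy)))  # diversity of purchases
```

A high `hold_fraction` would correspond to an agent that tends to remain stationary, while a low `effective_stocks` value would indicate purchases concentrated in a few stocks.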

Paper Structure

This paper contains 12 sections, 7 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Accumulated rewards over time on the test data, showing the performance dynamics of the DDPG, PPO, TD3, SAC, and A2C models in real-world trading scenarios.
  • Figure 2: Integral holding values, showing each model's overall transaction volumes for certain stocks over the whole testing period.
  • Figure 3: The stock holdings maintained by TD3 during the trading period.
  • Figure 4: The stock holdings maintained by DDPG during the trading period.
  • Figure 5: The stock holdings maintained by A2C during the trading period.
  • ...and 2 more figures