Agent Performing Autonomous Stock Trading under Good and Bad Situations

Yunfei Luo; Zhangqi Duan

TL;DR

The paper formalizes stock trading as a Markov Decision Process and builds a Python-based simulation to train autonomous agents using three deep RL methods: Deep Q-Learning, Deep SARSA, and Policy Gradient. It evaluates these agents on six technology stocks under two market regimes—before 2021 (good) and after 2021 (bad)—to assess robustness to regime shifts. In the good regime, the agents achieve high annual returns up to approximately $91.4\%$, while in the bad regime returns remain positive but substantially lower (roughly $2.2\%$ to $7.4\%$), reflecting distributional shifts and generalization challenges. The work identifies data distribution changes as a key challenge and proposes future directions such as federated/multitask learning and incorporating crisis-period data to enhance resilience.

Abstract

Stock trading is a popular approach to financial management. However, the market and the broader economic environment are unstable and often unpredictable. Furthermore, stock trading requires time and effort to analyze data, devise strategies, and make decisions. It would be convenient and effective if an agent could assist with, or even take over, the task of analyzing and modeling past data and then generate a strategy for autonomous trading. Recently, reinforcement learning has been shown to be robust in various tasks that involve achieving a goal with a decision-making strategy based on time-series data. In this project, we developed a pipeline that simulates the stock trading environment and trained an agent to automate the stock trading process with deep reinforcement learning methods, including deep Q-learning, deep SARSA, and the policy gradient method. We evaluate our platform during relatively good (before 2021) and bad (2021 - 2022) situations. The stocks we evaluated include Google, Apple, Tesla, Meta, Microsoft, and IBM. These stocks are among the popular ones, and their trend changes are representative of good and bad situations. We show that before 2021, the three reinforcement learning methods we tried consistently provide promising profit returns, with total annual rates of around $70\%$ to $90\%$, while maintaining a positive profit return after 2021, with total annual rates of around $2\%$ to $7\%$.
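Of the three methods named above, deep Q-learning and deep SARSA differ mainly in the target they bootstrap from. The sketch below illustrates that difference using the standard textbook update targets; it is not taken from the paper's code. The three-action layout (sell, hold, buy), the Q-values, the reward, and the discount factor are all hypothetical values chosen for illustration.

```python
def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """Off-policy target: bootstrap from the highest-valued next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma=0.99, done=False):
    """On-policy target: bootstrap from the action actually taken next."""
    if done:
        return reward
    return reward + gamma * next_q_values[next_action]


# Hypothetical Q-values for an assumed action set: 0 = sell, 1 = hold, 2 = buy.
next_q = [0.5, 1.0, 2.0]

# Q-learning uses the best next action (buy, value 2.0).
q_target = q_learning_target(1.0, next_q, gamma=0.9)

# SARSA uses the action the behavior policy actually picked (here: hold).
s_target = sarsa_target(1.0, next_q, next_action=1, gamma=0.9)
```

In a deep variant, `next_q_values` would come from a neural network evaluated on the next state, and the target would be regressed against the network's prediction for the current state-action pair.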

Paper Structure (19 sections, 5 equations, 7 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Methods
Environment
Input Data
Action
State
Reward
Assumption and Constraints
Deep Q-Learning
Deep SARSA
Policy Gradient Method
Results
Evaluation Schema
Before 2021 - Good Situation
...and 4 more sections

Figures (7)

Figure 1: Deep Q Network
Figure 2: Prices of the selected stocks. The split line separates the good and bad situations.
Figure 3: Training loss of Q function
Figure 4: Learning curve of performance in GOOG stock in the test set before 2021.
Figure 5: Learning curve of performance in TSLA stock in the test set before 2021.
...and 2 more figures
