Reinforcement Learning for Stock Transactions
Ziyi Zhou, Nicholas Stern, Julien Laasri
TL;DR
The paper investigates applying reinforcement learning to stock trading by formulating a time-window Markov Decision Process with actions {buy, wait} to determine the optimal time to buy. It implements four agents (Baseline, Exact Q-Learning, Linear Approximate Q-Learning, and Deep Q-Learning) and augments the study with price-prediction experiments, evaluating on four tech stocks with data from after 2005 and an 80/10/10 data split. The agents show no consistent policy convergence and their profits vary widely. Even so, approximate Q-learning (especially the linear variant) and, to a lesser extent, deep Q-learning outperform the baseline on some stocks, which suggests potential given richer state representations or more data. The authors discuss limitations and propose future work, including expanded state features (momentum, volume, indices), external data (news), and time-series models such as LSTMs, to improve robustness and convergence in real-world trading scenarios.
Abstract
Much research has been done to analyze the stock market. After all, anyone who can find a pattern in the chaotic frenzy of transactions could make a hefty profit by capitalizing on it. The goal of our project was therefore to apply reinforcement learning (RL) to determine the best time to buy a stock within a given time frame. With only a few adjustments, our model can be extended to identify the best time to sell a stock as well. To train the model on free, real-world data in the format in which it is available, we define our own Markov Decision Process (MDP) problem. Two prior papers [5], [6] helped us formulate the state space and the reward system of this MDP. We train a series of agents using Q-Learning, Q-Learning with linear function approximation, and deep Q-Learning. In addition, we try to predict stock prices using machine learning regression and classification models. We then compare our agents to see whether they converge on a policy and, if so, which one learns the best policy for maximizing profit on the stock market.
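To make the buy-timing setup concrete, the following is a minimal tabular Q-Learning sketch over a single price window with actions {wait, buy}. It is an illustration under stated assumptions, not our actual formulation: the state (steps left in the window, direction of the last price move), the reward (window mean price minus purchase price), the forced buy at the end of the window, and all function names and hyperparameters are hypothetical choices made for this example.

```python
import random
from collections import defaultdict

WAIT, BUY = 0, 1

def state_of(prices, t):
    """Assumed state: (steps left in window, sign of the last price move)."""
    steps_left = len(prices) - 1 - t
    if t == 0:
        return (steps_left, 0)
    diff = prices[t] - prices[t - 1]
    return (steps_left, (diff > 0) - (diff < 0))

def legal_actions(prices, t):
    """Assumption: the agent must buy on the last step of the window."""
    return (BUY,) if t == len(prices) - 1 else (WAIT, BUY)

def run_episode(q, prices, alpha=0.1, gamma=1.0, epsilon=0.1, rng=random):
    """Run one epsilon-greedy pass over the window; return the purchase price."""
    mean_price = sum(prices) / len(prices)
    for t in range(len(prices)):
        s = state_of(prices, t)
        actions = legal_actions(prices, t)
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            # max() keeps the first maximum, so ties resolve to WAIT.
            a = max(actions, key=lambda act: q[(s, act)])
        if a == BUY:
            # Assumed terminal reward: how far below the window mean we bought.
            target = mean_price - prices[t]
        else:
            # Waiting earns nothing now; bootstrap from the next state's best legal action.
            s_next = state_of(prices, t + 1)
            target = gamma * max(q[(s_next, b)] for b in legal_actions(prices, t + 1))
        q[(s, a)] += alpha * (target - q[(s, a)])
        if a == BUY:
            return prices[t]

def train(prices, episodes=2000, seed=0):
    """Train a Q-table on one fixed window (illustrative only)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, prices, rng=rng)
    return q

def greedy_buy_price(q, prices):
    """Follow the learned policy with no exploration and no learning."""
    return run_episode(q, prices, alpha=0.0, epsilon=0.0)
```

For example, `greedy_buy_price(train(window), window)` returns the price at which the greedy policy buys on the window it was trained on. Training and evaluating on the same window only checks that the update rule behaves sensibly; it says nothing about generalization to unseen data.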
