Reinforcement Learning Pair Trading: A Dynamic Scaling approach
Hongshen Yang, Avinash Malik
TL;DR
The paper addresses the problem of profitable, fast, and adaptive trading in highly volatile cryptocurrency markets by integrating Reinforcement Learning (RL) with pair trading. It introduces an RL-based dynamic scaling framework in which the agent decides not only when to trade but also how much capital to allocate (the investment quantity), using two agents that handle timing/direction (RL$_1$) and timing/quantity (RL$_2$). Key contributions include an RL environment tailored to quantity-varying pair trading, reward shaping, observation/action spaces, and a grid-search protocol for hyperparameters. Empirically, the RL agents achieve annualized profits of 9.94% to 31.53%, compared with 8.33% for traditional pair trading. The findings indicate that RL-based pair trading can outperform static rule-based approaches in crypto markets, which is relevant to the design of fast, flexible arbitrage systems.
Abstract
Cryptocurrency is a cryptography-based digital asset with extremely volatile prices. Around USD 70 billion worth of cryptocurrency is traded daily on exchanges. Trading cryptocurrency is difficult due to the inherent volatility of the crypto market. This study investigates whether Reinforcement Learning (RL) can enhance decision-making in cryptocurrency algorithmic trading compared to traditional methods. To address this question, we combined reinforcement learning with a statistical arbitrage trading technique, pair trading, which exploits the price difference between statistically correlated assets. We constructed RL environments and trained RL agents to determine when and how to trade pairs of cryptocurrencies. We developed new reward shaping and observation/action spaces for reinforcement learning. We performed experiments with the developed reinforcement learner on the BTC-GBP and BTC-EUR pair, using data sampled at 1-minute intervals (n=263,520). The traditional non-RL pair trading technique achieved an annualized profit of 8.33%, while the proposed RL-based pair trading technique achieved annualized profits from 9.94% to 31.53%, depending upon the RL learner. Our results show that RL can significantly outperform manual and traditional pair trading techniques when applied to volatile markets such as cryptocurrencies.
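To make the non-RL baseline idea concrete, the following is a minimal sketch of a threshold-based pair trading rule, not the implementation evaluated in this study. It assumes both price series are already expressed in a common unit, and the function name `zscore_signals` and the parameters `window`, `open_z`, and `close_z` are hypothetical choices made only for illustration. The rule opens a position when the spread between the two assets deviates strongly from its recent mean and closes it when the spread reverts.

```python
from statistics import mean, stdev


def zscore_signals(price_a, price_b, window=30, open_z=2.0, close_z=0.5):
    """Return one position per time step for a simple spread-reversion rule.

    +1 = long the spread (long A, short B)
    -1 = short the spread (short A, long B)
     0 = flat

    All thresholds are illustrative assumptions, not values from the study.
    """
    spread = [a - b for a, b in zip(price_a, price_b)]
    positions = []
    position = 0
    for t in range(len(spread)):
        if t < window:
            # Not enough history yet to estimate the spread statistics.
            positions.append(0)
            continue
        history = spread[t - window:t]
        mu, sigma = mean(history), stdev(history)
        z = 0.0 if sigma == 0 else (spread[t] - mu) / sigma
        if position == 0:
            if z > open_z:
                position = -1  # spread unusually wide: bet on it narrowing
            elif z < -open_z:
                position = 1   # spread unusually narrow: bet on it widening
        elif abs(z) < close_z:
            position = 0       # spread has reverted: close the position
        positions.append(position)
    return positions
```

A rule of this kind fixes both the thresholds and the traded quantity in advance. The RL agents described in the abstract instead learn when to trade and, in the quantity-varying case, how much to invest.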
