Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning
Fabrizio Lillo, Andrea Macrì
TL;DR
This study examines a scenario in which two autonomous agents learn to liquidate the same asset optimally in the presence of market impact under the Almgren-Chriss (2000) framework, and shows that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game.
Abstract
The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies are supra-competitive, which might be compatible with tacit collusive behaviour, and closely align with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
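To make the setting concrete, the following minimal sketch shows how the costs of two sellers are coupled through shared market impact. It is an illustration under stated assumptions, not the model specification used in the study: it assumes linear permanent impact (hypothetical coefficient `kappa`) and linear temporary impact (hypothetical coefficient `alpha`), both driven by the total volume sold in each step, with unit time steps. The function name, parameter names, and default values are all hypothetical.

```python
import random


def implementation_shortfall(v_a, v_b, s0=10.0, kappa=0.1, alpha=0.2,
                             sigma=0.0, seed=0):
    """Implementation shortfall of two sellers trading the same asset.

    v_a, v_b : shares sold by agent A and agent B in each time step.
    Assumed (hypothetical) dynamics with unit time steps:
      execution price at step t = mid - alpha * (total volume at t)
      mid after step t          = mid - kappa * (total volume at t) + sigma * noise
    """
    rng = random.Random(seed)
    mid = s0
    revenue_a = 0.0
    revenue_b = 0.0
    for qa, qb in zip(v_a, v_b):
        total = qa + qb
        # Temporary impact: both agents execute at the same depressed price.
        exec_price = mid - alpha * total
        revenue_a += qa * exec_price
        revenue_b += qb * exec_price
        # Permanent impact plus Gaussian noise moves the mid price.
        mid += -kappa * total + sigma * rng.gauss(0.0, 1.0)
    # Shortfall = value of the inventory at the initial price minus revenue.
    return s0 * sum(v_a) - revenue_a, s0 * sum(v_b) - revenue_b


# Two agents each selling one share per step over two steps, no volatility.
is_a, is_b = implementation_shortfall([1.0, 1.0], [1.0, 1.0])
```

Because each agent's execution price depends on the total volume, changing one agent's schedule changes the other agent's shortfall. This coupling is what turns optimal execution into a game with distinct Nash and Pareto-optimal outcomes.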
