Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning

Fabrizio Lillo, Andrea Macrì

TL;DR

This study examines a scenario in which two autonomous agents learn to liquidate the same asset optimally in the presence of market impact under the Almgren-Chriss (2000) framework, and shows that the strategies they learn deviate significantly from the Nash equilibrium of the corresponding market impact game.

Abstract

The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies are supra-competitive, which might be compatible with tacit collusive behaviour, and closely align with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
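
To make the setting concrete, the following is a minimal sketch of how the implementation shortfall (IS) of two sellers could be computed in an Almgren-Chriss-style market. It is not the authors' code and involves no reinforcement learning. The abstract does not state the price dynamics, so the linear permanent impact `kappa`, linear temporary impact `lam`, the Gaussian noise with scale `sigma`, and all parameter values and function names are assumptions chosen for illustration.

```python
import numpy as np


def implementation_shortfall(v1, v2, s0=10.0, kappa=0.001, lam=0.002,
                             sigma=0.0, seed=0):
    """Return (IS_1, IS_2) for two sellers trading v1[t] and v2[t] shares.

    Assumed (hypothetical) dynamics, not taken from the paper:
      mid-price:       S_{t+1} = S_t - kappa * (v1[t] + v2[t]) + sigma * xi_t
      execution price: P_t     = S_t - lam   * (v1[t] + v2[t])
    """
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    total = v1 + v2
    rng = np.random.default_rng(seed)
    shocks = sigma * rng.standard_normal(total.size)

    # Mid-price change caused by each step (permanent impact plus noise).
    step_change = -kappa * total + shocks
    # Mid-price observed at the start of each step.
    mid = s0 + np.concatenate(([0.0], np.cumsum(step_change)[:-1]))
    # Both agents trade at the same impacted price within a step.
    exec_price = mid - lam * total

    # IS = value at the initial price minus the cash actually received.
    is1 = s0 * v1.sum() - np.dot(v1, exec_price)
    is2 = s0 * v2.sum() - np.dot(v2, exec_price)
    return float(is1), float(is2)


n_steps, q0 = 10, 100.0
twap = np.full(n_steps, q0 / n_steps)            # equal slices
weights = np.arange(n_steps, 0, -1, dtype=float)
front_loaded = q0 * weights / weights.sum()      # sells faster early on

both_twap = implementation_shortfall(twap, twap)
alone = implementation_shortfall(twap, np.zeros(n_steps))
mixed = implementation_shortfall(front_loaded, twap)
print(both_twap, alone, mixed)
```

Under these assumptions each agent's cost depends on the other's schedule through the shared price, which is what turns optimal execution into a game: comparing `both_twap` with `alone` shows how a second seller raises the first seller's shortfall for the same schedule.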

Paper Structure (23 sections, 3 theorems, 38 equations, 14 figures, 1 table, 1 algorithm)


Key Result

Proposition 1

For an optimal execution strategy problem where two agents minimise costs as in Eq. eq:mean_var in a market as in Eq. eq:perm, a collusive selling strategy $\Vec{v}^{\,(1,2)}_c$ in the sense of Definition defi:collusion is necessarily a Pareto optimum as defined in Definition defi:pareto. We call th

Figures (14)

  • Figure 1: Scatter plot of the IS of the two agents for 20 testing runs of $2,500$ iterations in the zero noise case ($\sigma= 10^{-9}$).
  • Figure 2: Optimal execution strategies for 20 testing runs of $2,500$ iterations, using $\sigma= 10^{-9}$.
  • Figure 3: Scatter plot of the IS of the two agents in 20 testing runs of $2,500$ iterations in the moderate noise case ($\sigma= 10^{-3}$).
  • Figure 4: Optimal execution strategies for 20 testing runs of $2,500$ iterations, using $\sigma= 10^{-3}$.
  • Figure 5: Scatter plot of the IS of the two agents for 20 testing runs of $2,500$ iterations in the large noise case ($\sigma= 10^{-2}$).
  • ...and 9 more figures

Theorems & Definitions (12)

  • Definition 1: Nash equilibrium [schied2017state]
  • Remark 1
  • Definition 2
  • Definition 3: Collusion
  • Definition 4: Pareto Optimum
  • Proposition 1: Collusive Pareto optima
  • proof
  • Definition 5: Pareto-efficient set of solutions
  • Theorem 1: Pareto-efficient set of solutions
  • proof
  • ...and 2 more