Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning

Fabrizio Lillo; Andrea Macrì

TL;DR

This study examines a scenario in which two autonomous agents learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework, and shows that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game.
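
To make the cost being minimised concrete, here is a minimal Python sketch of the implementation shortfall (IS) of two sellers in a discrete-time, linear-impact model in the spirit of Almgren-Chriss (2000). It is an illustration under stated assumptions, not the paper's simulator: the function name `simulate_is`, the default parameter values, and the choice that temporary impact depends only on an agent's own trade while permanent impact depends on the combined volume of both agents are all hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def simulate_is(n1: Sequence[float], n2: Sequence[float], s0: float = 10.0,
                tau: float = 1.0, sigma: float = 0.0, gamma: float = 0.01,
                eta: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Return the implementation shortfall (IS) of two agents.

    n1[k], n2[k] are the shares each agent sells in step k. Hypothetical
    linear-impact parametrisation in the spirit of Almgren-Chriss:
    - temporary impact: each agent's fill price is lowered by eta * (own shares / tau);
    - permanent impact: the mid price drops by gamma * (shares sold by both agents).
    """
    if len(n1) != len(n2):
        raise ValueError("both schedules must have the same number of steps")
    rng = random.Random(seed)
    price = s0  # unaffected mid price before the current trade
    proceeds = [0.0, 0.0]
    for a, b in zip(n1, n2):
        # Fill prices: mid price minus each agent's own temporary impact.
        proceeds[0] += a * (price - eta * a / tau)
        proceeds[1] += b * (price - eta * b / tau)
        # Random-walk noise plus the combined permanent impact of both trades.
        price += sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0) - gamma * (a + b)
    # IS = value at the initial price minus what was actually received.
    return sum(n1) * s0 - proceeds[0], sum(n2) * s0 - proceeds[1]


# Two agents each selling 10 shares evenly over 5 steps, no price noise.
print(simulate_is([2] * 5, [2] * 5))
```

With sigma = 0 each agent's IS is about 1.8 in this example: 1.0 from its own temporary impact and 0.8 from the permanent impact left by both agents' earlier sales, which is how one agent's schedule feeds into the other's cost.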

Abstract

The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies yield supra-competitive outcomes, which might be compatible with tacit collusive behaviour and closely align with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
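
Double Deep Q-Learning differs from standard DQN only in how the bootstrap target is formed: the online network picks the next action and a separate target network scores it. The sketch below shows just that target computation, with plain lists standing in for network outputs; the function names and the discount value are assumptions, and the paper's networks, states and rewards are not reproduced here. In an independent-learners setup (also an assumption), each agent would compute this target from its own pair of networks.

```python
from typing import Sequence


def double_dqn_target(reward: float, done: bool, q_online_next: Sequence[float],
                      q_target_next: Sequence[float], discount: float = 0.99) -> float:
    """Double DQN bootstrap target for one transition (s, a, r, s')."""
    if done:
        return reward  # terminal step: nothing to bootstrap from
    # The online network selects the greedy action in s'...
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...and the target network evaluates it. Decoupling selection from
    # evaluation reduces the overestimation bias of the max operator.
    return reward + discount * q_target_next[best]


def dqn_target(reward: float, done: bool, q_target_next: Sequence[float],
               discount: float = 0.99) -> float:
    """Standard DQN target, for comparison: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + discount * max(q_target_next)


print(double_dqn_target(1.0, False, [1.0, 3.0, 2.0], [5.0, 2.0, 4.0], discount=0.5))  # 2.0
print(dqn_target(1.0, False, [5.0, 2.0, 4.0], discount=0.5))  # 3.5
```

Here the online network prefers action 1, so the Double DQN target is 1 + 0.5 * 2 = 2.0, while standard DQN takes the target network's maximum and returns 3.5.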

Paper Structure (23 sections, 3 theorems, 38 equations, 14 figures, 1 table, 1 algorithm)

Introduction
Literature review
Market impact game setting
The Almgren-Chriss model
Almgren-Chriss market impact game
Beyond the Nash equilibrium
Double Deep Q-Learning for multi-agent impact trading
Setting of the numerical experiments
Action selection and reward function
Training scheme
Results
The zero noise case
The moderate noise case
The large noise case
Summary of results and comparison with the Pareto front

Key Result

Proposition 1

For an optimal execution strategy problem in which two agents minimise costs as in Eq. eq:mean_var in a market as in Eq. eq:perm, a collusive selling strategy $\vec{v}^{\,(1,2)}_c$ in the sense of Definition 3 (Collusion) is necessarily a Pareto optimum as defined in Definition 4 (Pareto Optimum). We call th
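
Since Proposition 1 is stated in terms of Pareto optimality, a small sketch of the underlying dominance check may help when reading scatter plots of the two agents' implementation shortfall (IS). It uses the textbook definition for costs (lower is better) on a finite set of outcome pairs; the paper's Definitions 4 and 5 are phrased in terms of strategies and may differ in detail, and the function names and sample values are hypothetical.

```python
from typing import List, Tuple

Cost = Tuple[float, float]  # (IS of agent 1, IS of agent 2); lower is better


def dominates(p: Cost, q: Cost) -> bool:
    """True if p is no worse than q for both agents and strictly better for at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points: List[Cost]) -> List[Cost]:
    """Outcomes not dominated by any other outcome (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical cost pairs for illustration, not results from the paper.
runs = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5), (1.0, 3.5)]
print(pareto_front(runs))  # [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
```

The quadratic scan is adequate for a few dozen test runs; for large samples a sort-based sweep would be faster.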

Figures (14)

Figure 1: Scatter plot of the implementation shortfall (IS) of the two agents for 20 testing runs of 2,500 iterations in the zero noise case ($\sigma = 10^{-9}$).
Figure 2: Optimal execution strategies for 20 testing runs of 2,500 iterations, using $\sigma = 10^{-9}$.
Figure 3: Scatter plot of the IS of the two agents for 20 testing runs of 2,500 iterations in the moderate noise case ($\sigma = 10^{-3}$).
Figure 4: Optimal execution strategies for 20 testing runs of 2,500 iterations, using $\sigma = 10^{-3}$.
Figure 5: Scatter plot of the IS of the two agents for 20 testing runs of 2,500 iterations in the large noise case ($\sigma = 10^{-2}$).

Theorems & Definitions (12)

Definition 1: Nash equilibrium [schied2017state]
Remark 1
Definition 2
Definition 3: Collusion
Definition 4: Pareto Optimum
Proposition 1: Collusive Pareto optima
Proof
Definition 5: Pareto-efficient set of solutions
Theorem 1: Pareto-efficient set of solutions
Proof
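
The gap between a Nash equilibrium and a Pareto-superior outcome can be seen in a deliberately tiny, hypothetical two-period game (not the model used in the paper). Each agent must sell X = 10 shares and chooses how many (x) to sell in the first period; a trade of n shares costs eta * n**2 in temporary impact, and the second-period sale is executed at a price pushed down by gamma times both agents' first-period sales. The sketch enumerates pure-strategy best responses with the assumed values eta = 2 and gamma = 1.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple


def cost(x: int, y: int, total: int = 10, eta: float = 2.0, gamma: float = 1.0) -> float:
    """Cost of an agent that sells x shares now and total - x later while its rival sells y now.

    Hypothetical two-period toy model: eta * n**2 temporary impact per trade of n shares,
    and the later sale is executed at a price lowered by gamma * (x + y).
    """
    return eta * (x ** 2 + (total - x) ** 2) + gamma * (total - x) * (x + y)


def best_responses(y: int, actions: Iterable[int]) -> Set[int]:
    """All first-period sales minimising an agent's cost given the rival's sale y."""
    costs = {x: cost(x, y) for x in actions}
    lowest = min(costs.values())
    return {x for x, c in costs.items() if c == lowest}


def pure_nash(actions: range) -> List[Tuple[int, int]]:
    """Action pairs in which each agent best-responds to the other (the game is symmetric)."""
    return [(x, y) for x, y in product(actions, repeat=2)
            if x in best_responses(y, actions) and y in best_responses(x, actions)]


actions = range(11)  # sell 0..10 of the 10 shares in the first period
print(pure_nash(actions))  # [(6, 6)]
print(cost(6, 6), cost(5, 5))  # 152.0 150.0
```

The unique pure Nash equilibrium (6, 6) costs each agent 152, while the even split (5, 5) costs each only 150, so both would gain from deviating jointly. Ignoring the integer constraint, the first-order conditions of this toy model give a symmetric Nash first-period sale of $X(2\eta-\gamma)/(4\eta-3\gamma)$, against $X/2$ when the agents minimise their joint cost (for $\eta > \gamma$): competition pushes both to front-load their selling.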
