Many learning agents interacting with an agent-based market model

Matthew Dicks; Andrew Paskaramoorthy; Tim Gebbie

TL;DR

The paper investigates how multiple reinforcement-learning (RL) optimal-execution agents interact within a reactive agent-based market model (ABM) that features three trophic levels: learning agents, liquidity takers, and liquidity providers. It extends a prior MARL-ABM framework to study how the number of agents, their order sizes, and their learning state spaces shape market microstructure, using phase-space analysis and complexity metrics based on Grassberger-Procaccia reconstruction. Key findings show that including learning-based execution agents shifts the stylised facts toward empirical observations and is necessary for realistic ABMs, but these agents alone do not recover the full complexity observed in real markets, particularly when modelling a single stock. Learning reduces order-flow persistence and, in some setups, absolute-return memory, while using limit orders lowers price impact; however, increasing the number of learning agents can introduce non-stationary dynamics that challenge learning convergence. The work highlights that intra-order-book network effects and cross-market interactions may be essential to capture the missing complexity, suggesting multi-asset ABMs as a direction for future market-ecology research.
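The Grassberger-Procaccia approach mentioned above estimates a correlation dimension from the scaling of the correlation sum of a delay-embedded series. The following is a minimal sketch of that general technique, not the authors' implementation: the function names, the maximum-norm distance, the embedding parameters, and the default radii are all illustrative assumptions.

```python
import numpy as np

def delay_embed(x, dim, lag):
    """Time-delay embedding: row t is (x[t], x[t+lag], ..., x[t+(dim-1)*lag])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    if n <= 1:
        raise ValueError("series too short for this embedding")
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than `radius` (maximum norm)."""
    # Pairwise distances; O(n^2) memory, so only suitable for short series.
    dists = np.abs(points[:, None, :] - points[None, :, :]).max(axis=2)
    upper = np.triu_indices(len(points), k=1)
    return float(np.mean(dists[upper] < radius))

def correlation_dimension(x, dim=3, lag=1, radii=None):
    """Slope of log C(r) against log r over the chosen radii (assumed defaults)."""
    points = delay_embed(x, dim, lag)
    if radii is None:
        # Hypothetical default: radii between ~3% and ~30% of the data range.
        radii = np.logspace(-1.5, -0.5, 8) * np.ptp(x)
    radii = np.asarray(radii, dtype=float)
    c = np.array([correlation_sum(points, r) for r in radii])
    keep = c > 0  # log is undefined where no pairs fall within the radius
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope
```

In practice the estimate is sensitive to the embedding dimension, the lag, and the range of radii over which the slope is fitted, so these would need to be chosen with care for real or simulated market data.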

Abstract

We consider the dynamics and the interactions of multiple reinforcement learning optimal execution trading agents interacting with a reactive Agent-Based Model (ABM) of a financial market in event time. The model represents a market ecology with three trophic levels: optimal execution learning agents, minimally intelligent liquidity takers, and fast electronic liquidity providers. The optimal execution agent classes include buying and selling agents that can either use a combination of limit orders and market orders, or only trade using market orders. The reward function explicitly balances trade execution slippage against the penalty of not executing the order timeously. This work demonstrates how multiple competing learning agents impact a minimally intelligent market simulation as functions of the number of agents, the size of agents' initial orders, and the state spaces used for learning. We use phase space plots to examine the dynamics of the ABM when various specifications of learning agents are included. Further, we examine whether the inclusion of optimal execution agents that can learn is able to produce dynamics with the same complexity as empirical data. We find that the inclusion of optimal execution agents changes the stylised facts produced by the ABM to conform more closely to empirical data, and that such agents are a necessary inclusion for ABMs investigating market microstructure. However, adding execution agents to chartist-fundamentalist-noise ABMs is insufficient to recover the complexity observed in empirical data.

Paper Structure (17 sections, 1 equation, 8 figures, 4 tables)

Introduction
Background and Motivation
Reinforcement Learning for Optimal Execution
Learning in Complex Multi-player Games
Market Ecology
Limitations
Agent Specification and Learning
Actions
Rewards
Optimal Policies and Convergence
Exploratory Data Analysis
Comparing Stylised Facts
Persistence of Orderflow
Price Impact
Memory in Absolute Returns
...and 2 more sections

Figures (8)

Figure 1: The agent returns are given as a function of the training episodes. This demonstrates how the different agents' rewards converge under training. In Figure \ref{fig:a:return-convergence}, the returns for type I agents, both buyers (+) and sellers (-), are shown. Type I agents only trade using market orders. Figure \ref{fig:b:return-convergence} has the same plot, but for agents of type II, which use both market orders and limit orders.
Figure 2: The final $\epsilon$-greedy policy for an example type I agent, shown as a heat map. The policy shown is for Case 3 (see Table \ref{tab:RLagentscombinations}), with the buying agent in Figure \ref{fig:case3-policy-plot-A} and the selling agent in Figure \ref{fig:case3-policy-plot-B}.
Figure 3: The final greedy policy for an example type II agent, shown as a heat map for Case 4 (see Table \ref{tab:RLagentscombinations}).
Figure 4: Price impact plots for the buyer (left) and seller (right) initiated trades for different cases of learning agents. Type I (blue) agents use only market orders to execute parent orders, whilst Type II (red) agents use market and limit orders. The lower price impact of Type II agents suggests that limit orders can be used to take advantage of opportunities created by market flow and changes in the spread.
Figure 5: Auto-Correlation Function (ACF) plots of the absolute value of the mid-price returns comparing Type I (blue) and Type II (red) cases. Type I agents have non-trivial auto-correlations, while Type II agents suppress auto-correlations. This suggests that the more complex Type II agents reduce regularity in potentially both the order flow and the liquidity provision processes.
...and 3 more figures
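
The memory in absolute returns shown in Figure 5 is measured with the sample auto-correlation function of absolute mid-price returns. As a rough sketch of how such a curve could be computed (the use of log returns, the biased ACF estimator, and all function names here are assumptions rather than details taken from the paper):

```python
import numpy as np

def log_returns(mid_prices):
    """Log returns of a strictly positive mid-price series."""
    p = np.asarray(mid_prices, dtype=float)
    return np.diff(np.log(p))

def acf(x, max_lag):
    """Biased sample auto-correlation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        raise ValueError("series has zero variance")
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

def abs_return_acf(mid_prices, max_lag):
    """ACF of absolute log returns, a common proxy for volatility memory."""
    return acf(np.abs(log_returns(mid_prices)), max_lag)
```

A slowly decaying `abs_return_acf` curve would indicate long memory in volatility, while values near zero beyond lag 0 would correspond to the suppressed auto-correlations described for Type II agents.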
