Many learning agents interacting with an agent-based market model
Matthew Dicks, Andrew Paskaramoorthy, Tim Gebbie
TL;DR
The paper investigates how multiple reinforcement-learning (RL) optimal execution agents interact within a reactive agent-based market model (ABM) with three trophic levels: learning agents, liquidity takers, and liquidity providers. It extends a prior multi-agent RL (MARL) ABM framework to study how the number of agents, their order sizes, and their learning state spaces shape market microstructure, using phase-space analysis and complexity metrics based on Grassberger-Procaccia reconstruction. The key findings are that including learning-based execution agents moves the model's stylised facts toward empirical observations, so such agents are necessary for realistic ABMs, but that on their own they do not recover the full complexity observed in real markets, particularly when modelling a single stock. Learning reduces order-flow persistence and, in some setups, absolute-return memory, while using limit orders lowers price impact. However, increasing the number of learning agents can introduce non-stationary dynamics that make it harder for learning to converge. The work suggests that intra-order-book network effects and cross-market interactions may be essential to capture the missing complexity, which points toward multi-asset ABMs for future market-ecology research.
Abstract
We consider the dynamics and interactions of multiple reinforcement learning optimal execution trading agents interacting with a reactive Agent-Based Model (ABM) of a financial market in event time. The model represents a market ecology with three trophic levels: optimal execution learning agents, minimally intelligent liquidity takers, and fast electronic liquidity providers. The optimal execution agent classes include buying and selling agents that either use a combination of limit orders and market orders, or trade using market orders only. The reward function explicitly balances trade execution slippage against the penalty of not executing the order timeously. This work demonstrates how multiple competing learning agents impact a minimally intelligent market simulation as a function of the number of agents, the size of the agents' initial orders, and the state spaces used for learning. We use phase space plots to examine the dynamics of the ABM when various specifications of learning agents are included. Further, we examine whether the inclusion of optimal execution agents that can learn is able to produce dynamics with the same complexity as empirical data. We find that the inclusion of optimal execution agents changes the stylised facts produced by the ABM to conform more closely with empirical data, and that such agents are a necessary inclusion for ABMs investigating market micro-structure. However, adding execution agents to chartist-fundamentalist-noise ABMs is insufficient to recover the complexity observed in empirical data.
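
As an illustration of the kind of complexity measurement referred to above, the following is a minimal sketch of a Grassberger-Procaccia style correlation-sum estimate on a delay-embedded series. It is not the authors' implementation: the function names (`delay_embed`, `correlation_sum`, `correlation_dimension`), the use of the maximum norm, and the choice of embedding dimension, lag, and radii are all assumptions made for the example.

```python
import numpy as np


def delay_embed(x, dim, lag=1):
    """Return the delay-embedded vectors of a 1-D series as rows."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])


def correlation_sum(x, dim, radius, lag=1):
    """Fraction of distinct embedded point pairs closer than `radius`.

    Distances use the maximum (Chebyshev) norm, an assumption of this sketch.
    """
    pts = delay_embed(x, dim, lag)
    n = len(pts)
    # Full pairwise distance matrix: O(n^2) memory, so short series only.
    dists = np.max(np.abs(pts[:, None, :] - pts[None, :, :]), axis=2)
    upper = np.triu_indices(n, k=1)  # each unordered pair counted once
    return float(np.mean(dists[upper] < radius))


def correlation_dimension(x, dim, radii, lag=1):
    """Least-squares slope of log C(r) against log r over the given radii."""
    radii = np.asarray(radii, dtype=float)
    c = np.array([correlation_sum(x, dim, r, lag) for r in radii])
    keep = c > 0  # log is undefined where no pairs fall within the radius
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return float(slope)
```

In practice one would apply such an estimate both to a simulated series (for example, returns from the ABM) and to empirical data, over a range of embedding dimensions, and compare how the estimated slope behaves. The radii need to sit in a scaling region where log C(r) is roughly linear in log r, which has to be checked for each data set.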
