When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making

Ali Raza Jafree, Konark Jain, Nick Firoozye

TL;DR

The paper addresses adverse selection of medium-frequency traders (MFTs) by high-frequency traders within a Hawkes-based limit order book. It introduces an impulse-control RL market-making agent, trained with PPO and self-imitation learning, that approximates the solution of the Hamilton-Jacobi-Bellman quasi-variational inequality (HJB-QVI), and evaluates it against a meta-order executed by an MFT using TWAP. The findings show that the RL market maker can capitalize on meta-order-induced price drift, with asymmetric benefits for buy versus sell flows, while increased profitability for the RL agent does not necessarily escalate slippage for the MFT. The work offers a framework for understanding AI-driven market making in endogenous, multi-agent environments and highlights directions for defensive mechanisms and more sophisticated meta-order strategies.

Abstract

We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limit Order Book (LOB) model to replicate the behaviours of high-frequency market makers. In contrast to the classical models with exogenous price impact assumptions, the Hawkes model accounts for endogenous price impact and other key properties of the market (Jain et al. 2024a). Given the real-world impracticalities of the market maker updating strategies for every event in the LOB, we formulate the high-frequency market making agent via an impulse control reinforcement learning framework (Jain et al. 2025). The RL used in the simulation utilises Proximal Policy Optimisation (PPO) and self-imitation learning. To replicate the adverse selection phenomenon, we test the RL agent trading against a medium-frequency trader (MFT) executing a meta-order and demonstrate that, with training against the MFT meta-order execution agent, the RL market making agent learns to capitalise on the price drift induced by the meta-order. Recent empirical studies have shown that medium-frequency traders are increasingly subject to adverse selection by high-frequency trading agents. As high-frequency trading continues to proliferate across financial markets, the slippage costs incurred by medium-frequency traders are likely to increase over time. However, we do not observe that increased profits for the market making RL agent necessarily cause significantly increased slippages for the MFT agent.
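To make the meta-order execution concrete, the following is a minimal sketch of how a time-weighted average price (TWAP) schedule splits a parent order into near-equal child orders. It is an illustration only: the function name `twap_schedule`, the integer quantities, and the example sizes are assumptions and are not taken from the paper, whose TWAP agent may differ in its details.

```python
def twap_schedule(total_qty: int, n_slices: int) -> list[int]:
    """Split a parent order into n_slices near-equal child orders.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    base, remainder = divmod(total_qty, n_slices)
    # Give one extra unit to the first `remainder` slices so that
    # child sizes differ by at most 1 and sum exactly to total_qty.
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]


# Example with assumed numbers: 1000 units executed over 7 equal time slices.
schedule = twap_schedule(1000, 7)
print(schedule)
```

Because the child orders arrive at a steady pace on one side of the book, they produce the kind of persistent, predictable flow that the abstract describes the RL market maker learning to exploit.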

Paper Structure

This paper contains 21 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: TWAP Agent Flowchart
  • Figure 2: TWAP Agent Quantity vs Price impact During Execution
  • Figure 3: TWAP Price Impact Decay Post Execution
  • Figure 4: fRL Agent Inventory Distribution, 140 episodes
  • Figure 5: fRL Agent Inventory Distribution, 246 episodes