Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion
Ziyi Wang, Carmine Ventre, Maria Polukarov
TL;DR
The paper studies algorithmic collusion in market making through a hierarchical multi-agent reinforcement learning framework with a top-layer adversary, a mid-layer self-interested market maker A, and bottom-layer competitors B1, B2, and B*. It defines interpretable interaction metrics and evaluates a three-stage PPO training pipeline to study competitive and adaptive behavior under environment perturbations, with price dynamics $P_{t+1}=P_t+\mu\Delta t+\sigma\epsilon_t$ and fill probability $p_{\text{fill}}(d)=\exp(-\alpha d)$. Key findings show that the zero-sum suppressor B2 can dominate order flow and tighten spreads but degrades overall market balance, whereas the incentive-adaptive hybrid B* can maintain market share and robust interaction patterns while coexisting more sustainably with the other agents. The work offers a structured lens for evaluating behavioral design in algorithmic trading and points to meta-learning-based incentive adaptation as a direction for improving generalization across market regimes.
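The two environment equations above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `simulate_mid_price` and `fill_probability` are hypothetical, $\epsilon_t$ is assumed to be a standard normal draw, and $d$ is assumed to be a non-negative distance between a quote and the reference price.

```python
import math
import random


def simulate_mid_price(p0, mu, sigma, dt, n_steps, seed=0):
    """Simulate P_{t+1} = P_t + mu * dt + sigma * eps_t.

    Assumption: eps_t ~ N(0, 1), drawn independently at each step.
    Returns a list of n_steps + 1 prices starting at p0.
    """
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] + mu * dt + sigma * eps)
    return prices


def fill_probability(distance, alpha):
    """Return p_fill(d) = exp(-alpha * d).

    Assumption: d >= 0 is the distance of a quote from the reference price,
    so a quote at distance 0 fills with probability 1.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-alpha * distance)


if __name__ == "__main__":
    path = simulate_mid_price(p0=100.0, mu=0.0, sigma=0.05, dt=1.0, n_steps=5)
    print([round(p, 3) for p in path])
    # Wider quotes are filled less often under the exponential decay model.
    for d in (0.0, 0.5, 1.0):
        print(d, round(fill_probability(d, alpha=1.5), 4))
```

Under this model, a larger $\alpha$ makes fills more sensitive to quote distance, which is the trade-off a market maker faces between a wider spread and a lower fill rate.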
Abstract
Algorithmic collusion has emerged as a central question in AI: Will the interaction between different AI agents deployed in markets lead to collusion? More generally, understanding how emergent behavior, be it a cartel or market dominance from more advanced bots, affects the market overall is an important research question. We propose a hierarchical multi-agent reinforcement learning framework to study algorithmic collusion in market making. The framework includes a self-interested market maker (Agent~A), which is trained in an uncertain environment shaped by an adversary, and three bottom-layer competitors: the self-interested Agent~B1 (whose objective is to maximize its own PnL), the competitive Agent~B2 (whose objective is to minimize the PnL of its opponent), and the hybrid Agent~B$^\star$, which can modulate between the behavior of the other two. To analyze how these agents shape the behavior of each other and affect market outcomes, we propose interaction-level metrics that quantify behavioral asymmetry and system-level dynamics, while providing signals potentially indicative of emergent interaction patterns. Experimental results show that Agent~B2 secures dominant performance in a zero-sum setting against B1, aggressively capturing order flow while tightening average spreads, thus improving market execution efficiency. In contrast, Agent~B$^\star$ exhibits a self-interested inclination when co-existing with other profit-seeking agents, securing dominant market share through adaptive quoting, yet exerting a milder adverse impact on the rewards of Agents~A and B1 compared to B2. These findings suggest that adaptive incentive control supports more sustainable strategic co-existence in heterogeneous agent environments and offers a structured lens for evaluating behavioral design in algorithmic trading systems.
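The three bottom-layer objectives described in the abstract can be sketched as reward functions. This is an illustrative assumption, not the paper's formulation: the abstract only says that B$^\star$ "can modulate between" the self-interested and competitive behaviors, so the convex combination and the weight `lam` below are hypothetical choices, as are all function names.

```python
def self_interested_reward(own_pnl, opponent_pnl):
    """B1-style objective: maximize the agent's own PnL."""
    return own_pnl


def competitive_reward(own_pnl, opponent_pnl):
    """B2-style objective: minimize the opponent's PnL."""
    return -opponent_pnl


def hybrid_reward(own_pnl, opponent_pnl, lam):
    """Hypothetical B*-style objective: interpolate between the two.

    lam = 0 recovers the self-interested reward and lam = 1 the
    competitive one. The linear blend is an assumption for illustration.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return ((1.0 - lam) * self_interested_reward(own_pnl, opponent_pnl)
            + lam * competitive_reward(own_pnl, opponent_pnl))


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        print(lam, hybrid_reward(own_pnl=2.0, opponent_pnl=3.0, lam=lam))
```

In a sketch like this, an adaptive agent would adjust `lam` over time rather than fix it, which is one way to read the "adaptive incentive control" the abstract refers to.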
