Modelling crypto markets by multi-agent reinforcement learning

Johann Lussange; Stefano Vrizzi; Stefano Palminteri; Boris Gutkin

TL;DR

This work develops SYMBA, a multi-agent reinforcement learning crypto market simulator calibrated to Binance data for 153 assets between 2018 and 2022. Each agent runs two RL modules (forecasting and trading) and bases asset valuations on both market prices and a cointegration-based fundamental estimate, integrated within a centralized double-auction order book. The study demonstrates that SYMBA reproduces key stylized facts of crypto markets, including non-normal returns, volatility and volume clustering, and decaying price-autocorrelations, while allowing analysis of learning dynamics and agent-level behavior. The model offers a data-driven framework for risk management, policy evaluation, and strategy optimization in rapidly evolving crypto markets, with clear avenues for incorporating intraday dynamics and regulatory changes in future work.
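
The stylized facts listed above are usually checked on log returns $r_t = \ln(P_t / P_{t-1})$ computed from daily closing prices. The following is a minimal sketch in Python, assuming a plain list of closes. The helper names are hypothetical and are not taken from SYMBA. It measures fat tails with excess kurtosis, the near-absence of linear predictability with the autocorrelation of returns, and volatility clustering with the autocorrelation of absolute returns. The demo series is synthetic regime-switching noise, not Binance data.

```python
import math
import random
from statistics import mean, pstdev


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1}) from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def excess_kurtosis(xs):
    """Excess kurtosis: about 0 for Gaussian data, > 0 for fat tails."""
    m, s = mean(xs), pstdev(xs)
    return mean(((x - m) / s) ** 4 for x in xs) - 3.0


def autocorr(xs, lag):
    """Lag-k sample autocorrelation (biased estimator, normalized by total variance)."""
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs)
    cov = sum((xs[t] - m) * (xs[t - lag] - m) for t in range(lag, len(xs)))
    return cov / var


def stylized_facts(prices, lags=(1, 5, 10)):
    """Summary statistics commonly used to compare simulated and real price series."""
    r = log_returns(prices)
    abs_r = [abs(x) for x in r]
    return {
        "excess_kurtosis": excess_kurtosis(r),                     # fat tails
        "acf_returns": {k: autocorr(r, k) for k in lags},          # expected near 0
        "acf_abs_returns": {k: autocorr(abs_r, k) for k in lags},  # volatility clustering
    }


if __name__ == "__main__":
    # Synthetic series (NOT Binance data): Gaussian returns whose volatility
    # switches between a calm and a turbulent regime, which yields fat tails
    # and volatility clustering.
    rng = random.Random(0)
    prices, vol = [100.0], 0.02
    for _ in range(2000):
        if rng.random() < 0.02:
            vol = 0.06 if vol == 0.02 else 0.02
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, vol)))
    print(stylized_facts(prices))
```

On real or simulated crypto data, one would typically expect positive excess kurtosis, return autocorrelations close to zero, and absolute-return autocorrelations that stay positive and decay slowly with the lag.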

Abstract

Building on previous foundational work (Lussange et al. 2020), this study introduces a multi-agent reinforcement learning (MARL) model of crypto markets, calibrated to the Binance daily closing prices of $153$ cryptocurrencies that were continuously traded between 2018 and 2022. Unlike previous agent-based models (ABM) or multi-agent systems (MAS), which relied on zero-intelligence agents or single autonomous agent methodologies, our approach endows agents with reinforcement learning (RL) techniques in order to model crypto markets. This integration is designed to emulate, with a bottom-up approach to complexity inference, both individual and collective agents, ensuring robustness under the recent volatile conditions of such markets and during the COVID-19 era. A key feature of our model is that its autonomous agents perform asset price valuation based on two sources of information: the market prices themselves, and an approximation of the crypto assets' fundamental values, distinct from those market prices. Calibrating our MAS against real market data allows for an accurate emulation of crypto market microstructure and for probing key market behaviors, in both the bearish and bullish regimes of that period.
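
To make the two information sources concrete, here is a minimal sketch of a tabular Q-learning trader. Its state combines a recent market-price trend with the gap between the current price and the agent's own fundamental estimate. This is a generic textbook Q-learning agent with epsilon-greedy exploration, not SYMBA's forecasting and trading algorithms. Several elements are assumptions chosen only for illustration:

- the state discretization
- the three-action set
- the reward, defined as position times the next log return
- the toy mean-reverting price process

```python
import math
import random
from collections import defaultdict

ACTIONS = ("sell", "hold", "buy")             # hypothetical discrete trading actions
POSITION = {"sell": -1, "hold": 0, "buy": 1}


def discretize(prices, fundamental_estimate, window=5, band=0.02):
    """Hypothetical state built from the two information sources: the recent
    market-price trend and the relative gap between the price and the agent's
    fundamental estimate (with a +/- band of indifference)."""
    recent = prices[-window:]
    trend = (recent[-1] > recent[0]) - (recent[-1] < recent[0])      # -1, 0 or +1
    gap = (fundamental_estimate - prices[-1]) / prices[-1]
    value_signal = 1 if gap > band else (-1 if gap < -band else 0)  # +1 = looks undervalued
    return trend, value_signal


class QLearningTrader:
    """Generic tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.q = defaultdict(float)           # Q[(state, action)], defaults to 0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, s, a, reward, s_next):
        # One-step temporal-difference update toward r + gamma * max_b Q(s', b)
        best_next = max(self.q[(s_next, b)] for b in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])


if __name__ == "__main__":
    rng = random.Random(1)
    agent = QLearningTrader(seed=1)
    fundamental, prices = 100.0, [100.0] * 5
    for _ in range(5000):
        fundamental *= math.exp(rng.gauss(0.0, 0.002))         # toy fundamental drift
        estimate = fundamental * (1.0 + rng.gauss(0.0, 0.01))  # agent's noisy estimate
        s = discretize(prices, estimate)
        a = agent.act(s)
        p = prices[-1]
        # Toy dynamics: noise plus a pull of the price toward the fundamental value
        prices.append(p + 0.1 * (fundamental - p) + rng.gauss(0.0, 1.0))
        reward = POSITION[a] * math.log(prices[-1] / p)
        agent.update(s, a, reward, discretize(prices, estimate))
    print({k: round(v, 4) for k, v in agent.q.items()})
```

Because each agent's estimate is noisy and private, different agents can disagree about the same asset. This is the kind of heterogeneity illustrated by the agent estimates in Figure 3.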

Paper Structure (28 sections, 21 equations, 14 figures, 1 table)

Introduction
General problem
Past research
New trends
Our contribution
Structure
Reinforcement learning
States, actions, rewards
Policy and value functions
Policy-based vs. value-based
Exploration vs. exploitation
Temporal credit assignment
Curse of dimensionality
Recent research trends
Model and data
...and 13 more sections

Figures (14)

Figure 1: Classical algorithmic procedure of a reinforcement learning agent at time step $t$, in the context of SYMBA as described in Section III. In a given state $s_t$ of its environment (i.e. the market), an agent $i$ selects one of its actions $a_t$ (from its forecasting or trading algorithm) with respect to the market order book of an asset $j$, thereby receiving a reward $r_{t+1}$ and observing the new state of the environment $s_{t+1}$.
Figure 2: Comprehensive schematic of the SYMBA crypto market simulator and its operational dynamics. This figure presents an integrated view of the SYMBA simulator, emphasizing the dual-level interaction within the simulated financial market. At the core of the system, individual agents (bottom-left) use two distinct reinforcement learning algorithms, $\mathcal{F}^{i}$ for forecasting and $\mathcal{T}^{i}$ for trading, to independently formulate and execute trading strategies at each simulation step. These strategies are then aggregated at the market level through a centralized double-auction order book (top-right). The order book matches buy and sell orders from different agents and thereby determines market prices and volumes (bottom-right). This figure illustrates the iterative loop of agent decision-making and market adjustment (top-left), which collectively shapes the emergent macroscopic market behavior. By simulating the interplay between individual agent strategies and market-level effects, SYMBA shows how individual behaviors and collective market responses interact within a complex financial ecosystem.
Figure 3: Trajectories of the fundamental value $\mathcal{T}^{j}(t)$ (black line) and of its estimates $\mathcal{B}^{i,j}(t)$ by three different agents $i$ (red, blue, and green lines, respectively), over $200$ time steps.
Figure 4: Pseudo-code of SYMBA's iteration procedure.
Figure 5: Comparative Distribution of Logarithmic Price Returns: The red curve represents the Binance training set, while the blue curve represents the Binance testing set.
...and 9 more figures
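
Figure 2 describes a centralized double-auction order book that matches buy and sell orders from different agents to set prices and volumes. The following is a minimal sketch for a single asset. It assumes a continuous double auction with price-time priority in which each trade executes at the resting order's price. SYMBA's actual matching and pricing rules may differ, and the class and method names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Order:
    key: tuple                                # (price priority, arrival number)
    agent: str = field(compare=False)
    price: float = field(compare=False)
    qty: int = field(compare=False)


class OrderBook:
    """Minimal continuous double auction for one asset, price-time priority."""

    def __init__(self):
        self.bids, self.asks = [], []         # heaps: best bid = highest, best ask = lowest
        self._arrival = itertools.count()     # tie-breaker giving time priority
        self.trades = []                      # (buyer, seller, price, qty)

    def submit(self, agent, side, price, qty):
        """Match an incoming limit order against the opposite side, then rest any remainder."""
        is_buy = side == "buy"
        opposite = self.asks if is_buy else self.bids
        while qty > 0 and opposite:
            best = opposite[0]
            if (is_buy and price < best.price) or (not is_buy and price > best.price):
                break                         # the order no longer crosses the spread
            fill = min(qty, best.qty)
            buyer, seller = (agent, best.agent) if is_buy else (best.agent, agent)
            self.trades.append((buyer, seller, best.price, fill))  # executes at resting price
            qty -= fill
            best.qty -= fill                  # key is unchanged, so the heap stays valid
            if best.qty == 0:
                heapq.heappop(opposite)
        if qty > 0:
            # Bids use negated prices so that heapq's min-heap yields the highest bid
            key = (-price if is_buy else price, next(self._arrival))
            heapq.heappush(self.bids if is_buy else self.asks, _Order(key, agent, price, qty))

    def price_and_volume(self):
        """Last traded price and total traded quantity so far ((None, 0) if no trade)."""
        if not self.trades:
            return None, 0
        return self.trades[-1][2], sum(t[3] for t in self.trades)


if __name__ == "__main__":
    book = OrderBook()
    book.submit("A", "sell", 101.0, 5)
    book.submit("B", "sell", 100.0, 3)
    book.submit("C", "buy", 101.0, 6)         # crosses both resting asks
    print(book.trades)              # [('C', 'B', 100.0, 3), ('C', 'A', 101.0, 3)]
    print(book.price_and_volume())  # (101.0, 6)
```

In the example, the incoming buy order first fills against the cheaper ask from B and then against A. The remaining 2 units of A's ask stay in the book.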
