MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

Junjie Li, Yang Liu, Weiqing Liu, Shikai Fang, Lewen Wang, Chang Xu, Jiang Bian

TL;DR

MarS introduces the Large Market Model (LMM), a domain-specific generative foundation model trained on order-level market data to enable high-resolution, interactive, and controllable financial market simulations. By coupling an Order Sequence Model and an Order-Batch Model within an ensemble, MarS can generate realistic order streams conditioned on history, user inputs, and market rules, then simulate market clearing in real time. The system is validated through realism (stylized facts), interactivity (user-driven market impact), and controllability (prompt- and replay-based control), and is demonstrated across forecasting, detection, 'what-if' analysis, and RL training tasks. The work reports scaling laws for LMM, establishes the effectiveness of a simulated clearing house, and presents novel analyses of market impact beyond the square-root law, highlighting MarS’s potential to transform financial analysis and strategy development in a risk-free, data-rich environment.

Abstract

Generative models aim to simulate realistic effects of various actions across different contexts, from text generation to visual effects. Despite significant efforts to build real-world simulators, the application of generative models to virtual worlds, like financial markets, remains under-explored. In financial markets, generative models can simulate the complex market effects of participants with various behaviors, enabling interaction under different market conditions and the training of strategies without financial risk. Such simulation relies on the finest-grained structured data in financial markets, namely orders, and thereby yields the most fine-grained realistic simulation. We propose the Large Market Model (LMM), an order-level generative foundation model for financial market simulation, akin to language modeling in the digital world. Our financial Market Simulation engine (MarS), powered by LMM, addresses the domain-specific need for realistic, interactive, and controllable order generation. Key observations include LMM's strong scalability across data size and model complexity, and MarS's robust and practicable realism in controlled generation with market impact. We showcase MarS as a forecast tool, detection system, analysis platform, and agent training environment, thus demonstrating MarS's "paradigm shift" potential for a variety of financial applications. We release the code of MarS at https://github.com/microsoft/MarS/.

Paper Structure (46 sections, 8 equations, 26 figures, 8 tables, 1 algorithm)

Figures (26)

  • Figure 1: High-Level Overview of MarS. MarS is powered by a generative foundation model (LMM) trained on order-level historical financial market data. During real-time simulation, LMM dynamically generates order series in response to various conditions, including user-injected interactive orders, vague target scenario descriptions, and current/recent market data. These generated order series, combined with user interactive orders, are matched in a simulated clearing house in real-time, producing fine-grained simulated market trajectories. The flexibility of LMM's order generation enables MarS to support various downstream applications, such as forecasting, detection systems, analysis platforms, and agent training environments.
  • Figure 2: The order image converter transforms order data into a visual representation. Each order has three attributes: type (Bid, Ask, Cancel), price slot (relative to the mid-price), and volume slot (binned volume). The pixel values in the image represent the number of orders with the same attributes, with higher pixel values indicating more orders. More details can be found in the paper's section on the order image converter.
  • Figure 3: Scaling curves of Order Model and Order-Batch Model. (a) Order Model: Trained on 32 billion tokens, with model sizes ranging from 2 million to 1.02 billion parameters. (b) Order-Batch Model: Trained on 10 billion tokens, with model sizes ranging from 150 million to 3 billion parameters. The results demonstrate enhanced performance with increased data and model sizes.
  • Figure 4: The process of MarS generation employs a two-level order generation mechanism. At the order-batch level, following the paper's two guiding principles, the Order-Batch Model processes existing orders from $\textit{minute}_t$ and generates $N$ possible distributions for $\textit{minute}_{t+1}$. Through a filter process based on control signals, the target distribution ($\star$) is selected and serves as a condition for the Ensemble Model (E). At the order level, the Order Model (O) generates immediate responses for recent and user-submitted orders, while the Ensemble Model refines these generations conditioned on the target distribution. The generated orders in $\textit{minute}_{t+1}$ are fed back to the Order-Batch Model (OB) for $\textit{minute}_{t+2}$ prediction, creating a dynamic feedback loop that balances market impact and controlled generation.
  • Figure 5: Illustration of Stylized Facts in MarS. (a) Aggregational Gaussianity: as the interval increases from 1 to 5 minutes, the distribution of log returns becomes more similar to a normal distribution. (b) Absence of Autocorrelations: the auto-correlation of log returns rapidly decreases with increasing intervals. (c) Volatility Clustering: the auto-correlation of volatility remains high over extended periods.
  • ...and 21 more figures
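
The order image converter described in the Figure 2 caption amounts to a three-dimensional histogram over (type, price slot, volume slot). The following is a minimal sketch of that idea, not the authors' implementation: the function name `orders_to_image`, the tick size, the number of price slots, and the volume bin edges are all assumptions chosen for illustration.

```python
import numpy as np

ORDER_TYPES = ("bid", "ask", "cancel")  # the three types named in the caption


def orders_to_image(orders, mid_price, tick=0.01, n_price_slots=32,
                    volume_edges=(100, 500, 1000, 5000)):
    """Count orders per (type, price slot, volume slot).

    orders: iterable of (order_type, price, volume) tuples.
    All binning parameters here are hypothetical, not taken from the paper.
    """
    image = np.zeros(
        (len(ORDER_TYPES), n_price_slots, len(volume_edges) + 1),
        dtype=np.int64,
    )
    half = n_price_slots // 2
    for order_type, price, volume in orders:
        channel = ORDER_TYPES.index(order_type)
        # Price slot: signed tick distance from the mid-price, clipped to range.
        offset = int(round((price - mid_price) / tick))
        price_slot = min(max(offset + half, 0), n_price_slots - 1)
        # Volume slot: index of the bin that contains the volume.
        volume_slot = int(np.searchsorted(volume_edges, volume, side="right"))
        # Pixel value = number of orders sharing the same three attributes.
        image[channel, price_slot, volume_slot] += 1
    return image


orders = [
    ("bid", 9.99, 200),
    ("bid", 9.99, 300),   # same type, price slot, and volume bin as above
    ("ask", 10.02, 50),
    ("cancel", 9.98, 1200),
]
image = orders_to_image(orders, mid_price=10.00)
```

In this toy input the two bid orders fall into the same cell, so that pixel holds a count of 2, matching the caption's statement that higher pixel values indicate more orders.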
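
The three stylized facts named in the Figure 5 caption can each be checked with a standard diagnostic: excess kurtosis of log returns at different intervals (aggregational Gaussianity), autocorrelation of returns (absence of autocorrelations), and autocorrelation of absolute returns (volatility clustering). The sketch below uses these textbook statistics; this excerpt does not state which exact metrics the paper uses, so the helper names and the synthetic price series are assumptions, and the data is not MarS output.

```python
import numpy as np


def log_returns(prices, interval=1):
    """Log returns between every `interval`-th price sample."""
    sampled = np.asarray(prices, dtype=float)[::interval]
    return np.diff(np.log(sampled))


def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def excess_kurtosis(x):
    """Excess kurtosis of x; close to 0 for normally distributed data."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


# Synthetic one-minute prices (NOT MarS output): volatility alternates between
# calm and turbulent blocks, so large moves cluster in time.
rng = np.random.default_rng(0)
n = 20_000
sigma = np.where((np.arange(n) // 500) % 2 == 0, 0.0005, 0.002)
prices = 100.0 * np.exp(np.cumsum(rng.standard_normal(n) * sigma))

r1 = log_returns(prices, interval=1)
r5 = log_returns(prices, interval=5)
print("excess kurtosis, 1 vs 5 min:", excess_kurtosis(r1), excess_kurtosis(r5))
print("lag-1 autocorr of returns:", autocorrelation(r1, 1))
print("lag-1 autocorr of |returns|:", autocorrelation(np.abs(r1), 1))
```

The synthetic series is built only to exhibit volatility clustering with uncorrelated returns; it is not designed to show kurtosis shrinking with the interval, so the 1-minute versus 5-minute comparison is meaningful only on real or simulated market trajectories.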