Emergence from Emergence: Financial Market Simulation via Learning with Heterogeneous Preferences

Ryuji Hashimoto, Ryosuke Takata, Masahiro Suzuki, Yuki Tanaka, Kiyoshi Izumi

TL;DR

This work examines how learning and investor heterogeneity jointly shape financial markets, a gap left by prior agent-based model (ABM) studies that treated the two factors separately. It introduces a multi-agent reinforcement learning framework in which agents have heterogeneous risk aversion $\alpha^j$, time discounting $\gamma^j$, and information access $\sigma^j$, and learn a shared policy within a limit-order-book (LOB) environment, with traits embedded in observations and rewards. The method combines a POMDP formulation, shared-policy PPO learning, and optimal-transport (OT) calibration that aligns synthetic trait distributions with real data. Experiments show that trait-driven learning yields behavioral differentiation and niche formation, and that interactions among differentiated agents generate market dynamics with fat-tailed returns and volatility clustering, outperforming baselines that rely on learning alone or heterogeneity alone. The framework thus demonstrates emergent macro-dynamics arising from hierarchical micro- and meso-scale differentiation, offering a constructive approach to financial market modeling.
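
To make "traits embedded in observations and rewards" concrete, the following is a minimal sketch, not the authors' implementation. The trait ranges, the `Traits` container, the feature layout, and the mean-variance-style penalty in `risk_adjusted_reward` are all hypothetical choices made for illustration; the summary above does not specify the paper's priors, observation layout, or reward function.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Traits:
    """Per-agent traits; names mirror the symbols in the summary."""
    alpha: float  # risk aversion
    gamma: float  # time discount factor
    sigma: float  # information noise (larger = less informed)


def sample_traits(rng: random.Random) -> Traits:
    # Hypothetical uniform priors, chosen only for illustration.
    return Traits(
        alpha=rng.uniform(0.0, 1.0),
        gamma=rng.uniform(0.9, 0.999),
        sigma=rng.uniform(0.0, 0.05),
    )


def build_observation(market_features, traits: Traits):
    # Appending the traits lets a single shared policy condition its
    # behavior on who is acting, instead of training one network per agent.
    return list(market_features) + [traits.alpha, traits.gamma, traits.sigma]


def risk_adjusted_reward(pnl: float, volatility: float, traits: Traits) -> float:
    # Assumed form: profit penalized in proportion to risk aversion.
    return pnl - traits.alpha * volatility


rng = random.Random(0)
agents = [sample_traits(rng) for _ in range(4)]
# Hypothetical market features: mid price and a recent return.
obs = build_observation([100.0, 0.01], agents[0])
```

Under this design, two agents facing the same market state receive different observations and different rewards, which is what allows one set of policy weights to produce differentiated behavior.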

Abstract

Agent-based models help explain stock price dynamics as emergent phenomena driven by interacting investors. In this modeling tradition, investor behavior has typically been captured by two distinct mechanisms -- learning and heterogeneous preferences -- which have been explored as separate paradigms in prior studies. However, the impact of their joint modeling on the resulting collective dynamics remains largely unexplored. We develop a multi-agent reinforcement learning framework in which agents endowed with heterogeneous risk aversion, time discounting, and information access collectively learn trading strategies within a unified shared-policy framework. The experiment reveals that (i) learning with heterogeneous preferences drives agents to develop strategies aligned with their individual traits, fostering behavioral differentiation and niche specialization within the market, and (ii) interactions among the differentiated agents are essential for the emergence of realistic market dynamics such as fat-tailed price fluctuations and volatility clustering. This study presents a constructive paradigm for financial market modeling in which the joint design of heterogeneous preferences and learning mechanisms enables two-stage emergence: first in individual behavior, then in collective market dynamics.
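
The two stylized facts named above, fat tails and volatility clustering, are commonly quantified by the excess kurtosis of returns and by the autocorrelation of absolute returns. The sketch below shows one such check; it is an illustration under that assumption, not the paper's evaluation code. The `toy_returns` series is fabricated for demonstration only and is not data from the study.

```python
def excess_kurtosis(xs):
    """Population excess kurtosis; 0 for a Gaussian, > 0 for fat tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def autocorrelation(xs, lag):
    """Sample autocorrelation at the given positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var


# Toy return series (illustrative only): a long calm regime followed by
# a short turbulent one, so large moves are rare and arrive together.
toy_returns = [0.01, -0.01] * 45 + [0.1, -0.1] * 5

tail_heaviness = excess_kurtosis(toy_returns)
vol_clustering = autocorrelation([abs(r) for r in toy_returns], lag=1)
```

For this toy series both statistics are positive: the rare large moves raise the kurtosis, and because they occur consecutively, absolute returns are positively autocorrelated even though the signed returns alternate.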

Paper Structure

This paper contains 20 sections, 14 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Conceptual diagram of our MARL-based ABM for financial market simulations. Each agent is assigned individual traits when the simulation starts. A shared policy, learned to satisfy each agent's specific preferences, governs agent behavior within a LOB market environment. The prior distribution of the agents' trait factors is calibrated using OT so that the synthetic price series aligns with real data.
  • Figure 2: Structure of the simulation. At each time step $t$, one of the $n$ agents is selected to submit an order. Their order is placed into a LOB market, which matches buy and sell orders to determine transactions and update the market state.
  • Figure 3: Heatmap of the scaled order volume $\tilde{v}_t^j$ derived from the obtained policy, with varying rescaled volatility $V_{[t_{i-1}^j, t_i^j]}$ and risk aversion term $\alpha^j$.
  • Figure 4: Scatter plot of agent action vectors from five simulations, with color coding for each quadrant. The size of each point represents the agent's discount factor $\gamma^j$.
  • Figure 5: Box plot of agents’ trading performance over five runs. Agents are grouped on the horizontal axis by quartile (%) of uninformedness $\sigma^j$, and the vertical axis shows log returns. Boxes indicate medians and 90% ranges of trading performance within each group.
  • ...and 2 more figures