Emergence from Emergence: Financial Market Simulation via Learning with Heterogeneous Preferences
Ryuji Hashimoto, Ryosuke Takata, Masahiro Suzuki, Yuki Tanaka, Kiyoshi Izumi
TL;DR
This work examines how learning and investor heterogeneity jointly shape financial markets, a gap in prior agent-based modeling (ABM) studies, which treated the two factors separately. It introduces a multi-agent reinforcement learning framework in which agents have heterogeneous risk aversion $\alpha^j$, time discounting $\gamma^j$, and information access $\sigma^j$, and learn a shared policy within a limit-order-book environment, with the traits embedded in observations and rewards. The method combines a POMDP formulation, shared-policy PPO learning, and optimal-transport (OT) calibration to align synthetic trait distributions with real data. Experiments show that trait-driven learning yields behavioral differentiation and niche formation, and that interactions among the differentiated agents generate market dynamics with fat-tailed returns and volatility clustering, outperforming baselines that rely on learning alone or heterogeneity alone. The framework thus demonstrates macro-level dynamics emerging from hierarchical micro- and meso-scale differentiation, offering a constructive approach to financial market modeling.
Abstract
Agent-based models help explain stock price dynamics as emergent phenomena driven by interacting investors. In this modeling tradition, investor behavior has typically been captured by two distinct mechanisms, learning and heterogeneous preferences, which prior studies have explored as separate paradigms. However, the impact of modeling them jointly on the resulting collective dynamics remains largely unexplored. We develop a multi-agent reinforcement learning framework in which agents endowed with heterogeneous risk aversion, time discounting, and information access collectively learn trading strategies under a single shared policy. Experiments reveal that (i) learning with heterogeneous preferences drives agents to develop strategies aligned with their individual traits, fostering behavioral differentiation and niche specialization within the market, and (ii) interactions among the differentiated agents are essential for the emergence of realistic market dynamics such as fat-tailed price fluctuations and volatility clustering. This study presents a constructive paradigm for financial market modeling in which the joint design of heterogeneous preferences and learning mechanisms enables two-stage emergence: first of individual behavior, then of collective market dynamics.
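
The two stylized facts named above can be made concrete with standard diagnostics. The following is a minimal sketch, not the authors' evaluation code: it assumes fat tails are measured by the excess kurtosis of returns and volatility clustering by the autocorrelation of absolute returns, two common choices. The function names and the toy GARCH(1,1) generator (with its parameters) are hypothetical and serve only to produce a series that exhibits both properties; no data from the study is used.

```python
import numpy as np


def excess_kurtosis(returns):
    """Sample excess kurtosis; about 0 for Gaussian returns, positive for fat tails."""
    r = np.asarray(returns, dtype=float)
    centered = r - r.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2 - 3.0)


def abs_autocorr(returns, lag):
    """Autocorrelation of |returns| at a positive lag; positive values indicate clustering."""
    a = np.abs(np.asarray(returns, dtype=float))
    a = a - a.mean()
    return float(np.dot(a[:-lag], a[lag:]) / np.dot(a, a))


def simulate_garch(n, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Toy GARCH(1,1) returns (hypothetical parameters), used only as a test series."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * z[t]
        var = omega + alpha * r[t] ** 2 + beta * var
    return r


if __name__ == "__main__":
    gauss = np.random.default_rng(1).standard_normal(20000)
    garch = simulate_garch(20000, seed=1)
    for name, series in [("gaussian", gauss), ("garch", garch)]:
        print(name, excess_kurtosis(series), abs_autocorr(series, lag=1))
```

Applied to a simulated return series, a clearly positive excess kurtosis together with positive autocorrelation of absolute returns across several lags would be consistent with the fat-tailed, volatility-clustered dynamics described above; an i.i.d. Gaussian series should score near zero on both.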
