Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation
Zeping Li, Guancheng Wan, Keyang Chen, Yu Chen, Yiwen Zhao, Philip Torr, Guangnan Ye, Zhenfei Yin, Hongfeng Chai
TL;DR
This work tests whether LLM-based agents in stock-market agent-based models (ABMs) exhibit style-switching behavior aligned with behavioral-finance theory. By embedding four drivers (loss aversion, herding, wealth differentiation, and price misalignment) as persistent traits and prompting agents to reassess their trading style every 10 trading days, the authors evaluate alignment using four metrics and Mann–Whitney U tests across multiple backbones. The results show partial alignment: loss aversion and, in one model, wealth differentiation are associated with the predicted switching tendencies, while herding and mispricing effects are not consistently observed across models. The study provides a concrete framework for validating behavioral fidelity in LLM agents, highlights model-specific dynamics, and suggests future work on richer market interactions and trait integration to better replicate human-like trading behavior.
Abstract
Recent work has increasingly applied Large Language Models (LLMs) as agents in financial stock market simulations to test whether micro-level behaviors aggregate into macro-level phenomena. However, a crucial question arises: do LLM agents' behaviors align with those of real market participants? This alignment is key to the validity of simulation results. To explore this, we select a financial stock market scenario to test behavioral consistency. Investors are typically classified as fundamental or technical traders, but most simulations fix strategies at initialization, failing to reflect real-world trading dynamics. In this work, we assess whether agents' strategy switching aligns with financial theory and provide a framework for this evaluation. We operationalize four behavioral-finance drivers (loss aversion, herding, wealth differentiation, and price misalignment) as personality traits set via prompting and stored long-term. In year-long simulations, agents process daily price-volume data, trade under a designated style, and reassess their strategy every 10 trading days. We introduce four alignment metrics and use Mann-Whitney U tests to compare agents' style-switching behavior with financial theory. Our results show that recent LLMs' switching behavior is only partially consistent with behavioral-finance theories, highlighting the need for further refinement in aligning agent behavior with financial theory.
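To make the statistical comparison concrete, the following is a minimal sketch of a two-sided Mann-Whitney U test applied to two groups of agents. It is not the authors' implementation: the function name `mann_whitney_u`, the per-agent "switch rate" quantity, and the sample values are all hypothetical and chosen only for illustration, and the p-value uses the standard tie-corrected normal approximation with a continuity correction rather than an exact distribution.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic for sample x, approximate two-sided p-value).
    Ties receive midranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * in_x
        tie_term += t**3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference

    # Continuity-corrected z score, clipped at zero.
    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p_two_sided = math.erfc(z / math.sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical per-agent switch rates (fraction of reassessment points at
# which the agent changed style). These numbers are made up for illustration.
with_trait = [0.40, 0.35, 0.50, 0.45, 0.30]
without_trait = [0.20, 0.25, 0.15, 0.30, 0.10]

u_stat, p_value = mann_whitney_u(with_trait, without_trait)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

A rank-based test of this kind makes no normality assumption about the per-agent quantities, which is one plausible reason to prefer it over a t-test when comparing small groups of simulated agents. In practice, a library routine such as `scipy.stats.mannwhitneyu` would typically be used instead of a hand-written version.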
