Reimagining Agent-based Modeling with Large Language Model Agents via Shachi
So Kuroki, Yingtao Tian, Kou Misaki, Takashi Ikegami, Takuya Akiba, Yujin Tang
TL;DR
This work addresses the lack of a principled methodology for studying emergent behaviors in LLM-driven agent-based models (ABMs). It introduces Shachi, a modular framework that decomposes an agent's policy into Configs, Memory, Tools, and an LLM-based reasoning engine, paired with a standardized agent-environment interface that enables zero-shot evaluation across diverse tasks. The authors validate Shachi on a 10-task benchmark spanning three levels of social complexity and use it for novel scientific inquiries, including memory transfer and agents living in multiple worlds. They also establish external validity through a tariff-shock simulation whose agent behaviors align with real-world market data when the cognitive architecture is properly configured. The results show that modular cognitive components are crucial for generalization and realism, and that a principled, open-source framework can foster cumulative, scientifically grounded research in LLM-based ABM. Overall, Shachi provides a rigorous foundation for reproducible ABM with LLMs and offers practical tools for researchers to study emergent social and economic dynamics across tasks and environments.
Abstract
The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce Shachi, a formal methodology and modular framework that decomposes an agent's policy into core cognitive components: Configuration for intrinsic traits, Memory for contextual persistence, and Tools for expanded capabilities, all orchestrated by an LLM reasoning engine. This principled architecture moves beyond brittle, ad-hoc agent designs and enables the systematic analysis of how specific architectural choices influence collective behavior. We validate our methodology on a comprehensive 10-task benchmark and demonstrate its power through novel scientific inquiries. Critically, we establish the external validity of our approach by modeling a real-world U.S. tariff shock, showing that agent behaviors align with observed market reactions only when their cognitive architecture is appropriately configured with memory and tools. Our work provides a rigorous, open-source foundation for building and evaluating LLM agents, aimed at fostering more cumulative and scientifically grounded research.
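The decomposition described in the abstract can be illustrated with a minimal Python sketch. It has four parts: Configuration for traits, Memory for persistence, Tools for capabilities, and an LLM reasoning engine, all behind a shared agent-environment interface. Every name here (Config, Memory, Agent.step, ToyMarket, run, stub_llm) is a hypothetical illustration, not Shachi's actual API. The LLM is also replaced by a deterministic stub so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: names are illustrative, not Shachi's actual API.

@dataclass
class Config:
    """Intrinsic traits of the agent, e.g. a persona or role."""
    persona: str

@dataclass
class Memory:
    """Contextual persistence: a bounded log of past interactions."""
    capacity: int = 5  # must be >= 1
    entries: List[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.entries.append(item)
        self.entries = self.entries[-self.capacity:]  # keep the newest entries

    def recall(self) -> str:
        return "\n".join(self.entries)

@dataclass
class Agent:
    config: Config
    memory: Memory
    tools: Dict[str, Callable[[str], str]]  # expanded capabilities
    llm: Callable[[str], str]  # reasoning engine: prompt in, action out

    def step(self, observation: str) -> str:
        # Assemble one prompt from every cognitive component.
        tool_notes = [f"{name}: {fn(observation)}" for name, fn in self.tools.items()]
        prompt = "\n".join([
            f"Persona: {self.config.persona}",
            f"Memory:\n{self.memory.recall()}",
            *tool_notes,
            f"Observation: {observation}",
        ])
        action = self.llm(prompt)
        self.memory.add(f"obs={observation} -> act={action}")
        return action

class ToyMarket:
    """Toy environment with a reset/step interface; broadcasts a price signal."""

    def __init__(self, prices: List[str]):
        self.prices = prices
        self.t = 0
        self.log: List[Dict[int, str]] = []

    def reset(self, n_agents: int) -> Dict[int, str]:
        self.t = 0
        self.log = []
        return {i: f"price={self.prices[0]}" for i in range(n_agents)}

    def step(self, actions: Dict[int, str]) -> Tuple[Dict[int, str], bool]:
        self.log.append(actions)
        self.t += 1
        if self.t >= len(self.prices):
            return {}, True  # episode finished
        return {i: f"price={self.prices[self.t]}" for i in actions}, False

def run(env: ToyMarket, agents: List[Agent]) -> List[Dict[int, str]]:
    """Generic loop: it relies only on reset/step and Agent.step."""
    obs = env.reset(len(agents))
    done = False
    while not done:
        actions = {i: agents[i].step(o) for i, o in obs.items()}
        obs, done = env.step(actions)
    return env.log

def stub_llm(prompt: str) -> str:
    # Deterministic stand-in for an LLM: reacts to the latest observation only.
    return "buy" if prompt.splitlines()[-1].endswith("low") else "hold"

agents = [
    Agent(Config(f"trader {i}"), Memory(capacity=3),
          {"price_lookup": lambda obs: obs.split("=")[1]}, stub_llm)
    for i in range(2)
]
log = run(ToyMarket(["low", "high", "low"]), agents)
print(log)  # prints [{0: 'buy', 1: 'buy'}, {0: 'hold', 1: 'hold'}, {0: 'buy', 1: 'buy'}]
```

Every agent exposes the same `step(observation) -> action` signature, and every environment exposes the same `reset`/`step` pair. As a result, swapping one component, such as the `tools` dict or the `llm` callable, changes a constructor argument rather than the simulation loop. This is a toy analogue of the controlled architectural comparisons the abstract describes.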
