Reimagining Agent-based Modeling with Large Language Model Agents via Shachi

So Kuroki, Yingtao Tian, Kou Misaki, Takashi Ikegami, Takuya Akiba, Yujin Tang

TL;DR

This work addresses the lack of a principled methodology for studying emergent behaviors in LLM-driven agent-based models (ABMs). It introduces Shachi, a modular framework that decomposes an agent's policy into Configs, Memory, Tools, and an LLM-based reasoning engine, paired with a standardized agent-environment interface to enable zero-shot evaluation across diverse tasks. The authors validate Shachi on a 10-task benchmark spanning three levels of social complexity and demonstrate novel scientific inquiries, including memory transfer and living in multiple worlds. They also establish external validity through a tariff-shock simulation that aligns with real-world market data when the cognitive architecture is properly configured. The results show that modular cognitive components are crucial for generalization and realism, and that a principled, open-source framework can foster cumulative, scientifically grounded research in LLM-based ABM. Overall, Shachi provides a rigorous foundation for reproducible ABM with LLMs and offers practical tools for researchers to study emergent social and economic dynamics across tasks and environments.

Abstract

The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce Shachi, a formal methodology and modular framework that decomposes an agent's policy into core cognitive components: Configuration for intrinsic traits, Memory for contextual persistence, and Tools for expanded capabilities, all orchestrated by an LLM reasoning engine. This principled architecture moves beyond brittle, ad-hoc agent designs and enables the systematic analysis of how specific architectural choices influence collective behavior. We validate our methodology on a comprehensive 10-task benchmark and demonstrate its power through novel scientific inquiries. Critically, we establish the external validity of our approach by modeling a real-world U.S. tariff shock, showing that agent behaviors align with observed market reactions only when their cognitive architecture is appropriately configured with memory and tools. Our work provides a rigorous, open-source foundation for building and evaluating LLM agents, aimed at fostering more cumulative and scientifically grounded research.
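
The decomposition described in the abstract can be illustrated with a small sketch. This is not Shachi's actual API: the `Agent` class, its field names, the `act` method, and `stub_llm` are all hypothetical, and the LLM is replaced by a deterministic stub so the example runs offline. It only shows how configuration, memory, and tools might be assembled into a prompt for a reasoning engine that maps an observation to an action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical agent; names are illustrative, not Shachi's API."""

    config: Dict[str, str]                       # intrinsic traits
    llm: Callable[[str], str]                    # reasoning engine (stubbed below)
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)

    def act(self, observation: str) -> str:
        # Query every tool on the observation to get immediate feedback.
        tool_notes = [f"{name}: {tool(observation)}"
                      for name, tool in self.tools.items()]
        # Assemble the prompt from the cognitive components.
        prompt = "\n".join([
            f"traits: {self.config}",
            "memory: " + " | ".join(self.memory),
            "tools: " + " | ".join(tool_notes),
            f"observation: {observation}",
        ])
        action = self.llm(prompt)
        # Persist the step so later decisions can depend on it.
        self.memory.append(f"{observation} -> {action}")
        return action


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (assumption)."""
    return "sell" if "tariff" in prompt else "hold"


with_history = Agent(config={"risk": "averse"}, llm=stub_llm)
with_history.act("tariff announced")
print(with_history.act("market opens"))   # memory of the tariff is in the prompt

fresh = Agent(config={"risk": "averse"}, llm=stub_llm)
print(fresh.act("market opens"))          # no such memory
```

Because each component is a separate field, one can be swapped or emptied while the others stay fixed, which is the kind of controlled comparison the abstract argues for. The toy stub says nothing about how real LLM agents behave.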

Paper Structure

This paper contains 89 sections, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Unifying LLM-based ABM Research with Shachi. Shachi is a methodology and accompanying framework with a benchmark suite that accelerates social science research through LLM-based agents in ABM. Shachi facilitates research in this space by providing ① A unified agent architecture that standardizes core components (LLM, memory, tools, configuration) for modular and reproducible design; ② Cross-task generalization that allows extensive evaluation of different agent designs; and ③ Novel scientific inquiries previously infeasible, such as agents conducting memory transfer, living across multiple worlds, and demonstrating external validity through simulation of real-world economic events.
  • Figure 2: Shachi Methodology Overview. The figure illustrates the core principles of our methodology. Left: Agent instantiation decouples task-specific environment settings (e.g., agent profiles) from task-agnostic agent design. This ensures agent modularity and portability. Middle: The agent's policy $\pi$ is realized through a cognitive architecture of four components (Configs, Memory, Tools, and LLM). The policy $\pi$ processes an observation $O_t^i$ to generate an action $A_t^i$. The environment mediates both agent-environment interactions and inter-agent communications via structured messages and facilitates simulation. Agents receive immediate feedback via tool interfaces. Right: The methodology includes a structured three-level benchmark, enabling systematic analysis of agent behavior across contexts of increasing social complexity.
  • Figure 3: Memory-transfer-induced Differences in the CognitiveBiases Task. For each bias, the difference is calculated as the score with carry-over memory minus that with fresh memory. Statistically significant differences are indicated with star-shaped markers (paired $t$-test, $p<0.01$).
  • Figure 4: Comparison of Price Movements.
  • Figure 5: All Macroeconomic Indicators. These are extra results accompanying those in Section \ref{sec:swap_backends}.
  • ...and 2 more figures
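
The comparison described in the Figure 3 caption (score with carry-over memory minus score with fresh memory, tested with a paired $t$-test) can be sketched as follows. The scores are made-up placeholders, not results from the paper, and `paired_t` is a hypothetical helper. It uses only the standard library and returns the mean difference, the $t$ statistic, and the degrees of freedom; obtaining a $p$-value would need a $t$-distribution CDF, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(carry_over: Sequence[float],
             fresh: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean difference, t statistic, degrees of freedom)."""
    if len(carry_over) != len(fresh) or len(fresh) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Difference per pair: carry-over memory minus fresh memory.
    diffs = [c - f for c, f in zip(carry_over, fresh)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# Placeholder scores for one bias across five hypothetical runs.
carry_over_scores = [0.62, 0.58, 0.71, 0.66, 0.60]
fresh_scores = [0.50, 0.49, 0.60, 0.57, 0.52]

mean_diff, t_stat, dof = paired_t(carry_over_scores, fresh_scores)
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.2f}, df = {dof}")
```

A positive mean difference here means the carry-over condition scored higher than the fresh one, matching the sign convention in the caption.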