SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation
Xuhui Zhou, Zhe Su, Sophie Feng, Jiaxu Zhou, Jen-tse Huang, Hsien-Te Kao, Spencer Lynch, Svitlana Volkova, Tongshuang Sherry Wu, Anita Woolley, Hao Zhu, Maarten Sap
TL;DR
This paper addresses the difficulty of setting up and evaluating large-scale, human-like social simulations with LLM agents. It introduces SOTOPIA-S4, a three-part system comprising a high-performance simulation engine, a REST API, and a web UI, which together support natural-language configuration, parallelized execution, and customizable evaluation metrics. Key contributions include a natural-language configuration workflow, an asynchronous multi-agent interaction framework with information asymmetry, a default and customizable LLM-powered evaluation suite, and multi-LLM integration through LiteLLM. Use cases cover dyadic hiring negotiations, multi-party planning, and large-scale stress tests, and illustrate personality effects and scalability. The system lowers the barrier for social science researchers to test hypotheses and analyze LLM agent behavior at scale through a user-friendly API and web interface.
Abstract
Social simulation through large language model (LLM) agents is a promising approach to exploring and validating hypotheses about social science questions and LLM agent behavior. We present SOTOPIA-S4, a fast, flexible, and scalable social simulation system that addresses the technical barriers of current frameworks while enabling practitioners to generate multi-turn and multi-party LLM-based interactions with customizable evaluation metrics for hypothesis testing. SOTOPIA-S4 comes as a pip package that contains a simulation engine, an API server with flexible RESTful APIs for simulation management, and a web interface that enables both technical and non-technical users to design, run, and analyze simulations without programming. We demonstrate the usefulness of SOTOPIA-S4 with two use cases involving dyadic hiring negotiation and multi-party planning scenarios.
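To make the idea of parallelized, multi-turn simulation with a custom evaluation metric concrete, the following is a minimal sketch using only Python's standard `asyncio` module. It is not the SOTOPIA-S4 API: the names `Agent`, `run_episode`, `turn_count_metric`, and `run_batch` are hypothetical, the agents return canned strings instead of calling an LLM, and round-robin turn-taking is an assumption made for simplicity.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for an LLM-backed agent."""
    name: str
    private_goal: str  # known only to this agent (information asymmetry)

    async def act(self, public_history: list[str]) -> str:
        # Placeholder for an awaited LLM call; yields control to the event loop.
        await asyncio.sleep(0)
        return f"{self.name} speaks (turn {len(public_history) + 1})"


async def run_episode(agents: list[Agent], max_turns: int) -> list[str]:
    """Run one multi-turn interaction with simple round-robin turn-taking."""
    history: list[str] = []
    for turn in range(max_turns):
        speaker = agents[turn % len(agents)]
        history.append(await speaker.act(history))
    return history


def turn_count_metric(history: list[str]) -> dict[str, int]:
    """Toy custom metric: number of turns taken by each agent."""
    counts: dict[str, int] = {}
    for line in history:
        name = line.split(" ", 1)[0]
        counts[name] = counts.get(name, 0) + 1
    return counts


async def run_batch(n_episodes: int, max_turns: int) -> list[dict[str, int]]:
    """Run several independent episodes concurrently and score each one."""
    async def one_episode() -> dict[str, int]:
        agents = [
            Agent("Recruiter", private_goal="stay under budget"),
            Agent("Candidate", private_goal="maximize salary"),
        ]
        return turn_count_metric(await run_episode(agents, max_turns))

    return await asyncio.gather(*(one_episode() for _ in range(n_episodes)))


results = asyncio.run(run_batch(n_episodes=3, max_turns=4))
```

In this sketch each episode is an independent coroutine, so the batch can overlap the waiting time of many model calls. A real evaluation metric would typically be scored by an LLM rather than by counting turns.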
