CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu
TL;DR
CivRealm is a Civilization-inspired, turn-based, imperfect-information environment for benchmarking how well decision-making agents learn and reason in open-ended, multi-agent settings. It offers a tensor-based interface for RL agents and a language-based interface for LLM agents, along with full-game and mini-game tasks for assessing generalization. Initial results show that RL agents perform reasonably on mini-games, but neither paradigm makes substantial progress in the full game. Among the LLM agents, the hierarchical approach (Mastaba) coordinates better than the per-unit baseline (BaseLang) but still struggles with grounding and long-horizon planning. With scalable mini-games, diverse evaluation metrics, and two API modalities, the benchmark supports future RL-LLM hybrids and broader tests of generalization in complex social simulations. Overall, CivRealm highlights the gap between current AI capabilities and human-like strategic reasoning in long-horizon, multi-agent environments, and offers a platform for advancing both learning and reasoning.
Abstract
The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.
