CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Siyuan Qi; Shuo Chen; Yexin Li; Xiangyu Kong; Junqi Wang; Bangcheng Yang; Pring Wong; Yifan Zhong; Xiaoyuan Zhang; Zhaowei Zhang; Nian Liu; Wei Wang; Yaodong Yang; Song-Chun Zhu

TL;DR

CivRealm is a Civilization-inspired, turn-based, imperfect-information environment for benchmarking how decision-making agents learn and reason under open-ended, multi-agent conditions. It provides a tensor-based interface for RL agents and a language-based interface for reasoning agents, along with a rich set of full-game and mini-game tasks for assessing generalization. Initial results show that RL agents perform reasonably on mini-games, while both RL and LLM agents struggle to make progress in the full game. The hierarchical LLM agent (Mastaba) coordinates better than the per-unit baseline (BaseLang) but still faces grounding and long-horizon planning hurdles. With scalable mini-games, diverse evaluation metrics, and two API modalities, CivRealm exposes the gap between current AI capabilities and human-like strategic reasoning in long-horizon, multi-agent environments, and offers a platform for future RL-LLM hybrids and broader tests of generalization.

Abstract

The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.
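As an illustration of the two interface styles mentioned in the abstract, the following is a minimal sketch in which one turn-based state is exposed both as a numeric vector and as a text description. Everything here (`ToyTurnEnv`, `tensor_view`, `language_view`, the actions, and the reward) is a hypothetical toy invented for this example; it is not the CivRealm API and does not reflect its actual observation or action spaces.

```python
from dataclasses import dataclass


@dataclass
class ToyState:
    turn: int
    gold: int
    units: int


class ToyTurnEnv:
    """Hypothetical toy environment; NOT the CivRealm API."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.state = ToyState(turn=0, gold=10, units=1)

    def reset(self):
        self.state = ToyState(turn=0, gold=10, units=1)
        return self.state

    def step(self, action):
        s = self.state
        if action == "build_unit" and s.gold >= 5:
            s.gold -= 5
            s.units += 1
        elif action == "collect_tax":
            s.gold += 2 * s.units
        # Every action consumes one turn in this toy.
        s.turn += 1
        done = s.turn >= self.max_turns
        reward = float(s.units)  # toy reward: current army size
        return s, reward, done


def tensor_view(state):
    """Numeric observation, as a learning-focused agent might consume."""
    return [float(state.turn), float(state.gold), float(state.units)]


def language_view(state):
    """Text observation, as a reasoning-focused agent might consume."""
    return (f"Turn {state.turn}: you hold {state.gold} gold "
            f"and command {state.units} unit(s).")


def tensor_policy(obs):
    # obs[1] is the gold entry of the vector.
    return "build_unit" if obs[1] >= 5 else "collect_tax"


def language_policy(text):
    # Recover the gold amount from the sentence, then apply the same rule.
    gold = int(text.split("hold ")[1].split(" gold")[0])
    return "build_unit" if gold >= 5 else "collect_tax"


def run_episode(env, policy, view):
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(view(state))
        state, reward, done = env.step(action)
        total += reward
    return total
```

Because both policies apply the same rule to the same underlying state, they collect the same return in this toy. The sketch only shows that the two views are alternative encodings of one game state; the trade-offs between learning and reasoning that the paper studies arise in a far larger environment.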

Paper Structure (39 sections, 25 figures, 11 tables)

Introduction
Related Work
Environment
Full Game Description
Mini-game Benchmarks
Methods
Tensor-based Reinforcement Learning
BaseLang: Baseline Language-based Agent
Mastaba: Enhancing BaseLang by a Hierarchical Structure
Experiments
Tensor-based Reinforcement Learning
Language-based Agents: BaseLang and Mastaba
Conclusion
Environment
More on Full Game and CivRealm Features
...and 24 more sections

Figures (25)

Figure 1: The gameplay of Civilization [fcivnet] requires deep reasoning, involving long-term strategic planning and fine-grained tactical controls. The figure depicts a hypothetical situation that resembles a historical scenario, with the Romans securing Sicily against Carthage while nurturing a friendly diplomatic relationship with a declining Egypt. This decision is made in a highly complex context: players need to consider various aspects of long-term developmental strategies like technology, military, and diplomacy in the given geographical and diplomatic context. They also engage in fine-grained control actions, e.g., border exploration, vessel building, and road construction.
Figure 2: Civilization evolves as the game unfolds, and the potential state and action spaces explode. This figure focuses on 4 of the 8 ages, wherein technological advancements unlock a greater number of buildings and units. Throughout the course of the game, the state space can grow from $10^{15}$ to $10^{650}$, and the action space can expand from $10^4$ to $10^{166}$ (see \ref{app_sec:space_estimation}). This figure only shows some example elements; the full game includes 87 types of technologies, 68 types of buildings, 52 types of units, 6 government types, and 5 diplomatic states, all of which depend on the rule set used and are customizable.
Figure 3: Examples of different types of the designed mini-games.
Figure 4: The generated mini-games are diverse and balanced. (a) Terrain and resource distribution and the corresponding joint food/production/trade metrics. (b) Unit numbers and strength: distribution of inter-player ratios. (c) Technology number and value: distribution of inter-player differences.
Figure 5: Architecture of LLM-based approaches. Left: BaseLang. Right: Mastaba.
...and 20 more figures
