Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI -- Lessons from Civilization V

John Chen, Sihan Cheng, Can Gurkan, Ryan Lay, Moez Salahuddin

TL;DR

Vox Deorum presents a hybrid LLM+X architecture that assigns macro-strategic planning to a language model while delegating tactical execution to specialized subsystems, enabling robust end-to-end play in a complex 4X environment. The approach is validated on Civilization V with Vox Populi across 2,327 games, showing LLMs achieving competitive win rates and distinctive play styles relative to a strong algorithmic AI, with favorable latency and cost profiles. The work advances practical game AI by demonstrating a scalable, open-source framework and outlining design paths for long-horizon strategic reasoning, memory, and multi-agent collaboration in future research and commercial deployment.

Abstract

Large Language Models' capacity to reason in natural language makes them uniquely promising for 4X and grand strategy games, enabling more natural human-AI gameplay interactions such as collaboration and negotiation. However, these games present unique challenges due to their complexity and long-horizon nature, while latency and cost factors may hinder LLMs' real-world deployment. Working on a classic 4X strategy game, Sid Meier's Civilization V with the Vox Populi mod, we introduce Vox Deorum, a hybrid LLM+X architecture. Our layered technical design empowers LLMs to handle macro-strategic reasoning, delegating tactical execution to subsystems (e.g., algorithmic AI or reinforcement learning AI in the future). We validate our approach through 2,327 complete games, comparing two open-source LLMs with a simple prompt against Vox Populi's enhanced AI. Results show that LLMs achieve competitive end-to-end gameplay while exhibiting play styles that diverge substantially from algorithmic AI and from each other. Our work establishes a viable architecture for integrating LLMs in commercial 4X games, opening new opportunities for game design and agentic AI research.
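The division of labor described in the abstract can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the Vox Deorum codebase: `MacroDecision`, `stub_planner`, `tactical_executor`, and `play_turn` are invented names, the game state is reduced to two integers, and the planner is a plain function where a real system would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class MacroDecision:
    """A macro-strategic choice (hypothetical structure, for illustration only)."""
    grand_strategy: str
    rationale: str


# Assumption: the macro planner maps a summarized game state to a decision.
# In a real system this callable would wrap an LLM request.
MacroPlanner = Callable[[Dict[str, int]], MacroDecision]


def stub_planner(state: Dict[str, int]) -> MacroDecision:
    """Placeholder for the LLM: picks a strategy from two summary scores."""
    if state["military"] > state["science"]:
        return MacroDecision("Domination", "military score is ahead")
    return MacroDecision("Science", "science score is ahead")


def tactical_executor(decision: MacroDecision, units: List[str]) -> List[str]:
    """Placeholder for the 'X' subsystem: turns the macro decision into unit orders."""
    order = "attack" if decision.grand_strategy == "Domination" else "defend"
    return [f"{unit}:{order}" for unit in units]


def play_turn(
    state: Dict[str, int],
    units: List[str],
    planner: MacroPlanner = stub_planner,
) -> Tuple[MacroDecision, List[str]]:
    """One turn: the planner decides the strategy, the executor handles tactics."""
    decision = planner(state)
    return decision, tactical_executor(decision, units)


decision, orders = play_turn({"military": 5, "science": 3}, ["warrior", "archer"])
```

The point of the split is that the planner never issues per-unit orders, so the executor could be swapped for an algorithmic or reinforcement learning component without changing the planner's interface.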

Paper Structure

This paper contains 25 sections, 6 figures.

Figures (6)

  • Figure 1: An overview of the Vox Deorum system, as implemented in this study.
  • Figure 2: Input token usage per turn across game progression (RQ1).
  • Figure 3: Output token usage per turn across game progression (RQ1).
  • Figure 4: Victory type distributions across conditions (RQ2).
  • Figure 5: Grand (victory) strategy adoption profiles across conditions (RQ3). For example, a Domination value of 0.8 for OSS-120B means that it had adopted "Domination" on 80% of the turns it survived.
  • ...and 1 more figure
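The adoption share in the Figure 5 caption can be read as a simple proportion. The sketch below assumes exactly one adopted grand strategy is recorded per survived turn; the function name `adoption_profile` and the input format are hypothetical, and the paper's actual computation may differ.

```python
from collections import Counter
from typing import Dict, List


def adoption_profile(strategy_by_turn: List[str]) -> Dict[str, float]:
    """Share of survived turns on which each grand strategy was the adopted one.

    Assumes one strategy label per survived turn (an illustrative simplification).
    """
    total = len(strategy_by_turn)
    if total == 0:
        return {}
    counts = Counter(strategy_by_turn)
    return {strategy: count / total for strategy, count in counts.items()}


# Ten survived turns, eight of them with "Domination" adopted.
profile = adoption_profile(["Domination"] * 8 + ["Science"] * 2)
```

Under this reading, `profile["Domination"]` corresponds to the 0.8 value used as the example in the caption.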