RuleSmith: Multi-Agent LLMs for Automated Game Balancing

Ziyao Zeng; Chen Liu; Tianyu Liu; Hao Wang; Xiatao Sun; Fengyu Yang; Xiaofeng Liu; Zhiwen Fan

TL;DR

RuleSmith addresses the problem of balancing asymmetric, rule-driven games: two LLM agents perform self-play from executable rulebooks, and Bayesian optimization searches a multi-parameter rule space. The approach treats the game rules themselves as the object of optimization and uses acquisition-based adaptive sampling to allocate the evaluation budget efficiently. In CivMini, RuleSmith achieves near-balanced outcomes across model configurations and yields interpretable parameter adjustments that generalize across settings. This demonstrates that LLM simulation (LLMSim) can serve as a scalable surrogate for automated design and balancing in complex multi-agent environments, with potential applications beyond games.

Abstract

Game balancing is a longstanding challenge that requires repeated playtesting, expert intuition, and extensive manual tuning. We introduce RuleSmith, the first framework that achieves automated game balancing by leveraging the reasoning capabilities of multi-agent LLMs. It couples a game engine, multi-agent LLM self-play, and Bayesian optimization over a multi-dimensional rule space. As a proof of concept, we instantiate RuleSmith on CivMini, a simplified civilization-style game with heterogeneous factions, economy systems, production rules, and combat mechanics, all governed by tunable parameters. LLM agents interpret textual rulebooks and game states to generate actions, enabling fast evaluation of balance metrics such as win-rate disparities. To search the parameter landscape efficiently, we integrate Bayesian optimization with acquisition-based adaptive sampling and discrete projection: promising candidates receive more evaluation games for accurate assessment, while exploratory candidates receive fewer games for efficient exploration. Experiments show that RuleSmith converges to highly balanced configurations and provides interpretable rule adjustments that can be directly applied to downstream game systems. Our results illustrate that LLM simulation can serve as a powerful surrogate for automating design and balancing in complex multi-agent environments.
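The adaptive-sampling and discrete-projection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not RuleSmith's implementation: it uses the standard closed-form Expected Improvement (EI) for minimization under a Gaussian surrogate posterior, a hypothetical linear rule (`games_for_candidate`, with made-up bounds `n_min` and `n_max`) that maps EI to a game budget, and a hypothetical nearest-value snap (`project_to_grid`) standing in for the discrete projection step.

```python
import math
from typing import List, Sequence


def expected_improvement(mu: float, sigma: float, best_loss: float, xi: float = 0.0) -> float:
    """Closed-form EI for minimizing a loss under a Gaussian posterior N(mu, sigma^2).

    best_loss is the lowest balance loss observed so far; xi is an optional
    exploration margin.
    """
    improvement = best_loss - mu - xi
    if sigma <= 0.0:
        # No posterior uncertainty: EI reduces to the deterministic improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + sigma * pdf


def games_for_candidate(ei: float, ei_scale: float, n_min: int = 4, n_max: int = 20) -> int:
    """Hypothetical budget rule: scale the number of self-play games linearly with EI.

    Candidates with high EI (promising) get close to n_max games; candidates with
    low EI (exploratory) get close to n_min games.
    """
    if ei_scale <= 0.0:
        return n_min
    frac = min(max(ei / ei_scale, 0.0), 1.0)
    return n_min + round(frac * (n_max - n_min))


def project_to_grid(theta: Sequence[float], grids: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical discrete projection: snap each coordinate to its nearest allowed value."""
    return [min(grid, key=lambda v: abs(v - x)) for x, grid in zip(theta, grids)]
```

For example, given a surrogate mean `mu` and standard deviation `sigma` at a candidate, `games_for_candidate(expected_improvement(mu, sigma, best_loss), ei_scale)` returns a budget between `n_min` and `n_max`. How `ei_scale` is chosen (for instance, the largest EI among the current candidates) is likewise an assumption of this sketch.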

Paper Structure

This paper contains 28 sections, 5 equations, 7 figures, and 5 tables.

Introduction
Contributions.
Related Work
Game design automation.
Multi-agent self-play.
LLM agents.
Method
Parametric Asymmetric Game: CivMini
Map, factions, and units.
Actions and turn structure.
Victory and scoring.
Parameterization.
LLM Self-Play as an Evaluator
Bayesian Optimization over Rule Space
Acquisition-based adaptive sampling.
...and 13 more sections

Figures (7)

Figure 1: Overview of RuleSmith. Multi-agent LLMs perform zero-shot self-play using only the rulebook under parameterized rule sets to automatically optimize asymmetric strategy games and other rule-driven systems. This figure was generated by Nano Banana Pro.
Figure 2: Overview of the RuleSmith method. We represent an asymmetric, turn-based strategy game (CivMini) as a parameterized rule space $\theta \in \Theta$, including economy, combat, production, scoring, and game-length parameters. Given a candidate rule configuration $\theta_t$, two role-specific LLM agents (Empire and Nomads) play $N_t$ self-play games in the CivMini environment, producing a balance loss $\mathcal{L}(\theta) = |w_E - 0.5| + |w_N - 0.5| + 0.5 \cdot w_D$, where $w_E$, $w_N$, and $w_D$ are the Empire win rate, Nomads win rate, and draw rate. A Bayesian optimizer maintains a surrogate model $g(\theta)$ over a continuous relaxation of the rule space and selects new candidates $\tilde{\theta}_{t+1}$ by maximizing an acquisition function. The number of games $N_t$ is determined adaptively from the Expected Improvement: promising candidates receive more games for accurate evaluation. Each continuous proposal is mapped to a valid, discrete ruleset via a deterministic discretization operator $D(\cdot)$ before evaluation. Cartoons in this figure were generated using ChatGPT-5.2.
Figure 3: Visualization of a CivMini game. Here we visualize a rollout of an optimized, balanced game in which the Nomads conquered the Empire's city and won after 12 turns. The Nomads tried to attack the Empire's city from the bottom-right, and the Empire sent its soldier to defend from the top-left. Two Nomad cavalry units were stopped and one was killed, but one managed to escape and reach the Empire's city. Two turns later, the Empire's city was destroyed and the Nomads won. Both the Nomads and Empire agents use InternVL3.5-8B.
Figure 4: A rapid Nomad conquest victory achieved in just 6 turns. Nomad Cavalry Unit 0 struck the Empire's city on consecutive turns from turn 3 to turn 6, dealing 4 HP of damage per turn. This aggression destroyed the Empire's city and produced a rapid win. (Final score: Empire 15.8 | Nomads 27.6)
Figure 5: A dominant Nomad performance resulting in a high-score victory after 16 turns. The Nomads expanded by producing six cavalry units and overwhelmed the Empire's defense. The Empire attempted to hold a defensive line by producing three farmers and two soldiers, but these units were eliminated one by one, leaving a board dominated by Nomad cavalry. (Final score: Empire 14.2 | Nomads 67.6)
...and 2 more figures
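For concreteness, the balance loss defined in the Figure 2 caption can be computed from a batch of self-play outcomes as below. This is a minimal sketch: the outcome labels `"E"` (Empire win), `"N"` (Nomads win), and `"D"` (draw) and the function name `balance_loss` are hypothetical encodings chosen for illustration, not the paper's code.

```python
from collections import Counter
from typing import Iterable


def balance_loss(outcomes: Iterable[str]) -> float:
    """Compute L(theta) = |w_E - 0.5| + |w_N - 0.5| + 0.5 * w_D for one batch of games.

    Each outcome is "E" (Empire win), "N" (Nomads win), or "D" (draw);
    these labels are an assumption of this sketch.
    """
    counts = Counter(outcomes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one game to estimate win rates")
    unknown = set(counts) - {"E", "N", "D"}
    if unknown:
        raise ValueError(f"unexpected outcome labels: {unknown}")
    # Empirical win and draw rates over the N_t games played for this candidate.
    w_e, w_n, w_d = counts["E"] / n, counts["N"] / n, counts["D"] / n
    return abs(w_e - 0.5) + abs(w_n - 0.5) + 0.5 * w_d
```

For instance, one Empire win, two Nomads wins, and one draw give $w_E = 0.25$, $w_N = 0.5$, $w_D = 0.25$, so $\mathcal{L} = 0.25 + 0 + 0.125 = 0.375$. Under this weighting, a batch consisting entirely of draws scores 1.5, higher than a completely one-sided batch (1.0), so draws are penalized more heavily than lopsided wins.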