Simulation-Driven Balancing of Competitive Game Levels with Reinforcement Learning
Florian Rupp, Manuel Eberhardinger, Kai Eckert
TL;DR
The paper tackles automatic balancing of tile-based, competitive two-player game levels with a domain-independent procedural content generation (PCG) approach based on reinforcement learning. It introduces swap-based representations within the PCGRL framework and separates level generation from balancing, so that pre-made levels are fine-tuned toward a target balance measured by a fairness-inspired reward derived from statistical parity. In extensive experiments in the Neural MMO environment, the approach achieves higher percentages of balanced levels (up to 68%), outperforms hill-climbing baselines, and efficiently identifies which tile types most affect balance. The work also explores balancing toward different targets, asymmetries, and the applicability of fairness metrics, offering a practical, scalable method for balance-aware level design with potential extensions to other domains. Overall, the method enables rapid, data-driven balancing while providing insight into the design decisions that affect competitive fairness.
Abstract
The balancing process for game levels in competitive two-player contexts involves a lot of manual work and testing, particularly for non-symmetrical game levels. In this work, we frame game balancing as a procedural content generation task and propose an architecture for automatically balancing tile-based levels within the PCGRL framework (procedural content generation via reinforcement learning). Our architecture is divided into three parts: (1) a level generator, (2) a balancing agent, and (3) a reward-modeling simulation. Through repeated simulations, the balancing agent receives rewards for adjusting the level towards a given balancing objective, such as equal win rates for all players. To this end, we propose new swap-based representations to improve the robustness of playability, thereby enabling agents to balance game levels more effectively and quickly than traditional PCGRL. By analyzing the agent's swapping behavior, we can infer which tile types have the most impact on the balance. We validate our approach in the Neural MMO (NMMO) environment in a competitive two-player scenario. In this extended conference paper, we present improved results, explore the applicability of the method to various forms of balancing beyond equal balancing, compare the performance to another search-based approach, and discuss the application of existing fairness metrics to game balancing.
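The swap-and-simulate loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names swap_tiles, estimate_win_rate, and balance_reward are hypothetical, the level is assumed to be a grid of tile-type strings, and the reward (reduction in distance between player 1's simulated win rate and a target of 0.5) is an assumed stand-in for the paper's actual reward. The simulate callable stands for any game simulation that returns 1 when player 1 wins and 0 otherwise.

```python
from typing import Callable, List, Tuple

Grid = List[List[str]]   # assumed level encoding: 2D grid of tile-type names
Pos = Tuple[int, int]    # (row, column)


def swap_tiles(level: Grid, a: Pos, b: Pos) -> Grid:
    """Return a copy of the level with the tiles at positions a and b exchanged.

    A swap only moves existing tiles, so the count of each tile type is preserved.
    """
    new_level = [row[:] for row in level]
    (r1, c1), (r2, c2) = a, b
    new_level[r1][c1], new_level[r2][c2] = new_level[r2][c2], new_level[r1][c1]
    return new_level


def estimate_win_rate(level: Grid, simulate: Callable[[Grid], int], n_games: int = 20) -> float:
    """Estimate player 1's win rate by running the (hypothetical) simulation n_games times."""
    wins = sum(simulate(level) for _ in range(n_games))
    return wins / n_games


def balance_reward(prev_rate: float, new_rate: float, target: float = 0.5) -> float:
    """Assumed reward: positive when a swap moves the win rate closer to the target."""
    return abs(prev_rate - target) - abs(new_rate - target)
```

Under these assumptions, a balancing agent would repeatedly pick two positions, apply swap_tiles, re-estimate the win rate, and receive balance_reward as feedback. Recording which tile types are swapped most often gives the kind of signal the abstract mentions for inferring which tiles affect balance.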
