Video Game Level Design as a Multi-Agent Reinforcement Learning Problem
Sam Earle, Zehua Jiang, Eugene Vinitsky, Julian Togelius
TL;DR
This work tackles two bottlenecks in Procedural Content Generation via Reinforcement Learning (PCGRL): global reward computations scale as $O(N^2)$ with map size $N$, and single-agent generators struggle to generalize to out-of-distribution maps. It extends PCGRL to a multi-agent setting trained with Multi-Agent PPO (MAPPO) in GPU-accelerated JAX, using shared rewards and local egocentric observations, and introduces the board_scans unit and the reward_frequency hyperparameter. Results show that multi-agent collaboration maintains or improves in-distribution performance while significantly improving generalization to larger and differently shaped maps, which the authors attribute to the agents learning more local, modular policies; it also reduces reward computation costs. The approach demonstrates scalable, co-creative level generation for grid-based domains and suggests pathways for specialization within automated level-design teams, with open-source code to enable broader adoption.
Abstract
Procedural Content Generation via Reinforcement Learning (PCGRL) offers a method for training controllable level designer agents without the need for human datasets, using metrics that serve as proxies for level quality as rewards. Existing PCGRL research focuses on single generator agents, which are bottlenecked by the need to frequently recalculate heuristics of level quality and by the agent's need to navigate around potentially large maps. By framing level generation as a multi-agent problem, we mitigate the efficiency bottleneck of single-agent PCGRL by reducing the number of reward calculations relative to the number of agent actions. We also find that multi-agent level generators are better able to generalize to out-of-distribution map shapes, which we argue is due to the generators' learning more local, modular design policies. We conclude that treating content generation as a distributed, multi-agent task is beneficial for generating functional artifacts at scale.
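The efficiency argument above, fewer reward calculations per agent action, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy heuristic, the random policy, and the reading of reward_frequency as "environment steps between shared reward computations" are all assumptions made for illustration, and agents here act one after another within a step rather than simultaneously.

```python
import random


def count_empty_tiles(grid):
    """Toy global heuristic: number of empty (0) tiles.

    Hypothetical stand-in for a costly level-quality metric that
    must scan the whole map.
    """
    return sum(row.count(0) for row in grid)


def rollout(width, height, n_agents, n_steps, reward_frequency, seed=0):
    """Run random agents on a grid and count actions vs. reward computations.

    Assumption: the shared reward is recomputed once every
    `reward_frequency` environment steps, and every agent acts once per step.
    """
    rng = random.Random(seed)
    grid = [[1] * width for _ in range(height)]  # start as all wall (1)
    positions = [
        (rng.randrange(height), rng.randrange(width)) for _ in range(n_agents)
    ]
    n_actions = 0
    n_reward_calls = 0
    prev_metric = 0  # an all-wall grid has no empty tiles
    total_shared_reward = 0

    for step in range(1, n_steps + 1):
        for i, (r, c) in enumerate(positions):
            # Each agent edits the tile it stands on, then moves one cell.
            grid[r][c] = rng.choice([0, 1])
            n_actions += 1
            dr, dc = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            positions[i] = ((r + dr) % height, (c + dc) % width)

        if step % reward_frequency == 0:
            # One global scan yields a single reward shared by all agents.
            metric = count_empty_tiles(grid)
            n_reward_calls += 1
            total_shared_reward += metric - prev_metric
            prev_metric = metric

    return n_actions, n_reward_calls, total_shared_reward


if __name__ == "__main__":
    for agents, freq in [(1, 1), (3, 1), (3, 4)]:
        actions, calls, _ = rollout(8, 8, agents, 12, freq)
        print(f"agents={agents} reward_frequency={freq}: "
              f"{actions} actions, {calls} reward computations")
```

Under these assumptions, a single agent with reward_frequency 1 pays one reward computation per action, while three agents with reward_frequency 4 take 36 actions over 12 steps for only 3 reward computations.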
