Video Game Level Design as a Multi-Agent Reinforcement Learning Problem

Sam Earle, Zehua Jiang, Eugene Vinitsky, Julian Togelius

TL;DR

This work tackles two limitations of Procedural Content Generation via Reinforcement Learning (PCGRL): global reward computations scale as $O(N^2)$ with map size $N$, and single-agent generators struggle to generalize to out-of-distribution maps. It extends PCGRL to a multi-agent setting trained with Multi-Agent PPO (MAPPO) in GPU-accelerated JAX, employing shared rewards and local egocentric observations, and introducing the board_scans unit and the reward_frequency hyperparameter. Results show that multi-agent collaboration maintains or improves in-distribution performance while significantly improving generalization to larger and differently shaped maps, which the authors attribute to agents learning more local, modular policies. Acting in parallel also reduces the cost of reward computation relative to the number of level edits. The approach demonstrates scalable, co-creative level generation for grid-based domains and suggests pathways for specialization within automated level-design teams, with open-source code to enable broader adoption.

Abstract

Procedural Content Generation via Reinforcement Learning (PCGRL) offers a method for training controllable level designer agents without the need for human datasets, using metrics that serve as proxies for level quality as rewards. Existing PCGRL research focuses on single generator agents, which are bottlenecked by the need to frequently recalculate heuristics of level quality and by the agent's need to navigate around potentially large maps. By framing level generation as a multi-agent problem, we mitigate the efficiency bottleneck of single-agent PCGRL by reducing the number of reward calculations relative to the number of agent actions. We also find that multi-agent level generators are better able to generalize to out-of-distribution map shapes, which we argue is due to the generators learning more local, modular design policies. We conclude that treating content generation as a distributed, multi-agent task is beneficial for generating functional artifacts at scale.
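
The efficiency argument in the abstract can be illustrated with a small counting sketch. It is not taken from the paper's code. It assumes that every agent makes one tile edit per environment step and that the global reward is recomputed once every `reward_frequency` steps; the exact semantics of reward_frequency in the paper may differ, and the function name `reward_computations` is hypothetical.

```python
import math


def reward_computations(total_edits: int, n_agents: int, reward_frequency: int = 1) -> int:
    """Count global reward evaluations needed to apply `total_edits` tile edits.

    Assumptions (not confirmed by the source): each agent makes exactly one
    edit per environment step, and the reward is recomputed once every
    `reward_frequency` environment steps.
    """
    if total_edits < 0 or n_agents < 1 or reward_frequency < 1:
        raise ValueError("total_edits must be >= 0; n_agents and reward_frequency must be >= 1")
    # All agents act in parallel, so one environment step applies n_agents edits.
    env_steps = math.ceil(total_edits / n_agents)
    # One reward evaluation covers `reward_frequency` consecutive steps.
    return math.ceil(env_steps / reward_frequency)


if __name__ == "__main__":
    # Editing every tile of a 16 x 16 map once (256 edits) with 1, 2, and 4 agents.
    for n in (1, 2, 4):
        print(f"{n} agent(s): {reward_computations(256, n)} reward computations")
```

Under these assumptions, the number of reward evaluations for a fixed number of edits falls in proportion to the number of agents, which is the ratio the abstract refers to.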

Paper Structure

This paper contains 18 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: In multi-agent PCGRL, agent actions are taken in parallel, reducing the number of reward computations required relative to level edits, while maintaining a per-agent dense reward scheme and fostering collaboration between agents. Reward $R_t$ is computed according to the weighted sum of heuristic scores $m_i$.
  • Figure 2: Episode rollouts with variable numbers of agents. Having multiple agents tends to allow for better map coverage and more complex and modular design patterns. For the sake of illustration, levels are initialized full of wall tiles. $t$ represents the timestep of a given frame.
  • Figure 3: Episode returns by total number of environment steps during training, for varying numbers of agents. Results averaged over 10 training seeds.
  • Figure 4: The result of three agents collaboratively building a binary maze. The agents, trained on $16 \times 16$ maps, are evaluated on an out-of-distribution map of size $32 \times 32$.
  • Figure 5: Agents generate a level in the dungeon domain.
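
The weighted-sum reward described in the Figure 1 caption, combined with the shared-reward scheme from the TL;DR, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the metric names (`path_length`, `n_regions`), the weights, and the helper names `shared_reward` and `broadcast` are all hypothetical, and the paper may compute $R_t$ from changes in the scores rather than from their raw values.

```python
from typing import Dict, List


def shared_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of heuristic scores m_i, one weight per named heuristic."""
    return sum(weights[name] * scores[name] for name in weights)


def broadcast(reward: float, n_agents: int) -> List[float]:
    """Shared-reward scheme: every agent receives the same scalar reward."""
    return [reward] * n_agents


if __name__ == "__main__":
    # Hypothetical heuristic scores for one level state.
    scores = {"path_length": 12.0, "n_regions": 1.0}
    # Hypothetical weights: reward long paths, penalize extra regions.
    weights = {"path_length": 1.0, "n_regions": -0.5}

    r_t = shared_reward(scores, weights)
    print(broadcast(r_t, n_agents=3))
```

Because the reward is computed once from the whole level and then copied to every agent, its cost does not grow with the number of agents.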