PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators

Sam Earle; Zehua Jiang; Julian Togelius

TL;DR

This work scales PCGRL by reimplementing its environments in JAX to exploit GPU parallelism, achieving training speedups of over $15\times$ and making training runs of up to $10^9$ timesteps feasible. It introduces randomized map shapes and fixed "pinpoint" tiles to improve controllability and mitigate overfitting, and systematically studies how observation size affects generalization. Across the binary, maze, and dungeon domains, the results show that smaller, local observations improve out-of-distribution generalization while still performing well in-distribution, and that randomized map shapes further promote robust strategies. The resulting framework serves as a scalable benchmark for RL-based level generators and gives human designers more control over the generated levels.
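To make the idea of a "local observation" concrete, the following is a minimal JAX sketch of cropping a square window of the map around the agent's position. It is an illustration under stated assumptions, not the authors' implementation: the function name `local_observation`, the `BORDER` tile id, and the choice to pad the map so the window never leaves the array are all hypothetical.

```python
from functools import partial

import jax
import jax.numpy as jnp

BORDER = 0  # hypothetical tile id used to fill cells beyond the map edge


@partial(jax.jit, static_argnames=("obs_size",))
def local_observation(level, pos, obs_size):
    """Crop an obs_size x obs_size window of `level` centered on `pos`.

    `obs_size` is assumed to be odd and must be static under jit,
    because it determines the shape of the output array.
    """
    pad = obs_size // 2
    # Pad every side so that windows near the edge stay inside the array.
    padded = jnp.pad(level, pad, constant_values=BORDER)
    # In padded coordinates, the window centered on `pos` starts at `pos`.
    return jax.lax.dynamic_slice(
        padded, (pos[0], pos[1]), (obs_size, obs_size)
    )


# A 4x4 toy map with distinct tile values 1..16.
level = jnp.arange(1, 17).reshape(4, 4)
corner_obs = local_observation(level, jnp.array([0, 0]), 3)
far_corner_obs = local_observation(level, jnp.array([3, 3]), 3)
```

Because the window has a fixed shape regardless of the map size, the same policy network can in principle be applied to larger maps than those seen in training, which is the setting the generalization experiments examine.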

Abstract

Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level's quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in JAX so that all aspects of learning and simulation happen in parallel on the GPU. This speeds up environment simulation, removes the CPU-GPU data-transfer bottleneck during RL training, and ultimately yields significantly faster training. We replicate several key results from prior work in this new framework, letting models train for much longer than previously studied, and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen "pinpoints" of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that models with partial observations learn more robust design strategies.
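As a rough illustration of what running many environments in parallel on the GPU can look like, the sketch below batches a single tile-editing step with `jax.vmap` and compiles it with `jax.jit`. It also shows one simple way to honor frozen pinpoint tiles, by consulting a boolean mask before writing. This is a sketch under assumptions, not the paper's code: the `step` function, the `frozen` mask, and the tile ids are hypothetical.

```python
import jax
import jax.numpy as jnp


def step(level, frozen, pos, tile):
    """Write `tile` at `pos` unless that cell is frozen; return the new level."""
    y, x = pos[0], pos[1]
    # Keep the existing tile if the cell is a frozen pinpoint.
    new_tile = jnp.where(frozen[y, x], level[y, x], tile)
    return level.at[y, x].set(new_tile)


# vmap maps the step over a leading batch axis; jit compiles it once,
# so all environments advance in a single call on the accelerator.
batched_step = jax.jit(jax.vmap(step))

n_envs, size = 8, 16
PLAYER = 3  # hypothetical tile id for a pinned player tile

levels = jnp.zeros((n_envs, size, size), dtype=jnp.int32)
levels = levels.at[:, 0, 0].set(PLAYER)
frozen = jnp.zeros((n_envs, size, size), dtype=bool).at[:, 0, 0].set(True)

# The first four environments try to edit the frozen cell (0, 0);
# the remaining four edit the free cell (0, 1).
pos = jnp.zeros((n_envs, 2), dtype=jnp.int32).at[4:, 1].set(1)
tiles = jnp.ones((n_envs,), dtype=jnp.int32)

new_levels = batched_step(levels, frozen, pos, tiles)
```

Since the levels, masks, and actions all stay in device arrays, no per-step transfer between CPU and GPU is needed in a loop built this way.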

Paper Structure (17 sections, 7 figures, 6 tables)

Introduction
Background
Methods
Training
Task
Binary domain
Maze domain
Dungeon domain
Pinpoint tiles
Randomized map shapes
JAX implementation
Results
Speed Comparison
Observation size
Randomized map shapes during training
...and 2 more sections

Figures (7)

Figure 1: Evaluation in the maze domain with pinpoints (randomly fixed player and door tiles). While models with large global observations are better on small $16\times 16$ in-distribution maps, models with smaller local observations learn scalable patterns that generalize better to larger $32\times 32$ maps.
Figure 2: Model architectures.
Figure 3: Reward curve of the Conv model on the maze domain with pinpoints (randomly frozen player and door). On a more challenging task involving randomized per-episode map shapes, the performance gap between models with global and partial observations shrinks.
Figure 4: On the dungeon domain with controllable path length the Conv model with $3\times 3$ observation generalizes a design pattern to larger map sizes.
Figure 5: Effect of different observation sizes on reward curves during training of the Conv model on the dungeon domain with control targets.
...and 2 more figures
