The Procedural Content Generation Benchmark: An Open-source Testbed for Generative Challenges in Games
Ahmed Khalifa, Roberto Gallotta, Matthew Barthet, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis
TL;DR
The paper presents the Procedural Content Generation Benchmark, an open-source testbed for standardizing the evaluation of generative algorithms across diverse game content tasks. It formalizes three evaluation axes (quality, diversity, and controllability) and implements a modular, OpenAI Gym–like framework with 12 distinct PCG problems, each with its own content representation. Three baseline search-based generators (Random, μ+λ Evolution Strategy, and Genetic Algorithm) are evaluated on all problems to show how difficulty varies between problems and how the choice of fitness target affects the quality, controllability, and diversity of the generated artifacts. The benchmark enables rigorous, reproducible comparisons, supports education and experimentation, and serves as a foundation for future extensions, including more complex problems and integration with modern AI generators.
Abstract
This paper introduces the Procedural Content Generation Benchmark for evaluating generative algorithms on different game content creation tasks. The benchmark comes with 12 game-related problems, each with multiple variants. Problems range from creating levels of different kinds to creating rule sets for simple arcade games. Each problem has its own content representation, control parameters, and evaluation metrics for quality, diversity, and controllability. This benchmark is intended as a first step towards a standardized way of comparing generative algorithms. We use the benchmark to score three baseline algorithms: a random generator, an evolution strategy, and a genetic algorithm. Results show that some problems are easier to solve than others, and that the chosen objective affects the quality, diversity, and controllability of the generated artifacts.
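To make the three evaluation axes and the search-based baselines more concrete, here is a minimal Python sketch of a μ+λ evolution strategy scored on quality, controllability, and diversity. It does not use the benchmark's actual API. The toy problem (a row of 16 binary tiles), the three metric definitions, the product-based fitness, and every function name are assumptions made purely for illustration.

```python
import random
from itertools import combinations

N_TILES = 16  # hypothetical content size: one row of binary tiles (1 = solid)


def quality(content):
    """Toy quality: share of adjacent tile pairs that are not both solid."""
    pairs = list(zip(content, content[1:]))
    return sum(1 for a, b in pairs if not (a and b)) / len(pairs)


def controllability(content, control):
    """Toy controllability: closeness of the solid-tile count to `control`."""
    return 1.0 - abs(sum(content) - control) / len(content)


def diversity(contents):
    """Toy diversity: mean normalized Hamming distance over all pairs."""
    pairs = list(combinations(contents, 2))
    if not pairs:
        return 0.0
    total = sum(sum(x != y for x, y in zip(a, b)) / len(a) for a, b in pairs)
    return total / len(pairs)


def fitness(content, control):
    """Assumed fitness target: product of quality and controllability."""
    return quality(content) * controllability(content, control)


def mutate(content, rng, rate=0.1):
    """Flip each tile independently with probability `rate`."""
    return [1 - t if rng.random() < rate else t for t in content]


def mu_plus_lambda(control, mu=8, lam=16, generations=30, seed=0):
    """Run a (mu + lambda) evolution strategy on the toy problem.

    Returns the final population (best first) and the best fitness
    recorded after each generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_TILES)] for _ in range(mu)]
    history = []
    for _ in range(generations):
        # Each child is a mutated copy of a randomly chosen parent.
        children = [mutate(rng.choice(population), rng) for _ in range(lam)]
        # Parents compete with children; the best mu individuals survive.
        ranked = sorted(population + children,
                        key=lambda c: fitness(c, control), reverse=True)
        population = ranked[:mu]
        history.append(fitness(population[0], control))
    return population, history


population, history = mu_plus_lambda(control=5)
print("best fitness:", history[-1])
print("population diversity:", diversity(population))
```

Because parents survive alongside their children, the best fitness in this sketch can never decrease between generations. Swapping the fitness function (for example, optimizing quality alone) is one simple way to see how the chosen objective shifts the controllability and diversity of the final population.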
