Moonshine: Distilling Game Content Generators into Steerable Generative Models
Yuhe Nie, Michael Middleton, Tim Merino, Nidhushan Kanagaraja, Ashutosh Kumar, Zhan Zhuang, Julian Togelius
TL;DR
Moonshine addresses two persistent problems in procedural content generation, limited controllability and scarce training data, by distilling a constructive PCG algorithm into text-conditioned generators. It uses the algorithm to produce a large synthetic dataset, labels that dataset with an LLM, and trains two Text-to-game-Map (T2M) models, the Five-Dollar Model (FDM) and a Discrete Diffusion Model (DDM), to map text prompts to discrete game maps. The T2M task is defined over discrete tile grids and is evaluated on both descriptions and maps using BLEU, ROUGE-L, METEOR, SPICE, and CLIP metrics. The results indicate that longer LLM-generated descriptions improve semantic fidelity, while diffusion-based generation offers greater diversity. Moonshine thus demonstrates a viable distillation pathway from traditional PCG to steerable, text-conditioned generation and contributes an open dataset to advance T2M research. It recommends using longer descriptions and choosing between FDM and DDM according to the desired balance of diversity and fidelity.
Abstract
Procedural Content Generation via Machine Learning (PCGML) has enhanced game content creation, yet challenges in controllability and limited training data persist. This study addresses these issues by distilling a constructive PCG algorithm into a controllable PCGML model. We first generate a large amount of content with a constructive algorithm and label it using a Large Language Model (LLM). We then use these synthetic labels to condition two PCGML models for content-specific generation: a diffusion model and the five-dollar model. This neural network distillation process ensures that the generated content aligns with the original algorithm while introducing controllability through plain text. We define this text-conditioned PCGML as a Text-to-game-Map (T2M) task, offering an alternative to prevalent text-to-image multi-modal tasks. We compare our distilled models with the baseline constructive algorithm. Our analysis of the variety, accuracy, and quality of the generated content demonstrates the efficacy of distilling constructive methods into controllable text-conditioned PCGML models.
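
The data side of the pipeline described above (generate maps with a constructive algorithm, then attach a text label to each one) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the room-carving generator, the tile symbols, the 16x12 grid size, and the names `constructive_map`, `placeholder_label`, and `build_dataset` are hypothetical and are not the authors' algorithm or code. In particular, the rule-based `placeholder_label` merely stands in for the LLM labeling step so that the example runs without any external service.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[str]]
WALL, FLOOR = "#", "."  # hypothetical tile symbols for a discrete tile grid


def constructive_map(width: int, height: int, n_rooms: int, rng: random.Random) -> Grid:
    """Toy constructive generator: carve rectangular rooms out of solid wall.

    The outer border is never carved, so every map stays enclosed.
    """
    grid = [[WALL] * width for _ in range(height)]
    for _ in range(n_rooms):
        w = rng.randint(2, max(2, width // 3))
        h = rng.randint(2, max(2, height // 3))
        x = rng.randint(1, width - w - 1)
        y = rng.randint(1, height - h - 1)
        for row in range(y, y + h):
            for col in range(x, x + w):
                grid[row][col] = FLOOR
    return grid


def placeholder_label(grid: Grid) -> str:
    """Rule-based caption standing in for the LLM labeler (an assumption)."""
    total = sum(len(row) for row in grid)
    floor = sum(row.count(FLOOR) for row in grid)
    density = "open" if floor / total > 0.3 else "cramped"
    return f"a {density} map with {floor} floor tiles"


def build_dataset(
    n: int, labeler: Callable[[Grid], str], seed: int = 0
) -> List[Tuple[str, Grid]]:
    """Produce n (text, map) pairs that a text-conditioned model could train on."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        grid = constructive_map(16, 12, rng.randint(2, 5), rng)
        pairs.append((labeler(grid), grid))
    return pairs


if __name__ == "__main__":
    text, grid = build_dataset(1, placeholder_label)[0]
    print(text)
    print("\n".join("".join(row) for row in grid))
```

Passing the labeler in as a function keeps the generator independent of how labels are produced, so the placeholder could be swapped for a call to a language model without touching the rest of the sketch. Training the conditioned models on the resulting pairs is outside the scope of this example.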
