Moonshine: Distilling Game Content Generators into Steerable Generative Models

Yuhe Nie, Michael Middleton, Tim Merino, Nidhushan Kanagaraja, Ashutosh Kumar, Zhan Zhuang, Julian Togelius

TL;DR

Moonshine addresses controllability and data scarcity in procedural content generation by distilling a constructive PCG algorithm into text-conditioned generators. It creates a large synthetic, LLM-labeled dataset from the algorithm and trains two Text-to-game-Map (T2M) models, the Five-Dollar Model and a Discrete Diffusion Model, to map text prompts to discrete game maps. The Text-to-game-Map task is defined for discrete tile grids and evaluated via descriptions and maps with BLEU, ROUGE-L, METEOR, SPICE, and CLIP metrics, revealing that longer LLM-generated descriptions improve semantic fidelity while diffusion-based generation offers greater diversity. Moonshine thus demonstrates a viable distillation pathway from traditional PCG to steerable, text-conditioned generation and contributes an open dataset to advance T2M research, with recommendations for leveraging longer descriptions and choosing between FDM and DDM depending on desired diversity and fidelity.

Abstract

Procedural Content Generation via Machine Learning (PCGML) has enhanced game content creation, yet challenges in controllability and limited training data persist. This study addresses these issues by distilling a constructive PCG algorithm into a controllable PCGML model. We first generate a large amount of content with a constructive algorithm and label it using a Large Language Model (LLM). We use these synthetic labels to condition two PCGML models for content-specific generation, a diffusion model and the five-dollar model. This neural network distillation process ensures that the generation aligns with the original algorithm while introducing controllability through plain text. We define this text-conditioned PCGML as a Text-to-game-Map (T2M) task, offering an alternative to prevalent text-to-image multi-modal tasks. We compare our distilled models with the baseline constructive algorithm. Our analysis of the variety, accuracy, and quality of our generation demonstrates the efficacy of distilling constructive methods into controllable text-conditioned PCGML models.

Paper Structure

This paper contains 36 sections, 5 equations, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: A visualized example of map metadata information. Independent rooms and connecting paths are colored. Room labels and directions are overlaid as text.
  • Figure 2: Five-Dollar Model architecture.
  • Figure 3: Discrete Diffusion Model architecture.
  • Figure 4: CLIP score of FDM (left) and DDM (right) vs. the Brogue ground truth (X-axis).
  • Figure 5: Connectivity analysis of models based on three metrics: the number of disconnected components, fragmentation score, and largest component size. DDM has fewer disconnected components, lower fragmentation, and more stable component sizes compared to FDM.
  • ...and 5 more figures
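
The connectivity metrics named in the Figure 5 caption can be illustrated on a small tile grid. The sketch below is not the authors' implementation: the tile symbols (`#` for wall, `.` for floor), the use of 4-connectivity, the function names, and especially the fragmentation formula (share of walkable tiles outside the largest component) are assumptions made for illustration, since this summary does not define them.

```python
from collections import deque

def connected_components(grid, walkable=frozenset({"."})):
    """Return the sizes of all 4-connected components of walkable tiles."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] not in walkable:
                continue
            # Breadth-first flood fill from an unvisited walkable tile.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] in walkable):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

def connectivity_metrics(grid, walkable=frozenset({"."})):
    """Summarize connectivity of a discrete tile map."""
    sizes = connected_components(grid, walkable)
    total = sum(sizes)
    largest = max(sizes, default=0)
    # Assumed definition (hypothetical): fraction of walkable tiles
    # that lie outside the largest component.
    fragmentation = 1.0 - largest / total if total else 0.0
    return {
        "components": len(sizes),
        "largest_component": largest,
        "fragmentation": fragmentation,
    }

# Hypothetical 6x7 map: a sealed-off room on the left, a larger region on the right.
example_map = [
    "#######",
    "#..#..#",
    "#..#..#",
    "####..#",
    "#.....#",
    "#######",
]

metrics = connectivity_metrics(example_map)
print(metrics)
```

On this example map the left room is walled off, so the walkable tiles split into two components. A fully connected map would report one component and a fragmentation of zero under the assumed formula.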