Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
Yi-Chun Chen, Arnav Jhala
TL;DR
The paper addresses the problem of turning narrative text into playable 2D game scenes by grounding LLM-generated frames to tile assets through semantic embeddings and affordance filtering. It segments each story into three temporal frames (beginning, middle, end), grounds each frame with object predicates mapped to the GameTileNet ontology, and then synthesizes layered terrain with Cellular Automata, using rule-based placement to enforce spatial relations. Knowledge graphs support symbolic reasoning and cross-frame linking, and the rendering stage outputs both visuals and semantic maps for downstream use. Evaluation across ten narratives shows stable semantic-tile alignment (cosine similarity ~0.41) and reasonable spatial predicate satisfaction (~72%), while affordance misclassification and tile-coverage gaps remain the main areas for improvement. The framework offers a modular, extensible foundation for narrative-driven PCG, with potential extensions to data-driven frame segmentation, richer affordance reasoning, and multi-agent storytelling in games.
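To make the grounding step concrete, the following is a minimal sketch of affordance-filtered retrieval by cosine similarity. It is not the authors' implementation: the function names (`cosine`, `best_tile`), the toy three-dimensional embeddings, the tile names, and the affordance labels are all assumptions chosen for illustration, and a real system would use embeddings from a language model and the actual GameTileNet annotations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def best_tile(query_vec, required_affordance, tiles):
    """Return (tile_name, score) for the most similar tile that offers
    the required affordance, or (None, 0.0) if no tile qualifies."""
    # Step 1: affordance filter (hypothetical label set).
    candidates = [t for t in tiles if required_affordance in t["affordances"]]
    if not candidates:
        return None, 0.0
    # Step 2: rank the surviving tiles by semantic similarity.
    scored = [(cosine(query_vec, t["embedding"]), t["name"]) for t in candidates]
    score, name = max(scored)
    return name, score


# Hypothetical catalogue with made-up embeddings (illustration only).
TILES = [
    {"name": "oak_tree", "embedding": [0.9, 0.1, 0.0], "affordances": {"obstacle"}},
    {"name": "grass", "embedding": [0.8, 0.2, 0.1], "affordances": {"terrain"}},
    {"name": "boulder", "embedding": [0.1, 0.9, 0.2], "affordances": {"obstacle"}},
]

if __name__ == "__main__":
    print(best_tile([1.0, 0.0, 0.0], "obstacle", TILES))
```

Filtering before ranking means a tile with a high similarity score but the wrong affordance is never selected, which is one plausible way to combine the two signals described above.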
Abstract
Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes, reflecting the temporal structure of stories. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of "Object-Relation-Object" triples, and retrieves visual assets using affordance-aware semantic embeddings from the GameTileNet dataset. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluated our system on ten diverse stories, analyzing tile-object matching, affordance-layer alignment, and spatial constraint satisfaction across frames. This prototype offers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multi-agent coordination in story-centered PCG.
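As an illustration of the terrain and placement stages, the sketch below generates a binary terrain grid with a common Cellular Automata smoothing rule and then places the two objects of one "Object-Relation-Object" triple on open cells. It is a simplified example under stated assumptions, not the paper's method: the 4-5 neighbour rule, the fill probability, the relation vocabulary in `OFFSETS`, and the function names are all hypothetical, and the layered terrain described above is reduced here to a single layer.

```python
import random

# Hypothetical relation vocabulary: offset of the subject relative to the object,
# with x growing to the right and y growing downward.
OFFSETS = {"left_of": (-1, 0), "right_of": (1, 0), "above": (0, -1), "below": (0, 1)}


def cellular_automata(width, height, fill_prob=0.45, steps=4, seed=0):
    """Return a height x width grid of 0 (open) and 1 (blocked) cells."""
    rng = random.Random(seed)
    grid = [[1 if rng.random() < fill_prob else 0 for _ in range(width)]
            for _ in range(height)]
    for _ in range(steps):
        new = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                blocked = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dx == 0 and dy == 0:
                            continue
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            blocked += grid[ny][nx]
                        else:
                            blocked += 1  # out-of-bounds counts as blocked
                # Smoothing rule: majority of the 8 neighbours wins, ties keep state.
                if blocked > 4:
                    new[y][x] = 1
                elif blocked < 4:
                    new[y][x] = 0
                else:
                    new[y][x] = grid[y][x]
        grid = new
    return grid


def place_triple(grid, relation, seed=0):
    """Place subject and object on adjacent open cells so that
    `subject relation object` holds. Returns (subject_xy, object_xy) or None."""
    dx, dy = OFFSETS[relation]
    height, width = len(grid), len(grid[0])
    open_cells = [(x, y) for y in range(height) for x in range(width)
                  if grid[y][x] == 0]
    random.Random(seed).shuffle(open_cells)
    for ox, oy in open_cells:
        sx, sy = ox + dx, oy + dy
        if 0 <= sx < width and 0 <= sy < height and grid[sy][sx] == 0:
            return (sx, sy), (ox, oy)
    return None  # no pair of open cells satisfies the relation


if __name__ == "__main__":
    terrain = cellular_automata(16, 10)
    for row in terrain:
        print("".join("#" if cell else "." for cell in row))
    print(place_triple(terrain, "left_of"))
```

Returning `None` when no valid pair of cells exists gives a natural hook for measuring spatial constraint satisfaction: a predicate counts as unsatisfied whenever placement fails.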
