GameTileNet: A Semantic Dataset for Low-Resolution Game Art in Procedural Content Generation

Yi-Chun Chen, Arnav Jhala

TL;DR

GameTileNet addresses the lack of semantic labeling for low-resolution game tiles to support narrative-driven PCG. It introduces a dataset of 2,142 labeled objects across 67 tilesets with a hierarchical labeling schema and an end-to-end annotation pipeline that includes automatic adjacency-based segmentation and CLIP-based affordance prediction. The work demonstrates how upscaling and vision-language models improve semantic understanding of pixel art, and it shows a narrative-to-scene generation pipeline built from cellular automata terrain, semantic matching, knowledge graphs, and rule-based placement. The dataset provides a baseline for object detection in non-photorealistic, low-resolution art and enables scalable, narrative-grounded PCG pipelines.

Abstract

GameTileNet is a dataset designed to provide semantic labels for low-resolution digital game art, advancing procedural content generation (PCG) and related AI research as a vision-language alignment task. Large Language Models (LLMs) and image-generative AI models have enabled indie developers to create visual assets, such as sprites, for game interactions. However, generating visuals that align with game narratives remains challenging because AI outputs are inconsistent and require manual adjustment by human artists. The diversity of visual representations in automatically generated game content is also limited because styles are unevenly represented in training data. GameTileNet addresses this by collecting artist-created game tiles from OpenGameArt.org under Creative Commons licenses and providing semantic annotations to support narrative-driven content generation. The dataset introduces a pipeline for object detection in low-resolution tile-based game art (e.g., 32x32 pixels) and annotates semantics, connectivity, and object classifications. GameTileNet is a valuable resource for improving PCG methods, supporting narrative-rich game content, and establishing a baseline for object detection in low-resolution, non-photorealistic images. TL;DR: GameTileNet is a semantic dataset of low-resolution game tiles designed to support narrative-driven procedural content generation through vision-language alignment.
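
The summary above mentions automatic adjacency-based segmentation of tiles but does not spell out the procedure. As a rough illustration only, the sketch below groups opaque pixels of a small tile into objects using 4-connected flood fill. The function name `segment_by_adjacency`, the binary opacity mask, and the choice of 4-connectivity are all assumptions made for this example and are not taken from the paper.

```python
from collections import deque


def segment_by_adjacency(mask):
    """Label 4-connected groups of truthy cells in a 2D grid.

    Hypothetical helper: returns (labels, count), where labels[y][x] is 0
    for background and 1..count for each connected group.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue  # background, or already assigned to a group
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy 4x4 opacity mask (1 = opaque pixel), invented for illustration.
toy_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
labels, count = segment_by_adjacency(toy_mask)
```

In a real tileset the mask would typically come from the alpha channel or a background-color test, and 8-connectivity may suit diagonal pixel-art outlines better; both choices are left open here.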

Paper Structure

This paper contains 46 sections, 1 equation, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Object processing pipeline for pixel art game tiles. Each step supports accurate detection and labeling of low-resolution game elements.
  • Figure 2: Distribution and semantic structure of object categories in GameTileNet. (a) Group label frequencies, limited to labels with more than 10 occurrences. (b) Supercategories over the full set; colored regions show dataset-wide coverage.
  • Figure 3: Affordance label distributions across annotated object tiles. Left: Unique label combinations show the diversity of functional roles. Right: Individual label counts reveal frequent co-occurrence among affordances, except for Characters, which rarely overlaps with other labels.
  • Figure 4: Training and validation accuracy across epochs for the completeness classification task (y-axis: accuracy, x-axis: epochs).
  • Figure 5: Percentage of caption–label matches across different upscaling methods. Each bar shows the proportion of matches between BLIP-generated captions and author-annotated labels: Group Labels, Supercategories, and Affordance Labels. We measure three types of matches (Direct Match, Synonym Match, and Semantic Similarity) using lexical and embedding-based comparisons. SwinIR and Real-ESRGAN upscaled images show notably better alignment with captions, especially in semantic terms.
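
The three match types named in the Figure 5 caption can be pictured with a small sketch. This is not the authors' evaluation code: the synonym table, the 0.7 similarity threshold, the priority order of the checks, and the function names are all hypothetical, and the embedding vectors are supplied by the caller rather than computed by a real model.

```python
import math
import re

# Hypothetical synonym table, invented for illustration only.
SYNONYMS = {
    "tree": {"bush", "plant"},
    "house": {"building", "home"},
}


def tokenize(text):
    """Lowercase a caption and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_type(caption, label, caption_vec=None, label_vec=None, threshold=0.7):
    """Classify a caption/label pair, checking direct, synonym, then semantic."""
    words = tokenize(caption)
    label = label.lower()
    if label in words:
        return "direct"  # the label itself appears as a caption token
    if words & SYNONYMS.get(label, set()):
        return "synonym"  # a listed synonym of the label appears
    if caption_vec is not None and label_vec is not None:
        if cosine(caption_vec, label_vec) >= threshold:
            return "semantic"  # embeddings are close enough
    return "none"
```

For example, `match_type("a small green bush", "tree")` yields `"synonym"` under this table. The sketch handles single-word labels only; multi-word labels would need phrase matching, which is omitted here.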