
Multiverse: Language-Conditioned Multi-Game Level Blending via Shared Representation

In-Chang Baek, Jiyun Jung, Geum-Hwan Hwang, Sung-Hyun Kim, Kyung-Joong Kim

Abstract

Text-to-level generation aims to translate natural language descriptions into structured game levels, enabling intuitive control over procedural content generation. While prior text-to-level generators are typically limited to a single game domain, extending language-conditioned generation to multiple games requires learning representations that capture structural relationships across domains. We propose Multiverse, a language-conditioned multi-game level generator that enables cross-game level blending through textual specifications. The model learns a shared latent space aligning textual instructions and level structures, while a threshold-based multi-positive contrastive supervision links semantically related levels across games. This representation allows language to guide which structural characteristics should be preserved when combining content from different games, enabling controllable blending through latent interpolation and zero-shot generation from compositional textual prompts. Experiments show that the learned representation supports controllable cross-game level blending and significantly improves blending quality within the same game genre, while providing a unified representation for language-conditioned multi-game content generation.
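The abstract's "threshold-based multi-positive contrastive supervision" can be illustrated concretely. Below is a minimal sketch, not the paper's implementation: it assumes positives are defined by thresholding cosine similarity between instruction embeddings, and uses an InfoNCE-style objective averaged over each anchor's positive set. The function names, the 0.8 threshold, and the temperature are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def multi_positive_nce(level_z, text_z, threshold=0.8, tau=0.07):
    """Threshold-based multi-positive contrastive loss (sketch).

    Each level embedding is pulled toward its own instruction embedding and
    toward any instruction whose cosine similarity to the paired one exceeds
    `threshold`; all remaining pairs act as negatives. Hyperparameters here
    are assumptions, not values from the paper.
    """
    L = l2_normalize(np.asarray(level_z, dtype=np.float64))
    T = l2_normalize(np.asarray(text_z, dtype=np.float64))
    logits = L @ T.T / tau                     # (N, N) level-to-text similarities
    pos = (T @ T.T) > threshold                # semantically related instructions
    np.fill_diagonal(pos, True)                # the paired instruction is always positive
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average log-likelihood over each anchor's positive set, then over anchors.
    loss = -(log_prob * pos).sum(axis=1) / pos.sum(axis=1)
    return loss.mean()
```

With perfectly aligned level and text embeddings the loss approaches zero; with unrelated embeddings it approaches log N, which is the usual sanity check for contrastive objectives.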

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Shared Latent Space Visualization. The proposed inter-game contrastive learning learns 128-dimensional representations for levels and instructions across multiple games, which are projected into three dimensions using t-SNE. Embeddings from different games are aligned within a unified latent space. The blended sample is generated by an interpolation between Dungeon and Lode Runner level embeddings, demonstrating cross-game continuity in the shared space.
  • Figure 2: Multiverse Training Pipeline and Level Blending Mechanism. (a) A shared multi-game latent space is learned using multi-positive contrastive learning over text-level pairs. (b) The learned level embedding is used as a conditional vector for a VAE-based generator. For single-game generation, the encoded embedding directly conditions the decoder. For multi-game level blending, embeddings from different games are interpolated in the latent space, and the interpolated vector is provided to the decoder to synthesize blended levels.
  • Figure 3: Embedding Interpolation-based Level Blending. Levels generated by interpolating between the embeddings of Super Mario (A) and Lode Runner (B) across different mixing ratios.
  • Figure 4: Multi-game Text Instruction-based Level Blending. Levels generated from composite text instructions combining the embeddings of Super Mario (A) and Lode Runner (B) under different instruction combination strategies.
  • Figure 5: Cosine Similarity Distribution of Composite Instructions. 2D kernel density estimation visualization of cosine similarity between composite instructions and the original instructions from game A and game B. Balanced combinations remain relatively symmetric, while biased combinations shift toward the axis corresponding to the base instruction.
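The blending mechanism described in Figure 2(b) and swept across mixing ratios in Figure 3 amounts to linear interpolation in the shared latent space. A minimal sketch follows; `decoder` is a hypothetical stand-in for the paper's conditional VAE decoder, whose real interface is not specified here.

```python
import numpy as np

def blend_levels(z_a, z_b, alpha, decoder):
    """Blend two level embeddings by latent interpolation (sketch).

    alpha = 0.0 reproduces game A's level embedding, alpha = 1.0 game B's,
    and intermediate values yield cross-game blends, as in Figure 3.
    """
    z_a, z_b = np.asarray(z_a), np.asarray(z_b)
    z_mix = (1.0 - alpha) * z_a + alpha * z_b   # convex combination in latent space
    return decoder(z_mix)                        # conditional decoder synthesizes the level
```

Sweeping `alpha` over a grid (e.g. 0.0, 0.25, 0.5, 0.75, 1.0) reproduces the mixing-ratio progression shown in Figure 3, with the decoder mapping each interpolated vector back to a playable level layout.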