Model as a Game: On Numerical and Spatial Consistency for Generative Games

Jingye Chen, Yuzhong Zhao, Yupan Huang, Lei Cui, Li Dong, Tengchao Lv, Qifeng Chen, Furu Wei

TL;DR

This work addresses the challenge of maintaining numerical and spatial consistency in generative games by introducing MaaG, a framework that augments a diffusion-based game generator with two specialized modules. The Numerical Module uses a trainable LogicNet and an external numeric record to ensure event-triggered value changes align with gameplay, while the Spatial Module maintains an external map to preserve spatial continuity across revisits, using retrieval and sliding-window linking. Empirical results across Traveler, Pong, and Pac-Man show substantial gains in numerical consistency (NumCon) and spatial consistency (SpaCon) with modest inference overhead, along with qualitative improvements in map coherence and score rendering accuracy. The approach demonstrates practical potential for consistency-aware generative game creation, offering map customization and reliable digit-based score rendering, and sets the stage for extending to more complex 2D/3D environments.

Abstract

Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and responding adequately to player input, existing models often fail to maintain fundamental game properties such as numerical and spatial consistency. Numerical consistency ensures gameplay mechanics correctly reflect score changes and other quantitative elements, while spatial consistency prevents jarring scene transitions, providing seamless player experiences. In this paper, we revisit the paradigm of generative games to explore what truly constitutes a Model as a Game (MaaG) with a well-developed mechanism. We begin with an empirical study on "Traveler", a 2D game created by an LLM that features minimalist rules yet challenges generative models to maintain consistency. Based on the DiT architecture, we design two specialized modules: (1) a numerical module that integrates a LogicNet to determine event triggers, with calculations processed externally as conditions for image generation; and (2) a spatial module that maintains a map of explored areas, retrieving location-specific information during generation and linking new observations to ensure continuity. Experiments across three games demonstrate that our integrated modules significantly enhance performance on consistency metrics compared to baselines, while incurring minimal time overhead during inference.
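
The numerical module's division of labor (a learned component decides whether an event fired, while the arithmetic is done outside the generator and fed back as a condition) can be illustrated with a minimal sketch. Everything named here is hypothetical: `NumericRecord`, `toy_event_detector`, and `step` are illustrative stand-ins, the hand-written rule replaces the trainable LogicNet, and the returned dictionary merely represents the value that would condition the DiT. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class NumericRecord:
    """Hypothetical external record of numeric game state (a single score)."""
    score: int = 0

    def apply(self, event_triggered: bool, delta: int = 1) -> int:
        # The calculation happens outside the image generator, so the
        # score can only change when an event was actually detected.
        if event_triggered:
            self.score += delta
        return self.score


def toy_event_detector(player_pos, item_pos) -> bool:
    """Stand-in for the learned LogicNet (assumption): fires on contact."""
    return player_pos == item_pos


def step(record: NumericRecord, player_pos, item_pos) -> dict:
    """One game step: detect the event, update the record, emit a condition."""
    triggered = toy_event_detector(player_pos, item_pos)
    score = record.apply(triggered)
    # In the described pipeline this value would condition frame generation;
    # here it is simply returned.
    return {"event": triggered, "score_condition": score}


record = NumericRecord()
print(step(record, (0, 0), (1, 1)))  # no contact: score unchanged
print(step(record, (1, 1), (1, 1)))  # contact: score incremented
```

Because the score lives in the external record rather than in generated pixels, the generator only has to render a value it is given, not track it across frames.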

Paper Structure

This paper contains 24 sections, 2 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Demonstration of numerical and spatial consistency in our generative games. We train an action-controllable image generative model for Traveler (top) and extend it to two other games, Pong and Pac-Man (bottom). Action sequences and rules are shown below each gameplay example.
  • Figure 2: While generative games have advanced creative scene generation, significant challenges remain in maintaining numerical and spatial consistency. In Oasis, simply looking up and then down can lead to numerical inconsistencies and spatial discontinuities in the generated environment.
  • Figure 3: Overall architecture for enhancing the consistency of generative games. The black arrow and components signify the baseline architecture, while the red and blue arrows and components represent our proposed numerical and spatial modules, respectively. The numerical module utilizes a learnable LogicNet to determine the occurrence of events, and the values calculated by an external logic calculator are then used as conditions for the Diffusion Transformer (DiT) generation. The spatial module maintains an external map that is used to retrieve an extended local map, serving as auxiliary information for generation. New observations are linked to the map for subsequent frames. For example, since a new pink building is added, it is updated on the map accordingly. Note that in the visualization, the upper score part is omitted, while the remaining parts are used to perform matching.
  • Figure 4: Details of the spatial module for retrieval and linking.
  • Figure 5: Illustration of our three evaluation metrics: Action Accuracy (ActAcc), Numerical Consistency (NumCon), and Spatial Consistency (SpaCon). For ActAcc and NumCon, additional validation models are trained for assessment. For SpaCon, PSNR is calculated by comparing current observations with previous ones at the explored locations.
  • ...and 7 more figures
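
The Figure 5 caption describes SpaCon as a PSNR computed between the current observation and an earlier one at an already explored location. A minimal sketch of that idea follows, under stated assumptions: images are small grayscale nested lists, locations are known grid coordinates (the paper's spatial module instead links observations by matching), and `ExploredMap`, `psnr`, and `revisit_psnr` are hypothetical names rather than the authors' code.

```python
import math


def psnr(a, b, max_val=255.0):
    """PSNR in dB between two equally sized grayscale images (nested lists)."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


class ExploredMap:
    """Hypothetical external map: location -> first observation seen there."""

    def __init__(self):
        self._tiles = {}

    def retrieve(self, location):
        return self._tiles.get(location)

    def link(self, location, observation):
        # Keep the first observation so that later revisits are compared
        # against what the player originally saw.
        self._tiles.setdefault(location, observation)


def revisit_psnr(explored, location, observation):
    """Return PSNR against the stored view, or None on a first visit."""
    previous = explored.retrieve(location)
    explored.link(location, observation)
    if previous is None:
        return None
    return psnr(observation, previous)


explored = ExploredMap()
frame = [[10, 20], [30, 40]]
print(revisit_psnr(explored, (0, 0), frame))  # first visit, nothing to compare
print(revisit_psnr(explored, (0, 0), frame))  # identical revisit
```

A higher PSNR on revisits means the regenerated scene matches what was shown before; a first visit yields no score because there is nothing to compare against.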