Model as a Game: On Numerical and Spatial Consistency for Generative Games
Jingye Chen, Yuzhong Zhao, Yupan Huang, Lei Cui, Li Dong, Tengchao Lv, Qifeng Chen, Furu Wei
TL;DR
This work addresses the challenge of maintaining numerical and spatial consistency in generative games by introducing MaaG, a framework that augments a diffusion-based game generator with two specialized modules. The Numerical Module uses a trainable LogicNet and an external numeric record to ensure event-triggered value changes align with gameplay, while the Spatial Module maintains an external map to preserve spatial continuity across revisits, using retrieval and sliding-window linking. Empirical results across Traveler, Pong, and Pac-Man show substantial gains in numerical consistency (NumCon) and spatial consistency (SpaCon) with modest inference overhead, along with qualitative improvements in map coherence and score rendering accuracy. The approach demonstrates practical potential for consistency-aware generative game creation, offering map customization and reliable digit-based score rendering, and sets the stage for extending to more complex 2D/3D environments.
Abstract
Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and adequately receiving player input, existing models often fail to maintain fundamental game properties such as numerical and spatial consistency. Numerical consistency ensures gameplay mechanics correctly reflect score changes and other quantitative elements, while spatial consistency prevents jarring scene transitions, providing seamless player experiences. In this paper, we revisit the paradigm of generative games to explore what truly constitutes a Model as a Game (MaaG) with a well-developed mechanism. We begin with an empirical study on "Traveler", a 2D game created by an LLM featuring minimalist rules yet challenging generative models in maintaining consistency. Based on the DiT architecture, we design two specialized modules: (1) a numerical module that integrates a LogicNet to determine event triggers, with calculations processed externally as conditions for image generation; and (2) a spatial module that maintains a map of explored areas, retrieving location-specific information during generation and linking new observations to ensure continuity. Experiments across three games demonstrate that our integrated modules significantly enhance performance on consistency metrics compared to baselines, while incurring minimal time overhead during inference.
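To make the division of labor between the two modules concrete, the following is a minimal illustrative sketch, not the implementation used in the paper. It assumes the event trigger arrives as a plain boolean (standing in for the output of the trainable LogicNet) and that the explored map can be treated as a dictionary keyed by grid position. The names `NumericRecord`, `ExternalMap`, `update`, `retrieve`, and `link` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Tile = List[List[int]]  # hypothetical stand-in for an observed image patch


@dataclass
class NumericRecord:
    """External score store: arithmetic happens here, outside the generator."""
    score: int = 0

    def update(self, event_triggered: bool, delta: int = 1) -> int:
        # event_triggered stands in for a LogicNet decision (an assumption).
        if event_triggered:
            self.score += delta
        # The returned value would be passed to the generator as a condition.
        return self.score


@dataclass
class ExternalMap:
    """External map of explored areas, keyed by (x, y) grid position."""
    tiles: Dict[Tuple[int, int], Tile] = field(default_factory=dict)

    def retrieve(self, x: int, y: int) -> Optional[Tile]:
        # Location-specific lookup; None means the area is unexplored.
        return self.tiles.get((x, y))

    def link(self, x: int, y: int, observation: Tile) -> None:
        # Store a copy only on the first visit, so revisits see the same content.
        if (x, y) not in self.tiles:
            self.tiles[(x, y)] = [row[:] for row in observation]


record = NumericRecord()
record.update(event_triggered=False)        # no event, score unchanged
current_score = record.update(event_triggered=True)

game_map = ExternalMap()
game_map.link(0, 0, [[1, 2], [3, 4]])       # first visit is recorded
game_map.link(0, 0, [[9, 9], [9, 9]])       # revisit does not overwrite
revisited = game_map.retrieve(0, 0)
unexplored = game_map.retrieve(5, 5)
```

The point of the sketch is the design choice described in the abstract: the generator is conditioned on values and map content that are held outside the network, so it does not have to infer either from pixel history alone.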
