SynCity: Training-Free Generation of 3D Worlds
Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi
TL;DR
SynCity introduces a training-free pipeline for generating large, navigable 3D worlds by autoregressively constructing a grid of tiles. It combines language prompting, a 2D image generator with isometric framing, and a 3D generator (TRELLIS) to produce individual tiles, then blends them in 2D and 3D latent spaces to form a coherent world. The method includes context-aware prompting, rebasing, geometric validation, and multi-view upsampling, and is evaluated against prior approaches through ablations and human studies. Because no foundation model is retrained, the approach scales to diverse, detailed 3D environments, with potential applications in gaming, simulation, and virtual reality.
Abstract
We address the challenge of generating 3D worlds from textual descriptions. We propose SynCity, a training- and optimization-free approach, which leverages the geometric precision of pre-trained 3D generative models and the artistic versatility of 2D image generators to create large, high-quality 3D spaces. While most 3D generative models are object-centric and cannot generate large-scale worlds, we show how 3D and 2D generators can be combined to generate ever-expanding scenes. Through a tile-based approach, we allow fine-grained control over the layout and the appearance of scenes. The world is generated tile-by-tile, and each new tile is generated within its world-context and then fused with the scene. SynCity generates compelling and immersive scenes that are rich in detail and diversity.
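The tile-by-tile procedure described above can be illustrated with a minimal sketch. It only shows the autoregressive control flow, in which each new tile is conditioned on the tiles already placed next to it. The names `build_world`, `neighbors`, and `stub_generate_tile` are hypothetical, the raster traversal order and 4-connected context are assumptions, and the stub generator stands in for the 2D and 3D models, which are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]


def neighbors(coord: Coord) -> List[Coord]:
    """Return the 4-connected neighbours of a grid cell (assumed context)."""
    x, y = coord
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]


def build_world(
    width: int,
    height: int,
    prompts: Dict[Coord, str],
    generate_tile: Callable[[str, Dict[Coord, str]], str],
) -> Dict[Coord, str]:
    """Fill a width x height grid one tile at a time.

    Each tile sees only the neighbours that have already been generated,
    which is what makes the process autoregressive.
    """
    world: Dict[Coord, str] = {}
    for y in range(height):          # raster order is an assumption
        for x in range(width):
            coord = (x, y)
            context = {n: world[n] for n in neighbors(coord) if n in world}
            world[coord] = generate_tile(prompts[coord], context)
    return world


def stub_generate_tile(prompt: str, context: Dict[Coord, str]) -> str:
    """Placeholder generator: records the prompt and which neighbours it saw."""
    return f"{prompt}|ctx={sorted(context)}"
```

In a real pipeline the stub would be replaced by a function that generates the tile and fuses it with the existing scene; the loop structure stays the same.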
