InfiniCity: Infinite-Scale City Synthesis

Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

TL;DR

InfiniCity tackles the challenge of infinite-scale 3D city synthesis by decoupling the problem into 2D map generation, 3D voxel completion, and neural rendering. The system uses InfinityGAN-based infinite-pixel 2D map synthesis, octree-based voxel completion to build a watertight 3D world, and GANcraft-style neural rendering to texture and render views. Evaluations on HoliCity demonstrate improved structural plausibility and cross-view consistency, with ablations confirming the value of the contrastive patch discriminator and the voxel-based pipeline. It enables interactive region-based resampling for fast editing and provides a scalable path toward navigable, unlimited-size virtual cities.

Abstract

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an arbitrarily large, 3D-grounded environment from random noise. InfiniCity decomposes the seemingly impractical task into three feasible modules, taking advantage of both 2D and 3D data. First, an infinite-pixel image synthesis module generates arbitrary-scale 2D maps from the bird's-eye view. Next, an octree-based voxel completion module lifts the generated 2D map to 3D octrees. Finally, a voxel-based neural rendering module texturizes the voxels and renders 2D images. InfiniCity can thus synthesize arbitrary-scale, traversable 3D city environments and allows flexible, interactive editing by users. We quantitatively and qualitatively demonstrate the efficacy of the proposed framework. Project page: https://hubert0527.github.io/infinicity/
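
To make the "lift the 2D map to 3D" step concrete, the following is a minimal sketch of how a bird's-eye-view map could be turned into sparse surface voxels before completion. It is not the authors' implementation: the function name `lift_to_surface_voxels`, the `voxel_size` parameter, and the assumption that the depth channel has already been converted to a height above ground are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) integer grid coordinates


def lift_to_surface_voxels(
    height_map: List[List[float]],
    category_map: List[List[int]],
    voxel_size: float = 1.0,
) -> Dict[Voxel, int]:
    """Place one labeled voxel per map pixel at its quantized height.

    Assumption (not from the paper): each pixel stores a height above
    ground in the same units as voxel_size, plus an integer class label.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    if len(height_map) != len(category_map):
        raise ValueError("height and category maps must have the same rows")

    voxels: Dict[Voxel, int] = {}
    for y, (h_row, c_row) in enumerate(zip(height_map, category_map)):
        if len(h_row) != len(c_row):
            raise ValueError("height and category maps must have the same columns")
        for x, (height, label) in enumerate(zip(h_row, c_row)):
            z = int(height // voxel_size)  # floor to a voxel index
            voxels[(x, y, z)] = label
    return voxels
```

A sparse dictionary is used here only because it is the simplest stand-in for an octree. The output is a thin top surface with no side walls or interiors, which is the kind of incomplete input that a completion module would then need to fill in.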

Paper Structure (20 sections, 1 equation, 7 figures, 3 tables)

Figures (7)

  • Figure 1: We propose InfiniCity, a three-stage framework for infinite-scale city scene synthesis. From bottom to top, we synthesize multi-modality infinite-pixel satellite images, perform octree-based voxel completion to create a watertight voxel world, and finally texturize it with voxel-based neural rendering. In the middle figure, we mark the camera locations (in red and orange) used to render the views in the top figures.
  • Figure 2: Overview. InfiniCity consists of three major modules. The infinite-pixel satellite image synthesis stage is trained on image tuples (category, depth, and normal maps) derived from a bird's-eye view scan of the 3D environment, and can synthesize arbitrary-scale satellite maps during inference. The 3D octree-based voxel completion stage is trained on pairs of surface-scanned and completed octrees. During inference, it takes the surface voxels lifted from the satellite images as input and produces the watertight voxel world. Finally, the voxel-based neural rendering stage performs ray-sampling to retrieve features from the voxel world, then renders the final image with a neural renderer. The neural renderer is trained with both real images and pseudo-ground-truths synthesized by a pretrained SPADE generator. With these modules, InfiniCity can synthesize an arbitrary-scale and traversable 3D city environment from noise.
  • Figure 3: Interactive resampling. Our GUI allows users to select a region of interest and resample the local variables, with efficient on-demand inference run only on the neighboring regions. Notice that the undesired result of a road running into the lake is alleviated.
  • Figure 4: Synthesized satellite maps. We train InfinityGAN with a contrastive discriminator on multiple data modalities (category, depth, and normal). The images shown are 1024$\times$1024-pixel crops from the infinite-pixel images.
  • Figure 5: Octree-based voxel completion. High-quality and high-diversity voxels completed from synthetic satellite images. For both sample groups, we show synthesized satellite images, lifted surface voxels, then 3D-completed voxels. The samples are 64$^3$ voxels.
  • ...and 2 more figures
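
The interactive resampling described in the Figure 3 caption can be illustrated with a small sketch: redraw the local latent variables inside a user-selected region, then work out which neighboring area needs to be re-synthesized. This is only an illustration under stated assumptions, not the authors' GUI code. The names `resample_region` and `affected_region`, the scalar-per-cell latent grid, and the `margin` parameter (a stand-in for the generator's receptive field) are all hypothetical.

```python
import random
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def resample_region(
    latents: List[List[float]], region: Region, rng: random.Random
) -> None:
    """Redraw the local latents inside `region` in place; leave the rest untouched."""
    r0, c0, r1, c1 = region
    for r in range(r0, r1):
        for c in range(c0, c1):
            latents[r][c] = rng.gauss(0.0, 1.0)


def affected_region(
    region: Region, margin: int, grid_shape: Tuple[int, int]
) -> Region:
    """Grow `region` by `margin` cells on each side, clipped to the grid.

    Assumption: an output cell depends only on latents within `margin`
    cells, so only this window has to be re-inferred after resampling.
    """
    r0, c0, r1, c1 = region
    rows, cols = grid_shape
    return (
        max(0, r0 - margin),
        max(0, c0 - margin),
        min(rows, r1 + margin),
        min(cols, c1 + margin),
    )
```

In use, one would call `resample_region` on the selected cells and then re-run the generator only over the window returned by `affected_region`, so the cost of an edit scales with the selection size rather than with the size of the whole map.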