Sat2RealCity: Geometry-Aware and Appearance-Controllable 3D Urban Generation from Satellite Imagery

Yijie Kang, Xinliang Wang, Zhenyu Wu, Yifeng Shi, Hailong Zhu

TL;DR

Sat2RealCity addresses two problems in satellite-guided city generation: the scarcity of large-scale 3D city assets and the limited realism of generated scenes. It does so by generating individual building entities, guided by OSM priors and appearance control, rather than whole cities at once. It integrates geometry-aware priors, dual-path appearance modeling, and an MLLM-powered semantic pipeline to produce regionally coherent, photorealistic 3D urban scenes directly from satellite imagery. Using a dedicated 3D Building Dataset and TRELLIS-based latent generation with sparse-structure (SS) and structured-latent (SLAT) flows, the method outperforms baselines in both geometric fidelity and appearance realism and generalizes well across regions. The work provides a practical, scalable approach to real-world aligned 3D urban content creation, with potential applications in digital twins and urban simulations.

Abstract

Recent advances in generative modeling have substantially enhanced 3D urban generation, enabling applications in digital twins, virtual cities, and large-scale simulations. However, existing methods face two key challenges: (1) the need for large-scale 3D city assets for supervised training, which are difficult and costly to obtain, and (2) reliance on semantic or height maps, which are used exclusively for generating buildings in virtual worlds and lack connection to real-world appearance, limiting the realism and generalizability of generated cities. To address these limitations, we propose Sat2RealCity, a geometry-aware and appearance-controllable framework for 3D urban generation from real-world satellite imagery. Unlike previous city-level generation methods, Sat2RealCity builds generation upon individual building entities, enabling the use of rich priors and pretrained knowledge from 3D object generation while substantially reducing dependence on large-scale 3D city assets. Specifically, (1) we introduce the OSM-based spatial priors strategy to achieve interpretable geometric generation from spatial topology to building instances; (2) we design an appearance-guided controllable modeling mechanism for fine-grained appearance realism and style control; and (3) we construct an MLLM-powered semantic-guided generation pipeline, bridging semantic interpretation and geometric reconstruction. Extensive quantitative and qualitative experiments demonstrate that Sat2RealCity significantly surpasses existing baselines in structural consistency and appearance realism, establishing a strong foundation for real-world aligned 3D urban content creation. The code will be released soon.

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: We present Sat2RealCity, a novel framework for generating high-fidelity 3D city models with detailed geometry and appearance from real-world satellite imagery.
  • Figure 2: The overview of Sat2RealCity. (a) The OSM-based Structural Priors Strategy converts OSM data into a fused geometric prior $Z''_{\mathcal{O}}$. (b) The Appearance-guided Modeling Mechanism uses $Z''_{\mathcal{O}}$, the top view feature $c_t$, and a frontal appearance image feature $c_f$ to generate the 3D building. (c) The MLLM-powered Generation Pipeline provides geometric priors and the frontal-view image for modules (a) and (b). Finally, all generated buildings are assembled into the urban scene using their original OSM coordinates.
  • Figure 3: Examples of the 3D Building Dataset.
  • Figure 4: Qualitative visualization of 3D urban generation.
  • Figure 5: Qualitative visualization of geometric fidelity.
  • ...and 3 more figures
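
The final step in the Figure 2 caption, assembling per-building outputs into one scene by their original OSM coordinates, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each building is a dict with a hypothetical `footprint` (a list of unclosed `(lon, lat)` polygon vertices) and `vertices` (object-centred `(x, y, z)` points in metres), and it uses an equirectangular approximation to convert degrees to local metres, which is only reasonable over small regions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def lonlat_to_local_xy(lon, lat, origin_lon, origin_lat):
    """Equirectangular approximation: metres east/north of the scene origin."""
    x = (math.radians(lon - origin_lon) * EARTH_RADIUS_M
         * math.cos(math.radians(origin_lat)))
    y = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return x, y


def footprint_centroid(footprint):
    """Mean of the polygon vertices (lon, lat).

    Assumes the polygon is not closed (first vertex not repeated);
    a vertex mean is a crude centroid but adequate for a sketch.
    """
    lons = [p[0] for p in footprint]
    lats = [p[1] for p in footprint]
    return sum(lons) / len(lons), sum(lats) / len(lats)


def assemble_scene(buildings, origin):
    """Translate each building's object-centred vertices to its OSM location."""
    scene = []
    for b in buildings:
        lon, lat = footprint_centroid(b["footprint"])
        dx, dy = lonlat_to_local_xy(lon, lat, *origin)
        scene.append([(x + dx, y + dy, z) for x, y, z in b["vertices"]])
    return scene
```

Because each building is placed independently, buildings can be generated in any order or in parallel and then merged. A real pipeline would also need per-building rotation and scale, which this sketch leaves out.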