SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery
Jian Song, Hongruixuan Chen, Weihao Xuan, Junshi Xia, Naoto Yokoya
TL;DR
This work tackles the challenge of global 3D semantic understanding from single-view high-resolution remote sensing imagery by introducing SynRS3D, the largest synthetic RS 3D dataset (69,667 images spanning six city styles and eight land-cover classes, with height maps and building change masks), and RS3DAda, a multi-task unsupervised domain adaptation (UDA) method designed for synthetic-to-real transfer in land-cover mapping and height estimation. Through a carefully crafted acquisition pipeline, statistically grounded scene diversity, and a hybrid self-training framework that combines land-cover and height cues with ground-guided refinements, the authors demonstrate that synthetic data can meaningfully bolster real-world RS tasks, especially when real data are scarce. RS3DAda outperforms existing UDA baselines, stabilizes training on synthetic data, and enables disaster mapping via height-difference analysis, establishing SynRS3D as a practical benchmark for future synthetic-to-real RS research. While a gap to real-data performance remains, this work provides a concrete pathway toward scalable global RS understanding from monocular imagery, with implications for urban planning, environmental monitoring, and disaster response.
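The disaster-mapping idea mentioned above amounts to comparing estimated height maps before and after an event and flagging large changes. A minimal sketch of that comparison is shown below; the function name and the 2 m threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def height_change_mask(pre_height, post_height, threshold=2.0):
    """Flag pixels whose estimated height changed by more than
    `threshold` meters between two acquisitions.

    The threshold is a hypothetical value chosen for illustration.
    """
    diff = post_height.astype(np.float32) - pre_height.astype(np.float32)
    return np.abs(diff) > threshold

# Toy example: a 4x4 scene where a 10 m building collapses to ~1 m of debris.
pre = np.zeros((4, 4), dtype=np.float32)
pre[1:3, 1:3] = 10.0   # building footprint before the event
post = np.zeros((4, 4), dtype=np.float32)
post[1:3, 1:3] = 1.0   # same footprint after collapse

mask = height_change_mask(pre, post, threshold=2.0)
print(mask.sum())  # → 4 (the 2x2 building footprint is flagged)
```

In practice the pre- and post-event height maps would come from the monocular height-estimation model rather than ground truth, so the threshold must absorb estimation noise.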
Abstract
Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotation and data collection, as well as geographically restricted data availability. Synthetic data offer a promising solution: they can be generated at scale, enabling large and diverse datasets. We develop a specialized synthetic data generation pipeline for EO and introduce SynRS3D, the largest synthetic RS 3D dataset. SynRS3D comprises 69,667 high-resolution optical images that cover six different city styles worldwide and feature eight land cover types, precise height information, and building change masks. To further enhance its utility, we develop a novel multi-task unsupervised domain adaptation (UDA) method, RS3DAda, coupled with our synthetic dataset, which facilitates the RS-specific transition from synthetic to real scenarios for land cover mapping and height estimation, ultimately enabling global monocular 3D semantic understanding based on synthetic data. Extensive experiments on various real-world datasets demonstrate the adaptability and effectiveness of our synthetic dataset and the proposed RS3DAda method. SynRS3D and the related code will be made available.
