FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding
Shuai Yuan, Guancong Lin, Lixian Zhang, Runmin Dong, Jinxiao Zhang, Shuang Chen, Juepeng Zheng, Jie Wang, Haohuan Fu
TL;DR
This work tackles the lack of fine-grained urban land-use change data by introducing FUSU, a multi-temporal, multi-source dataset with 17 land-use classes and over $3\times10^{10}$ annotated pixels across $847\ \mathrm{km^2}$ in five Chinese cities, combining bi-temporal high-resolution imagery ($0.2$–$0.5$ m) with monthly Sentinel-1/2 time series. To leverage this data, the authors propose FUSU-Net, a unified time-series architecture that jointly performs segmentation and change detection through a dual-branch design that fuses time-series Sentinel features with bi-temporal high-resolution features. Experiments show that FUSU-Net achieves superior segmentation and change-detection performance, highlighting the benefits of multi-source, multi-temporal context for fine-grained urban understanding. The dataset and baseline model aim to advance multi-source, multi-temporal urban monitoring, with practical implications for planning and environmental assessment; the work also outlines limitations and directions for future multi-modal fusion and domain adaptation.
Abstract
Fine urban change segmentation using multi-temporal remote sensing images is essential for understanding human-environment interactions in urban areas. Although high-quality land cover datasets that reveal the physical features of urban landscapes have advanced, the lack of fine-grained land use datasets hinders a deeper understanding of how human activities are distributed across the landscape and how these activities affect the environment, which in turn constrains the development of suitable techniques. To address this, we introduce FUSU, the first fine-grained land use change segmentation dataset for Fine-grained Urban Semantic Understanding. FUSU features the most detailed land use classification system to date, with 17 classes and 30 billion pixels of annotations. It includes bi-temporal high-resolution satellite images with 0.2-0.5 m ground sample distance and monthly optical and radar satellite time series, covering 847 km^2 across five urban areas in southern and northern China with different geographical features. The fine-grained pixel-wise land use annotations and the high spatial-temporal resolution data provide a robust foundation for developing deep learning models that offer contextual insights into human activities and urbanization. To fully leverage FUSU, we propose a unified time-series architecture for both change detection and segmentation. We benchmark various methods on FUSU across several tasks. Dataset and code are available at: https://github.com/yuanshuai0914/FUSU.
