The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation

Yanis Benidir, Nicolas Gonthier, Clement Mallet

TL;DR

This work tackles the scarcity and limited transferability of bi-temporal semantic change detection (SCD) data for very high resolution Earth observation. It introduces HySCDG, a diffusion-based pipeline that semantically guides inpainting to transform real VHR images into paired bi-temporal samples, producing FSC-180k, a large hybrid SCD pretraining dataset built on the FLAIR land-cover map. Pretraining with FSC-180k yields consistent improvements across five target datasets for both binary and semantic changes, outperforming fully synthetic baselines (e.g., SyntheWorld) in most configurations and enabling effective transfer in zero-shot, low-data, and mixed-training scenarios. The approach demonstrates robust cross-domain transfer and offers practical gains for scalable Earth observation change monitoring, with the dataset and tools publicly available for reuse.

Abstract

Bi-temporal change detection at scale based on Very High Resolution (VHR) images is crucial for Earth monitoring. This remains poorly addressed so far: methods either require large volumes of annotated data (semantic case), or are limited to restricted datasets (binary set-ups). Most approaches do not exhibit the versatility required for temporal and spatial adaptation: simplicity in architecture design and pretraining on realistic and comprehensive datasets. Synthetic datasets are the key solution but still fail to handle complex and diverse scenes. In this paper, we present HySCDG, a generative pipeline for creating a large hybrid semantic change detection dataset that contains both real VHR images and inpainted ones, along with land cover semantic maps at both dates and the change map. Being semantically and spatially guided, HySCDG generates realistic images, leading to a comprehensive and hybrid transfer-proof dataset, FSC-180k. We evaluate FSC-180k on five change detection cases (binary and semantic), from zero-shot to mixed and sequential training, and also under low data regime training. Experiments demonstrate that pretraining on our hybrid dataset leads to a significant performance boost, outperforming SyntheWorld, a fully synthetic dataset, in every configuration. All code, models, and data are available here: https://yb23.github.io/projects/cywd/
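
The abstract describes samples that pair semantic maps at both dates with a change map. The sketch below illustrates one simple way such a binary change map could be derived, assuming a pixel counts as changed whenever its class label differs between the two dates. This rule, the function name `binary_change_map`, and the toy class ids are assumptions made for illustration; the exact construction used by HySCDG is not given in this section.

```python
from typing import List

Grid = List[List[int]]  # rows of integer class ids


def binary_change_map(m1: Grid, m2: Grid) -> Grid:
    """Return 1 where the class label differs between the two dates, else 0.

    Assumption: "change" means any label difference; this is an
    illustrative rule, not necessarily the one used in the paper.
    """
    same_shape = len(m1) == len(m2) and all(
        len(a) == len(b) for a, b in zip(m1, m2)
    )
    if not same_shape:
        raise ValueError("semantic maps must have the same shape")
    return [
        [int(a != b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(m1, m2)
    ]


# Toy 2x3 maps with hypothetical class ids (0 = vegetation, 1 = building).
m1 = [[0, 0, 1], [0, 1, 1]]
m2 = [[0, 1, 1], [0, 1, 0]]
change = binary_change_map(m1, m2)
```

A semantic change map could then be read off by pairing `m1` and `m2` wherever `change` equals 1.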

Paper Structure

This paper contains 53 sections, 2 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Efficient and scalable change detection requires a comprehensive training dataset, which does not exist today. Using a single-temporal dataset (image + semantic map), we propose HySCDG, which generates a novel bi-temporal hybrid dataset, FSC-180k. This enables multiple transfer learning possibilities on both binary and semantic change detection tasks. Two icons in the figure mark transfer with and without fine-tuning our model, respectively.
  • Figure 2: HySCDG pipeline. From a single-temporal dataset composed of one VHR image $I_1$, a semantic map $M_1$, and some openly available labeled instances, we generate a new VHR image $I_2$, a new map $M_2$ and subsequently a change map $C$. This results in the FSC-180k hybrid dataset. The two pivotal novelties consist in: (i) adapting and fine-tuning a Stable-Diffusion model for image inpainting and (ii) exploiting open geospatial data to control the inpainting prompt and semantically guide the objects to be modified. The combination of both solutions ensures a diverse, at-scale VHR multi-class change detection dataset.
  • Figure 3: Sequential training in binary change detection. Our Dual U-Net is either not pretrained or pretrained on SyntheWorld or FSC-180k; we then fine-tune and test it on each of the 4 target datasets and report the binary IoU.
  • Figure 4: Mixed training (Binary and Semantic). Training on a blend of the target and synthetic/hybrid (SyntheWorld or FSC-180k) train sets, in which x% of the samples come from the target (including repetitions). Testing is performed on the target test set. 100% corresponds to fine-tuning exclusively on the target dataset (without pretraining).
  • Figure 5: Low data regime (Semantic). Fine-tuning on a limited part of the target train set (1%, 10% or 30% of randomly sampled examples) and evaluating on the whole test set. Models are either pretrained (on SyntheWorld or FSC-180k) or not (Baseline). Metrics are averaged over 10 runs for SECOND and HiUCD-mini.
  • ...and 6 more figures
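
Two notions from the captions above lend themselves to a short sketch: the binary IoU reported for sequential training, and a training blend in which x% of the samples come from the target set, repetitions allowed. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the sampling-with-replacement scheme, and the convention of returning 1.0 when both masks are empty are all hypothetical choices.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def binary_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of the 'change' class on flattened 0/1 masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    # Assumed convention: two empty masks are treated as perfect agreement.
    return inter / union if union else 1.0


def blend_train_sets(
    target: Sequence[T],
    pretrain: Sequence[T],
    target_pct: float,
    size: int,
    seed: int = 0,
) -> List[T]:
    """Build a mixed training list with `target_pct` percent target samples.

    Assumption: both parts are drawn with replacement, so a small target
    set may be repeated to reach the requested share.
    """
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must lie in [0, 100]")
    rng = random.Random(seed)
    n_target = round(size * target_pct / 100)
    mixed = rng.choices(target, k=n_target)
    mixed += rng.choices(pretrain, k=size - n_target)
    rng.shuffle(mixed)
    return mixed


# Toy usage with hypothetical sample identifiers.
iou = binary_iou([1, 1, 0, 0], [1, 0, 1, 0])
mixed = blend_train_sets(["t1", "t2"], ["s1", "s2", "s3"], target_pct=30, size=10)
```

With `target_pct=100` the blend contains only target samples, which matches the caption's description of the 100% setting as training on the target dataset alone.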