Exploring Generalizable Pre-training for Real-world Change Detection via Geometric Estimation

Yitao Zhao, Sen Lei, Nanqing Liu, Heng-Chao Li, Turgay Celik, Qing Zhu

TL;DR

This work tackles real-world remote-sensing change detection when bi-temporal images are not pre-aligned, proposing MatchCD to jointly address registration and change detection through a self-supervised, geometry-aware framework. It first learns robust, instance-level representations via zero-shot instance generation and contrastive pre-training, then performs a training-free hierarchical geometric estimation to align large-scale image pairs. The downstream detector fuses pre-trained features with multimodal priors from a foundation model to produce precise change maps, while ensuring valid regions via overlap-boundary cropping. Extensive experiments on WarpCD and WHU-CD demonstrate robust registration and competitive or superior change detection under significant geometric distortions, highlighting practical potential for large-scale earth observation workflows. The approach reduces labeling needs and enables end-to-end processing of unregistered, high-resolution RS imagery with tangible benefits for planning and disaster assessment.
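
To make the overlap-boundary cropping step concrete, here is a minimal sketch under stated assumptions. The summary does not specify how MatchCD computes the valid region, so the function names (`apply_homography`, `overlap_crop_box`), the use of a 3x3 homography as the estimated transform, and the brute-force pixel scan are all hypothetical choices for illustration. The sketch returns the bounding box of reference pixels that map inside the second image.

```python
def apply_homography(h, x, y):
    """Map (x, y) through a 3x3 homography given as nested lists (assumed form)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]  # assumed non-zero for a valid transform
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def overlap_crop_box(h_ref_to_src, ref_size, src_size):
    """Return (x0, y0, x1, y1), end-exclusive, bounding the reference pixels
    that land inside the source image, or None if the images do not overlap.

    Hypothetical helper: a brute-force scan, suitable only for small images.
    """
    ref_w, ref_h = ref_size
    src_w, src_h = src_size
    xs, ys = [], []
    for y in range(ref_h):
        for x in range(ref_w):
            u, v = apply_homography(h_ref_to_src, x, y)
            if 0 <= u <= src_w - 1 and 0 <= v <= src_h - 1:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


# Pure translation: reference pixel (x, y) maps to source pixel (x + 3, y + 2).
shift = [[1, 0, 3], [0, 1, 2], [0, 0, 1]]
box = overlap_crop_box(shift, ref_size=(10, 8), src_size=(10, 8))
```

For a pure translation the box is exactly the overlap. Under rotation or perspective distortion, a bounding box can still contain pixels without a counterpart, so a per-pixel validity mask would be the safer choice. At 6K*4K resolution the scan would need to be vectorized.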

Abstract

As an essential procedure in earth observation systems, change detection (CD) aims to reveal the spatial-temporal evolution of the observed regions. A key prerequisite for existing change detection algorithms is that the multi-temporal images share aligned geo-references, obtained through fine-grained registration. However, in the majority of real-world scenarios, the original images must first be registered manually, which significantly increases the complexity of the CD workflow. In this paper, we propose a self-supervision-motivated CD framework with geometric estimation, called "MatchCD". Specifically, the proposed MatchCD framework utilizes zero-shot capability to optimize the encoder with self-supervised contrastive representation learning; the encoder is then reused in downstream image registration and change detection to handle bi-temporal misalignment and object change simultaneously. Moreover, unlike conventional change detection, which requires splitting the full-frame image into small patches, MatchCD can directly process the original large-scale image (e.g., 6K*4K resolution) with promising performance. Results in multiple complex scenarios with significant geometric distortion demonstrate the effectiveness of the proposed framework.
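
As an illustration of the self-supervised contrastive representation mentioned above, here is a minimal sketch of an InfoNCE-style loss. The abstract does not state which contrastive objective MatchCD uses, so InfoNCE, the function names (`l2_normalize`, `info_nce`), and the temperature value are assumptions made for illustration only. Each anchor embedding is paired with the positive at the same index, and the remaining positives in the batch act as negatives.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are two views of the same instance;
    positives[j] with j != i serve as negatives for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every positive, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Two instances, each seen under two slightly different views.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]
matched = info_nce(views_a, views_b)         # correct pairing
shuffled = info_nce(views_a, views_b[::-1])  # deliberately wrong pairing
```

The loss is small when matching views are closer to each other than to other instances, and grows when the pairing is wrong, which is the property contrastive pre-training relies on to shape the encoder.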

Paper Structure

This paper contains 35 sections, 17 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Conventional CD methods vs. our MatchCD. Illustration of the difference between conventional CD methods and our MatchCD in the unregistered scenario.
  • Figure 2: Workflow of the proposed MatchCD framework. The whole MatchCD framework contains three main procedures: (a) Instance Contrastive Pre-training, (b) Hierarchical Geometric Estimation, (c) Prior Knowledge-driven CD.
  • Figure 3: Illustration of the hierarchical geometric estimation for a bi-temporal real-world scenario with significant distortion.
  • Figure 4: Illustration of the downstream change detection workflow.
  • Figure 5: Illustration of the dataset utilized for MatchCD pre-training. The samples are selected from the AID and UCMerced datasets according to the class information related to building objects.
  • ...and 9 more figures