Exploring Generalizable Pre-training for Real-world Change Detection via Geometric Estimation
Yitao Zhao, Sen Lei, Nanqing Liu, Heng-Chao Li, Turgay Celik, Qing Zhu
TL;DR
This work tackles real-world remote-sensing change detection when bi-temporal images are not pre-aligned, proposing MatchCD to jointly address registration and change detection through a self-supervised, geometry-aware framework. It first learns robust, instance-level representations via zero-shot instance generation and contrastive pre-training, then performs a training-free hierarchical geometric estimation to align large-scale image pairs. The downstream detector fuses pre-trained features with multimodal priors from a foundation model to produce precise change maps, while ensuring valid regions via overlap-boundary cropping. Extensive experiments on WarpCD and WHU-CD demonstrate robust registration and competitive or superior change detection under significant geometric distortions, highlighting practical potential for large-scale earth observation workflows. The approach reduces labeling needs and enables end-to-end processing of unregistered, high-resolution RS imagery with tangible benefits for planning and disaster assessment.
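The overlap-boundary cropping step can be illustrated with a minimal sketch. It assumes the estimated geometry is a single 3x3 homography that maps source-image pixels into reference-image pixels; the summary does not state the transformation model, and the helper names `apply_homography` and `overlap_box` are hypothetical. The returned box is exact for a pure translation and only an outer bound of the overlap when rotation or perspective is present.

```python
import math

def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 homography H given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v

def overlap_box(H, src_size, ref_size):
    """Return (x0, y0, x1, y1) in reference pixels, or None if there is no overlap.

    H maps source pixels to reference pixels (an assumption of this sketch).
    src_size and ref_size are (width, height) tuples.
    """
    src_w, src_h = src_size
    ref_w, ref_h = ref_size
    # Warp the four source corners into the reference frame.
    corners = [(0, 0), (src_w, 0), (src_w, src_h), (0, src_h)]
    warped = [apply_homography(H, x, y) for x, y in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    # Intersect the warped bounding box with the reference image frame.
    x0 = max(0, math.ceil(min(xs)))
    y0 = max(0, math.ceil(min(ys)))
    x1 = min(ref_w, math.floor(max(xs)))
    y1 = min(ref_h, math.floor(max(ys)))
    if x0 >= x1 or y0 >= y1:
        return None
    return x0, y0, x1, y1
```

Cropping both the reference image and the warped source image to this box keeps the change detector from comparing real content against empty padding outside the warped frame.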
Abstract
As an essential procedure in earth observation systems, change detection (CD) aims to reveal the spatial-temporal evolution of the observed regions. A key prerequisite for existing change detection algorithms is that the multi-temporal images share aligned geo-references, obtained through fine-grained registration. However, in the majority of real-world scenarios, the original images must first be registered manually, which significantly increases the complexity of the CD workflow. In this paper, we propose a self-supervised CD framework with geometric estimation, called "MatchCD". Specifically, MatchCD uses zero-shot capability to optimize the encoder with self-supervised contrastive representation learning; the encoder is then reused in downstream image registration and change detection, handling bi-temporal misalignment and object change simultaneously. Moreover, unlike conventional change detection, which requires splitting the full-frame image into small patches, MatchCD can directly process the original large-scale image (e.g., 6K×4K resolution) with promising performance. Results in multiple complex scenarios with significant geometric distortion demonstrate the effectiveness of the proposed framework.
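The contrastive objective behind the encoder pre-training can be sketched as follows. The abstract does not name the loss, so this example assumes the widely used InfoNCE form with cosine similarity; the function name `info_nce` and the temperature value are illustrative choices, not details taken from the paper. Each anchor embedding is pulled toward its own positive (for example, another view of the same instance) and pushed away from the other positives in the batch.

```python
import math

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are two views of the same instance;
    positives[j] for j != i serve as negatives for anchors[i].
    """
    def normalize(v):
        # L2-normalize so the dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denominator - logits[i]
    return total / len(a)
```

The loss is close to zero when every anchor is most similar to its own positive and grows as matched pairs become harder to tell apart from the rest of the batch.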
