ChangeDINO: DINOv3-Driven Building Change Detection in Optical Remote Sensing Imagery

Ching-Heng Cheng, Chih-Chung Hsu

TL;DR

ChangeDINO addresses robustness challenges in optical remote sensing change detection (RSCD) under illumination changes, off-nadir views, and limited labels. It transfers semantic knowledge from a frozen DINOv3 into a Siamese encoder and uses a differential transformer decoder to reason over cross-temporal context. The method fuses DINOv3 features with a lightweight backbone to form semantic multi-scale pyramids, derives multi-scale change priors via absolute differences, and refines predictions with a learnable morphology module in an end-to-end training setup. Key contributions include (i) DINOv3-based semantic priors without fine-tuning, (ii) a Spatial-Spectral Differential Transformer that suppresses distractors while sharpening true changes, and (iii) a differentiable morphology head for boundary fidelity. Results on four public RSCD benchmarks show consistent improvements in IoU and F1 over state-of-the-art methods across diverse scenes and imaging conditions. The authors provide public code.
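
The IoU and F1 scores mentioned above are usually reported for the changed class of a binary change map. The sketch below shows one common way to compute them from pixel counts. The function name `change_metrics` and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration; they are not taken from the authors' evaluation code.

```python
import numpy as np


def change_metrics(pred, target):
    """Return (IoU, F1) of the 'changed' class for two binary masks.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = int(np.logical_and(pred, target).sum())    # changed, predicted changed
    fp = int(np.logical_and(pred, ~target).sum())   # unchanged, predicted changed
    fn = int(np.logical_and(~pred, target).sum())   # changed, predicted unchanged

    iou_den = tp + fp + fn
    f1_den = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / iou_den if iou_den else 1.0
    f1 = 2 * tp / f1_den if f1_den else 1.0
    return iou, f1


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, f1 = change_metrics(pred, target)
```

On the toy masks, IoU is 2 / (2 + 1 + 1) and F1 is 4 / (4 + 1 + 1), so F1 is always at least as large as IoU for the same prediction.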

Abstract

Remote sensing change detection (RSCD) aims to identify surface changes from co-registered bi-temporal images. However, many deep learning-based RSCD methods rely solely on change-map annotations and underuse the semantic information in non-changing regions, which limits robustness under illumination variation, off-nadir views, and scarce labels. This article introduces ChangeDINO, an end-to-end multiscale Siamese framework for optical building change detection. The model fuses a lightweight backbone stream with features transferred from a frozen DINOv3, yielding semantic- and context-rich pyramids even on small datasets. A spatial-spectral differential transformer decoder then exploits multi-scale absolute differences as change priors to highlight true building changes and suppress irrelevant responses. Finally, a learnable morphology module refines the upsampled logits to recover clean boundaries. Experiments on four public benchmarks show that ChangeDINO consistently outperforms recent state-of-the-art methods in IoU and F1, and ablation studies confirm the effectiveness of each component. The source code is available at https://github.com/chingheng0808/ChangeDINO.
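
The change priors described in the abstract are multi-scale absolute differences between the two temporal feature pyramids. A minimal sketch of that single step follows, assuming each pyramid is a list of `(C, H, W)` arrays with matching shapes per level. The function name `change_priors`, the channel counts, and the spatial sizes are hypothetical, and random arrays stand in for the fused DINOv3 and backbone features.

```python
import numpy as np


def change_priors(pyramid_t1, pyramid_t2):
    """Element-wise |F_t1 - F_t2| at every pyramid level.

    Hypothetical helper; both inputs are lists of (C, H, W) arrays.
    """
    if len(pyramid_t1) != len(pyramid_t2):
        raise ValueError("pyramids must have the same number of levels")
    priors = []
    for f1, f2 in zip(pyramid_t1, pyramid_t2):
        if f1.shape != f2.shape:
            raise ValueError("feature maps at the same level must match in shape")
        priors.append(np.abs(f1 - f2))
    return priors


# Stand-in features: three levels with assumed (C, H, W) shapes.
rng = np.random.default_rng(0)
shapes = [(16, 64, 64), (32, 32, 32), (64, 16, 16)]
feats_t1 = [rng.standard_normal(s) for s in shapes]
feats_t2 = [f.copy() for f in feats_t1]

# Simulate a change that only affects a patch at the finest level.
feats_t2[0][:, 10:20, 10:20] += 1.0

priors = change_priors(feats_t1, feats_t2)
```

Because the difference is taken in absolute value, the prior is symmetric in the two dates: swapping the pre- and post-change images gives the same maps. In this toy setup the prior is zero everywhere except the perturbed patch.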

Paper Structure

This paper contains 19 sections, 17 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Overall architecture of ChangeDINO. The model adopts a classic multi-scale encoder–decoder and is trained end-to-end for optical building change detection. Please zoom in for details.
  • Figure 2: Lightweight feature adapter aligning DINOv3 features with our backbone.
  • Figure 3: Spatial–spectral differential transformer ($\mathrm{S}^2\mathrm{DT}$) block. The block incorporates a differential transformer into overlapped-window spatial self-attention and pairs it with channel-wise self-attention to refine feature intensities.
  • Figure 4: Learnable morphological module (LMM). Classical opening and closing with learnable structuring elements further refine the prediction.
  • Figure 5: Qualitative experimental results on LEVIR-CD and WHU-CD.
  • ...and 4 more figures
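
The Figure 4 caption describes the LMM as classical opening and closing with learnable structuring elements. The sketch below illustrates grayscale opening and closing with an additive (non-flat) structuring element in NumPy. It is not the authors' module: the structuring element `se` is a plain array here, whereas a learnable version would make it a trainable parameter, and the function names, the 3x3 size, and the border handling are assumptions made for this illustration.

```python
import numpy as np


def _windows(x, k, pad_value):
    """Yield (i, j, view) with view[p] = x[p + (i - r, j - r)], r = k // 2."""
    r = k // 2
    h, w = x.shape
    padded = np.pad(x, r, mode="constant", constant_values=pad_value)
    for i in range(k):
        for j in range(k):
            yield i, j, padded[i:i + h, j:j + w]


def erode(x, se):
    """Grayscale erosion: min over offsets y of x[p + y] - se[y]."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.inf)
    # +inf padding means out-of-image pixels never win the minimum.
    for i, j, win in _windows(x, se.shape[0], np.inf):
        out = np.minimum(out, win - se[i, j])
    return out


def dilate(x, se):
    """Grayscale dilation: max over offsets y of x[p - y] + se[y]."""
    x = np.asarray(x, dtype=float)
    flipped = se[::-1, ::-1]  # reflection makes dilation the adjoint of erosion
    out = np.full(x.shape, -np.inf)
    # -inf padding means out-of-image pixels never win the maximum.
    for i, j, win in _windows(x, se.shape[0], -np.inf):
        out = np.maximum(out, win + flipped[i, j])
    return out


def opening(x, se):
    """Erosion followed by dilation; removes small bright specks."""
    return dilate(erode(x, se), se)


def closing(x, se):
    """Dilation followed by erosion; fills small dark holes."""
    return erode(dilate(x, se), se)


se = np.zeros((3, 3))  # flat 3x3 element; a learnable module would train these values

clean = np.zeros((9, 9))
clean[2:7, 2:7] = 1.0            # one "building" footprint

speck = clean.copy()
speck[0, 8] = 1.0                # isolated false positive
opened = opening(speck, se)      # speck removed, footprint kept

hole = clean.copy()
hole[4, 4] = 0.0                 # single-pixel gap inside the footprint
closed = closing(hole, se)       # gap filled
```

With the flat element, opening removes the isolated pixel and closing fills the one-pixel gap, which is the kind of boundary cleanup the caption attributes to the module. The hard `max` and `min` used here are only one possible formulation; how the paper's module handles gradients is not stated in this section.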