Table of Contents
Fetching ...

Tri-path DINO: Feature Complementary Learning for Remote Sensing Multi-Class Change Detection

Kai Zheng, Hang-Cheng Dong, Zhenkai Wu, Fupeng Wei, Wei Zhang

TL;DR

The Tripath DINO architecture is proposed, which adopts a three path complementary feature learning strategy to facilitate the rapid adaptation of pre trained foundation models to complex vertical domains, offering a robust and interpretable solution for advanced change detection tasks.

Abstract

In remote sensing imagery, multi class change detection (MCD) is crucial for fine grained monitoring, yet it has long been constrained by complex scene variations and the scarcity of detailed annotations. To address this, we propose the Tripath DINO architecture, which adopts a three path complementary feature learning strategy to facilitate the rapid adaptation of pre trained foundation models to complex vertical domains. Specifically, we employ the DINOv3 pre trained model as the backbone feature extraction network to learn coarse grained features. An auxiliary path also adopts a siamese structure, progressively aggregating intermediate features from the siamese encoder to enhance the learning of fine grained features. Finally, a multi scale attention mechanism is introduced to augment the decoder network, where parallel convolutions adaptively capture and enhance contextual information under different receptive fields. The proposed method achieves optimal performance on the MCD task on both the Gaza facility damage assessment dataset (Gaza change) and the classic SECOND dataset. GradCAM visualizations further confirm that the main and auxiliary paths naturally focus on coarse grained semantic changes and fine grained structural details, respectively. This synergistic complementarity provides a robust and interpretable solution for advanced change detection tasks, offering a basis for rapid and accurate damage assessment.

Tri-path DINO: Feature Complementary Learning for Remote Sensing Multi-Class Change Detection

TL;DR

The Tripath DINO architecture is proposed, which adopts a three path complementary feature learning strategy to facilitate the rapid adaptation of pre trained foundation models to complex vertical domains, offering a robust and interpretable solution for advanced change detection tasks.

Abstract

In remote sensing imagery, multi class change detection (MCD) is crucial for fine grained monitoring, yet it has long been constrained by complex scene variations and the scarcity of detailed annotations. To address this, we propose the Tripath DINO architecture, which adopts a three path complementary feature learning strategy to facilitate the rapid adaptation of pre trained foundation models to complex vertical domains. Specifically, we employ the DINOv3 pre trained model as the backbone feature extraction network to learn coarse grained features. An auxiliary path also adopts a siamese structure, progressively aggregating intermediate features from the siamese encoder to enhance the learning of fine grained features. Finally, a multi scale attention mechanism is introduced to augment the decoder network, where parallel convolutions adaptively capture and enhance contextual information under different receptive fields. The proposed method achieves optimal performance on the MCD task on both the Gaza facility damage assessment dataset (Gaza change) and the classic SECOND dataset. GradCAM visualizations further confirm that the main and auxiliary paths naturally focus on coarse grained semantic changes and fine grained structural details, respectively. This synergistic complementarity provides a robust and interpretable solution for advanced change detection tasks, offering a basis for rapid and accurate damage assessment.
Paper Structure (18 sections, 21 equations, 4 figures, 4 tables)

This paper contains 18 sections, 21 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overall architecture of the proposed Tri-path DINO.
  • Figure 2: Example results on Gaza-change dataset.
  • Figure 3: Qualitative comparison on the Gaza-Change dataset, with color coding: true positives (white), true negatives (black), false positives (red), and false negatives (green).
  • Figure 4: Heatmaps based on the GradCAM visualization, Path represents the features learned by the third pathway, while Backbone represents the features learned by the backbone siamese network.