Dsfer-Net: A Deep Supervision and Feature Retrieval Network for Bitemporal Change Detection Using Modern Hopfield Networks

Shizhen Chang, Michael Kopp, Pedram Ghamisi, Bo Du

TL;DR

This work tackles bitemporal change detection in high-resolution remote sensing imagery by introducing Dsfer-Net, a deep supervision and feature retrieval network that uses modern Hopfield networks to retrieve and aggregate difference features across time. The architecture combines a Siamese VGG-16 backbone, Deeply Supervised Feature Retrieval (DSFR) modules, and a Comprehensive Fusion (CF) decoder that fuses multi-scale features with the retrieved change signals; it is trained with a hybrid loss that includes weighted binary cross-entropy (BCE) and Dice terms. On LEVIR-CD, WHU-CD, and CDD, Dsfer-Net is reported to outperform eight state-of-the-art methods in F1, IoU, overall accuracy (OA), and precision, and ablation studies confirm that both DSFR and CF contribute. Visualizations of the Hopfield-retrieved features provide explainable evidence of semantic understanding: they highlight real temporal changes while also revealing limitations under strong pseudo-changes. Overall, Dsfer-Net offers a scalable, end-to-end framework that improves contour preservation and reduces false alarms in challenging remote sensing change detection tasks, with room for further refinement in handling pseudo-changes and training dynamics.
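
To make the hybrid loss more concrete, the following is a minimal sketch of a weighted BCE term combined with a soft Dice term over a flattened change map. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `pos_weight` and `dice_weight` parameters, the smoothing constant, and the simple additive combination are all hypothetical choices that this summary does not specify.

```python
import math


def weighted_bce(probs, targets, pos_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy; changed pixels (target 1) are scaled by pos_weight.

    pos_weight is an assumed knob for class imbalance, not a value from the paper.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(probs) + sum(targets) + smooth)."""
    overlap = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)


def hybrid_loss(probs, targets, pos_weight=1.0, dice_weight=1.0):
    """Assumed combination: weighted BCE plus a scaled Dice term."""
    return weighted_bce(probs, targets, pos_weight) + dice_weight * dice_loss(probs, targets)


if __name__ == "__main__":
    # Toy 4-pixel change map: predicted change probabilities vs. ground truth.
    predicted = [0.9, 0.2, 0.8, 0.1]
    truth = [1, 0, 1, 0]
    print(hybrid_loss(predicted, truth, pos_weight=2.0))
```

The BCE term penalizes each pixel independently, while the Dice term scores the overlap of the whole predicted change region, which is a common motivation for pairing the two when changed pixels are rare.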

Abstract

Change detection, an essential application for high-resolution remote sensing images, aims to monitor and analyze changes in the land surface over time. Due to the rapid increase in the quantity of high-resolution remote sensing data and the complexity of texture features, several quantitative deep learning-based methods have been proposed. These methods outperform traditional change detection methods by extracting deep features and combining spatial-temporal information. However, reasonable explanations for how deep features improve detection performance are still lacking. In our investigations, we found that modern Hopfield network layers significantly enhance semantic understanding. In this paper, we propose a Deep Supervision and FEature Retrieval network (Dsfer-Net) for bitemporal change detection. Specifically, the highly representative deep features of bitemporal images are jointly extracted through a fully convolutional Siamese network. Based on the sequential geographical information of the bitemporal images, we designed a feature retrieval module to extract difference features and leverage discriminative information in a deeply supervised manner. Additionally, we observed that the deeply supervised feature retrieval module provides explainable evidence of the semantic understanding of the proposed network in its deep layers. Finally, our end-to-end network establishes a novel framework by aggregating retrieved features and feature pairs from different layers. Experiments conducted on three public datasets (LEVIR-CD, WHU-CD, and CDD) confirm the superiority of the proposed Dsfer-Net over other state-of-the-art methods.
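
As background for the feature retrieval idea, the sketch below shows the generic one-step retrieval rule of a modern (continuous) Hopfield network: a query is replaced by a softmax-weighted combination of stored patterns. How Dsfer-Net maps bitemporal features onto queries and stored patterns is not described in this summary, so the names `hopfield_retrieve`, `query`, `patterns`, and `beta` are illustrative assumptions rather than the authors' module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(query, patterns, beta=1.0):
    """One modern-Hopfield update: sum_i softmax(beta * <pattern_i, query>)_i * pattern_i."""
    # Similarity of the query to every stored pattern, scaled by beta.
    scores = [beta * sum(q * x for q, x in zip(query, p)) for p in patterns]
    weights = softmax(scores)
    # Retrieved vector is the attention-weighted average of the stored patterns.
    return [
        sum(w * p[d] for w, p in zip(weights, patterns))
        for d in range(len(query))
    ]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0]]
    noisy_query = [0.9, 0.1]
    # A large beta makes retrieval sharp: the result is pulled to the closest pattern.
    print(hopfield_retrieve(noisy_query, stored, beta=20.0))
    # beta = 0 ignores the query and returns the mean of the stored patterns.
    print(hopfield_retrieve(noisy_query, stored, beta=0.0))
```

This update has the same form as a single attention step, which is why Hopfield layers can be dropped into a convolutional network and trained end to end.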


Paper Structure

This paper contains 18 sections, 8 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Basic architectures of deep learning-based change detection methods. (a) Single-branch architecture. (b) and (c) Double-branch architectures.
  • Figure 2: Examples of the difference features from shallow to deep layers in a Siamese change detection network using VGG-16 as the backbone. (a) Images from $t_1$. (b) Images from $t_2$. (c) Features from the first stage. (d) Features from the second stage. (e) Features from the third stage. (f) Features from the fourth stage. (g) Features from the fifth stage. (h) Ground-truths.
  • Figure 3: The overall architecture of the Dsfer-Net. The feature extractor is a Siamese architecture with shared weights. VGG-16 before pool5 is utilized as the backbone. DSFR modules are used for feature aggregation and retrieval in deep layers. CF blocks comprehensively fuse the multi-scale features.
  • Figure 4: Deeply supervised feature retrieval (DSFR) module. $(F_1^i,F_2^i)$ represents the $i$-th paired feature maps extracted by the feature extractor. For the proposed network, the DSFR module is conducted on the fourth and fifth feature pairs, where $i= 4, 5$.
  • Figure 5: Comprehensive Fusion (CF) strategy. $(F_1^i,F_2^i)$ represents the $i$-th paired feature maps extracted by the feature extractor. If $i= 4, 5$, the concatenated features are multiplied with the retrieved feature $F_R^i$. $\tilde{F}^{i+1}$ represents the upsampled feature obtained from the deeper CF block.
  • ...and 6 more figures