Dsfer-Net: A Deep Supervision and Feature Retrieval Network for Bitemporal Change Detection Using Modern Hopfield Networks
Shizhen Chang, Michael Kopp, Pedram Ghamisi, Bo Du
TL;DR
This work addresses bitemporal change detection (CD) in high-resolution remote sensing imagery with Dsfer-Net, a deep supervision and feature retrieval network that uses modern Hopfield networks to retrieve and aggregate difference features across the two acquisition dates. The architecture combines a Siamese VGG-16 backbone, Deeply Supervised Feature Retrieval (DSFR) modules, and a Comprehensive Fusion (CF) decoder that fuses multi-scale features with the retrieved change signals; it is trained with a hybrid loss that combines weighted binary cross-entropy (BCE) and Dice terms. On LEVIR-CD, WHU-CD, and CDD, Dsfer-Net is reported to outperform eight state-of-the-art methods in F1, IoU, OA, and precision, and ablation studies confirm that both DSFR and CF contribute to the gain. Visualizations of the Hopfield-retrieved features offer explainable evidence of semantic understanding: they highlight real temporal changes, though they also reveal limitations under strong pseudo-changes. Overall, Dsfer-Net is an end-to-end framework that improves contour preservation and reduces false alarms on challenging remote sensing CD tasks, with the handling of pseudo-changes and training dynamics left as areas for further refinement.
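As a rough illustration of the hybrid loss mentioned above, the following NumPy sketch adds a weighted BCE term and a Dice term over a predicted change-probability map. The function names, the `pos_weight` and `dice_weight` parameters, and the smoothing constant are assumptions made for this example; the summary does not state the weighting the authors actually use.

```python
import numpy as np

def weighted_bce(prob, target, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy with an extra weight on the 'changed' class.

    prob and target are arrays of the same shape; target holds 0/1 labels.
    pos_weight is a hypothetical knob, not a value taken from the paper.
    """
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * target * np.log(prob)
             + (1.0 - target) * np.log(1.0 - prob))
    return float(loss.mean())

def dice_loss(prob, target, smooth=1.0):
    """One minus the (smoothed) Dice coefficient between prediction and mask."""
    inter = float((prob * target).sum())
    denom = float(prob.sum() + target.sum())
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def hybrid_loss(prob, target, pos_weight=1.0, dice_weight=1.0):
    """Sum of the two terms; the relative weighting here is an assumption."""
    return (weighted_bce(prob, target, pos_weight)
            + dice_weight * dice_loss(prob, target))

# Toy 2x2 change mask and a prediction that is mostly right.
mask = np.array([[0.0, 1.0], [1.0, 0.0]])
pred = np.array([[0.1, 0.9], [0.8, 0.2]])
loss = hybrid_loss(pred, mask, pos_weight=2.0)
```

The BCE term scores each pixel independently, while the Dice term scores overlap with the changed region as a whole, which is a common reason to pair them when changed pixels are rare.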
Abstract
Change detection, an essential application for high-resolution remote sensing images, aims to monitor and analyze changes in the land surface over time. Due to the rapid increase in the quantity of high-resolution remote sensing data and the complexity of texture features, several deep learning-based methods have been proposed. These methods outperform traditional change detection methods by extracting deep features and combining spatial-temporal information. However, reasonable explanations for how deep features improve detection performance are still lacking. In our investigations, we found that modern Hopfield network layers significantly enhance semantic understanding. In this paper, we propose a Deep Supervision and FEature Retrieval network (Dsfer-Net) for bitemporal change detection. Specifically, the highly representative deep features of bitemporal images are jointly extracted through a fully convolutional Siamese network. Based on the sequential geographical information of the bitemporal images, we design a feature retrieval module that extracts difference features and leverages discriminative information in a deeply supervised manner. We also observe that the deeply supervised feature retrieval module provides explainable evidence of the semantic understanding of the proposed network in its deep layers. Finally, our end-to-end network establishes a novel framework by aggregating retrieved features and feature pairs from different layers. Experiments conducted on three public datasets (LEVIR-CD, WHU-CD, and CDD) confirm the superiority of the proposed Dsfer-Net over other state-of-the-art methods.
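To make the retrieval idea concrete, here is a minimal NumPy sketch of the standard modern Hopfield update, in which each query (state) pattern is replaced by a softmax-weighted combination of stored patterns. It only illustrates the generic retrieval rule. Treating features from one image as queries and features from the other as stored patterns is an assumption made for this toy example, as are the names `hopfield_retrieve` and `beta`; the abstract does not specify how the retrieval module is wired.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_retrieve(queries, stored, beta=1.0):
    """One modern Hopfield retrieval step.

    queries: (n, d) state patterns.
    stored:  (m, d) stored patterns.
    Returns an (n, d) array: for each query, the softmax(beta * similarity)
    weighted average of the stored patterns.
    """
    scores = beta * queries @ stored.T   # (n, m) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ stored              # (n, d) retrieved patterns

# Toy example: three stored feature vectors and one noisy query.
stored = np.eye(3)
query = np.array([[1.0, 0.1, 0.0]])
retrieved = hopfield_retrieve(query, stored, beta=50.0)
```

With a large `beta` the softmax becomes sharply peaked and the step returns approximately the single closest stored pattern; with a small `beta` it returns a blend of several patterns. A trainable layer would typically add learned projections around this step, which are omitted here.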
