CFNet: Optimizing Remote Sensing Change Detection through Content-Aware Enhancement

Fan Wu, Sijun Dong, Xiaoliang Meng

TL;DR

CFNet tackles unpredictable style differences in bi-temporal remote sensing change detection by introducing a Content-Aware constraint and a plug-and-play Focuser module that separately learns Changed Content (CC) and Unchanged Content (UCC). The architecture stacks a partial EfficientNet-B5 encoder with Content Focuser and Change decoders, governed by a multi-term loss $L = \alpha L_{main} + \beta L_{cc} + \gamma L_{ucc}$ where $\alpha=1$, $\beta=\gamma=0.1$, and per-scale RM$_i$ reweights guide feature fusion. Content-Aware losses leverage internal structural similarities via random sampling to promote stable unchanged content representations while emphasizing content changes. Empirically, CFNet achieves state-of-the-art results on CLCD, LEVIR-CD, and SYSU-CD, with ablations confirming the complementary benefits of the Content-Aware strategy and Focuser module, and analyses highlighting effective alignment of RM$_4$ with ground truth.
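The weighted objective stated above can be written out as a minimal sketch. The function name `total_loss` and the example loss values are hypothetical and chosen only for illustration; only the weights $\alpha=1$, $\beta=\gamma=0.1$ come from the summary.

```python
def total_loss(l_main, l_cc, l_ucc, alpha=1.0, beta=0.1, gamma=0.1):
    """Weighted sum L = alpha * L_main + beta * L_cc + gamma * L_ucc.

    The default weights follow the values quoted in the TL;DR; the three
    inputs are assumed to be scalar loss values already computed elsewhere.
    """
    return alpha * l_main + beta * l_cc + gamma * l_ucc


# Hypothetical per-batch loss values, not taken from the paper.
example = total_loss(l_main=0.20, l_cc=0.50, l_ucc=0.30)
print(f"total loss: {example:.4f}")
```

With these weights the two auxiliary terms each contribute at one tenth of their raw value, so the main loss dominates the objective unless the auxiliary losses are much larger than it.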

Abstract

Change detection is a crucial and widely applied task in remote sensing that aims to identify and analyze changes occurring in the same geographical area over time. Due to variability in acquisition conditions, bi-temporal remote sensing images often exhibit significant differences in image style. Even with the powerful generalization capabilities of DNNs, these unpredictable style variations between bi-temporal images inevitably affect a model's ability to accurately detect changed areas. To address this issue, we propose the Content Focuser Network (CFNet), which takes a content-aware strategy as its key insight. CFNet employs EfficientNet-B5 as the backbone for feature extraction. To enhance the model's focus on the content features of images while mitigating the misleading effects of style features, we develop a constraint strategy that prioritizes the content features of bi-temporal images, termed Content-Aware. Furthermore, to enable the model to flexibly focus on changed and unchanged areas according to the requirements of different stages, we design a reweighting module based on the cosine distance between bi-temporal image features, termed Focuser. CFNet achieves outstanding performance across three well-known change detection datasets: CLCD (F1: 81.41%, IoU: 68.65%), LEVIR-CD (F1: 92.18%, IoU: 85.49%), and SYSU-CD (F1: 82.89%, IoU: 70.78%). The code and pretrained models of CFNet are publicly released at https://github.com/wifiBlack/CFNet.
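The reported F1 and IoU pairs can be cross-checked with a short sketch. It assumes both metrics are computed for the changed class from the same confusion matrix, in which case the standard identity IoU = F1 / (2 - F1) holds. The helper name `iou_from_f1` is hypothetical.

```python
def iou_from_f1(f1):
    """Convert an F1 score in [0, 1] to IoU via IoU = F1 / (2 - F1).

    Valid when both metrics describe the same binary prediction,
    since F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN).
    """
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("f1 must lie in [0, 1]")
    return f1 / (2.0 - f1)


# (F1 %, IoU %) pairs as reported in the abstract.
reported = {
    "CLCD": (81.41, 68.65),
    "LEVIR-CD": (92.18, 85.49),
    "SYSU-CD": (82.89, 70.78),
}

for name, (f1_pct, iou_pct) in reported.items():
    derived = 100.0 * iou_from_f1(f1_pct / 100.0)
    print(f"{name}: reported IoU {iou_pct:.2f}, derived from F1 {derived:.2f}")
```

Under that assumption, the derived values agree with the reported IoU figures to two decimal places, so the two columns are mutually consistent.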

Paper Structure

This paper contains 20 sections, 16 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Humans can easily identify the changed content areas between two images taken under different conditions, without being significantly influenced by image style factors such as brightness and contrast. However, this task is sometimes difficult for computers.
  • Figure 2: Image T1 and Image T2 exhibit significant style differences. Each ellipse in the figure represents the distribution of a pixel's features in feature space. We use $\theta$ to denote the difference between the features of two sampled points in feature space. We randomly sample two points each from the changed and unchanged areas. The red boxes in the figure represent sampling points from the unchanged areas, while the green boxes represent those from the changed areas. In the unchanged areas, where the internal structure is similar, the value of $\theta_1$ is close to the value of $\theta_2$. In contrast, in the changed areas, where the internal structure varies significantly, the value of $\theta_3$ deviates from the value of $\theta_4$.
  • Figure 3: The overall architecture of CFNet. The architecture is divided into four key stages: Feature Extraction, Content Focuser Decoder, Change Decoder, and Loss Computation. In Stage I, a partial EfficientNet-B5 backbone extracts multi-scale features from bi-temporal images. In Stage II, the decoder extracts content features, and the Focuser module generates reweighting maps to separate changed and unchanged content. In Stage III, content features and reweighting maps are leveraged to generate the Change Map. In Stage IV, the total loss consists of the Main Loss $L_{main}$, computed using MSE loss between the Change Map and the ground truth, and the auxiliary losses $L_{ucc}$ and $L_{cc}$, which work collaboratively to distinguish changed and unchanged areas, further enhancing the model’s performance. $L_{cc}$ denotes "Changed Content Loss" and $L_{ucc}$ denotes "Unchanged Content Loss".
  • Figure 4: The detailed architecture of the Content Decoder. $F_{i}, i=1,2,3,4$ denotes the output of the Encoder, specifically $F_{ai}$ or $F_{bi}$. $C_{i}, i=1,2,3,4$ represents the content features, specifically $C_{ai}$ or $C_{bi}$. The Agg module aggregates the feature maps from adjacent scales of the encoder's output.
  • Figure 5: The detailed architecture of the Change Decoder. The red arrows labeled "CBAM" represent the Convolutional Block Attention Module (Woo et al., 2018), a lightweight module that enhances feature representation by applying channel and spatial attention, helping the network focus on the most relevant features and areas. The blue arrows labeled "Unsqueeze+Concate" indicate that two inputs are each expanded by an identical new dimension and then concatenated along this new dimension to produce an output, which is used for subsequent 3D convolution operations. The yellow arrows labeled "Multiple RM" represent a dot product between the feature map at the starting point of the arrow and the corresponding scale $RM_{i}$ from the Focuser module.
  • ...and 7 more figures
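The Focuser reweighting mentioned in the Figure 3 and Figure 5 captions can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nested-list feature layout (height x width x channels), the scaling of cosine distance into [0, 1], and the reading of "Multiple RM" as per-pixel scaling of the feature vector are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two feature vectors (range about [0, 2])."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm + eps)


def reweighting_map(feat_a, feat_b):
    """Per-pixel map from bi-temporal features, scaled to [0, 1] (assumed scaling).

    Larger values mark pixels whose features point in different directions.
    """
    return [
        [0.5 * cosine_distance(pa, pb) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]


def apply_map(feat, rm):
    """Scale every channel of each pixel by that pixel's reweighting value."""
    return [
        [[w * c for c in px] for px, w in zip(row, rm_row)]
        for row, rm_row in zip(feat, rm)
    ]


# Hypothetical 1 x 2 feature maps with 2 channels per pixel.
feat_t1 = [[[1.0, 0.0], [1.0, 0.0]]]
feat_t2 = [[[2.0, 0.0], [0.0, 3.0]]]  # pixel 0: rescaled only; pixel 1: new direction

rm = reweighting_map(feat_t1, feat_t2)
focused = apply_map(feat_t1, rm)
print(rm)
print(focused)
```

In this toy input the first pixel differs only by a scale factor, which leaves cosine distance near zero, while the second pixel changes direction and receives a weight of 0.5. A complementary map for unchanged content could be taken as one minus this map, which is again an assumption rather than something the captions state.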