IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection

Qing Guo, Ruofei Wang, Rui Huang, Shuifa Sun, Yuxiang Zhang

TL;DR

This work targets change detection under challenging background variations by focusing on the quality of the feature difference between bi-temporal images. It introduces IDET, a modular trio of transformers where two long-range feature extractors support a third difference-enhancement transformer that iteratively refines the difference signal. A multi-scale extension uses a UNet-based extractor to obtain representations at multiple scales and fuses refinements in a coarse-to-fine pipeline to produce the final change map. Across six datasets, IDET achieves state-of-the-art results and can be embedded into existing CD methods to boost their performance, underscoring the practical value of improving feature-difference quality for robust change detection.

Abstract

Change detection (CD) aims to detect changed regions within an image pair captured at different times and plays a significant role in diverse real-world applications. Nevertheless, most existing works focus on designing advanced network architectures that map the feature difference to the final change map, while ignoring the influence of the quality of the feature difference itself. In this paper, we study CD from a different perspective, i.e., how to optimize the feature difference to highlight changes and suppress unchanged regions, and propose a novel module denoted as iterative difference-enhanced transformers (IDET). IDET contains three transformers: two for extracting the long-range information of the two images and one for enhancing the feature difference. In contrast to previous transformers, the third transformer takes the outputs of the first two to guide the enhancement of the feature difference iteratively. To achieve more effective refinement, we further propose multi-scale IDET-based change detection, which uses multi-scale representations of the images for multiple feature difference refinements and a coarse-to-fine fusion strategy to combine all refinements. Our final CD method outperforms seven state-of-the-art methods on six large-scale datasets under diverse application scenarios, which demonstrates the importance of feature difference enhancement and the effectiveness of IDET.
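
The iterative data flow described in the abstract can be sketched in a few lines of plain Python. This is only an illustration of the control flow, not the authors' implementation: the names `initial_difference`, `toy_extractor`, `toy_enhancer`, and `idet_refine` are hypothetical, the absolute difference is an assumed stand-in for the paper's initial feature difference, and the two toy functions replace the real transformers.

```python
from typing import Callable, List

Feature = List[float]  # a flattened feature vector (hypothetical toy representation)


def initial_difference(r_x: Feature, r_y: Feature) -> Feature:
    # Assumption: element-wise absolute difference stands in for the
    # paper's definition of the initial feature difference D.
    return [abs(a - b) for a, b in zip(r_x, r_y)]


def toy_extractor(r: Feature) -> Feature:
    # Toy stand-in for a long-range feature transformer: mix every
    # element with the global mean so each position sees all others.
    mean = sum(r) / len(r)
    return [0.5 * v + 0.5 * mean for v in r]


def toy_enhancer(d: Feature, r_x_t: Feature, r_y_t: Feature) -> Feature:
    # Toy stand-in for the difference-enhancement transformer: rescale D
    # by a normalised guide computed from the two refined representations.
    guide = [abs(a - b) for a, b in zip(r_x_t, r_y_t)]
    peak = max(guide) or 1.0  # avoid division by zero when the guide is all zeros
    return [dv * (g / peak) for dv, g in zip(d, guide)]


def idet_refine(
    r_x: Feature,
    r_y: Feature,
    extract_x: Callable[[Feature], Feature],
    extract_y: Callable[[Feature], Feature],
    enhance: Callable[[Feature, Feature, Feature], Feature],
    iterations: int = 1,
) -> Feature:
    d = initial_difference(r_x, r_y)
    for _ in range(iterations):
        r_x_t = extract_x(r_x)          # refined reference representation
        r_y_t = extract_y(r_y)          # refined query representation
        d = enhance(d, r_x_t, r_y_t)    # enhanced difference for this iteration
        r_x, r_y = r_x_t, r_y_t         # refined outputs feed the next iteration
    return d


reference = [0.0, 0.0, 0.0, 0.0]
query = [0.1, 0.0, 2.0, 0.0]
once = idet_refine(reference, query, toy_extractor, toy_extractor, toy_enhancer, 1)
twice = idet_refine(reference, query, toy_extractor, toy_extractor, toy_enhancer, 2)
```

In this toy setting the large difference at index 2 is preserved while the small one at index 0 shrinks with each iteration, which mirrors the stated goal of highlighting changes and suppressing unchanged regions.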

Paper Structure (25 sections, 10 equations, 10 figures, 8 tables)

This paper contains 25 sections, 10 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Visual comparison of four existing change detectors (i.e., ADCDNet [huang2020change], STANet [chen2020spatial], BIT [chen2021a], and ChangeFormer [bandara2022transformer]) and their enhanced counterparts with our IDET. The first two columns are the inputs. The third and fourth columns are the feature differences (i.e., $\mathbf{D}$) and detection results of the four existing methods. The fifth and sixth columns are the enhanced feature differences (i.e., $\hat{\mathbf{D}}$) and the corresponding detection results. The final column displays the ground truth.
  • Figure 2: (a) Evaluation results on the VL-CMU-CD dataset by adding noise with different severities to the feature difference $\mathbf{D}$. (b) and (c) display two cases including the original feature difference, noisy feature difference $\mathbf{D}$, enhanced feature difference $\hat{\mathbf{D}}$, the detection results $\mathbf{M}$ under $\mathbf{D}$ and $\hat{\mathbf{D}}$, and the ground truth, respectively. The main changes are highlighted via red arrows. IDET effectively suppresses the background regions and highlights changes.
  • Figure 3: Existing architecture for feature difference-based change detection.
  • Figure 4: The architecture of the Iterative Difference-enhanced Transformers (IDET) with iteration $T=1$. Given the reference and query image representations (i.e., $\mathbf{R}_{\text{x}}$ and $\mathbf{R}_{\text{y}}$), the initial feature difference $\mathbf{D}$ is refined by $\tilde{\mathbf{D}}$ to generate the enhanced counterpart $\hat{\mathbf{D}}$, where $\mathbf{D}$ and $\tilde{\mathbf{D}}$ are obtained by Eq. \ref{eq:featdiff} and Eq. \ref{eq:idet_diff}, respectively. In the next iteration (e.g., $T=2$), $\mathbf{R}_{\text{x}}$, $\mathbf{R}_{\text{y}}$, and $\mathbf{D}$ are replaced by $\tilde{\mathbf{R}}_{\text{x}}$, $\tilde{\mathbf{R}}_{\text{y}}$, and $\hat{\mathbf{D}}$, respectively, so that $\hat{\mathbf{D}}$ is enhanced again.
  • Figure 5: The framework of our multi-scale IDET-based change detection. First, the bi-temporal images (i.e., the reference image and the query image) are fed into a feature extractor to obtain multi-scale features ($\mathbf{R}_{\text{x}}^l$, $\mathbf{R}_{\text{y}}^l$). Second, IDET is used to enhance the feature difference $\mathbf{D}$ at each scale. Finally, a coarse-to-fine fusion strategy fuses all enhanced differences to generate the final change map $\mathbf{M}$.
  • ...and 5 more figures
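
The coarse-to-fine fusion outlined in the Figure 5 caption could look roughly like the sketch below. The caption does not specify the fusion operator, so everything concrete here is an assumption made for illustration: nearest-neighbour 2x upsampling, averaging as the fusion rule, a fixed threshold to binarise the result, and the hypothetical names `upsample2x`, `fuse_coarse_to_fine`, and `to_change_map`.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2-D map (toy representation)


def upsample2x(grid: Grid) -> Grid:
    # Nearest-neighbour upsampling: repeat every value twice along both axes.
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_coarse_to_fine(levels: List[Grid]) -> Grid:
    # `levels` holds the enhanced differences ordered coarsest first; each
    # level is assumed to be exactly twice the resolution of the previous one.
    fused = levels[0]
    for finer in levels[1:]:
        up = upsample2x(fused)
        # Assumption: simple averaging stands in for the paper's fusion step.
        fused = [
            [0.5 * (u + f) for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, finer)
        ]
    return fused


def to_change_map(fused: Grid, threshold: float = 0.5) -> List[List[int]]:
    # Binarise the fused difference; the threshold value is an assumption.
    return [[1 if v > threshold else 0 for v in row] for row in fused]


coarse = [[1.0]]
fine = [[1.0, 0.0], [0.0, 0.0]]
fused = fuse_coarse_to_fine([coarse, fine])
change_map = to_change_map(fused)
```

Here the coarse level says "something changed" everywhere, and the fine level localises the change to the top-left cell, so only that cell survives the threshold.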