
ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection

Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal

TL;DR

This paper presents ChangeBind, a Siamese-based framework for remote sensing (RS) change detection that jointly leverages convolution and self-attention through a dedicated change encoder. At each of several scales, the encoder computes convolutional change encodings (CCE) and attentional change encodings (ACE) and fuses them into change representations used to localize changed regions. Experiments on LEVIR-CD and CDD-CD report state-of-the-art IoU, indicating improved detection of both subtle and large changes by exploiting multi-scale, local, and global cues. The approach is relevant to applications such as land-use monitoring and disaster response, which need precise change maps from bi-temporal RS imagery.
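
The IoU reported on CD benchmarks is commonly computed for the change class only, as TP / (TP + FP + FN) over binary change maps. The NumPy sketch below illustrates that formula on a toy 3x3 example. The function name `change_iou` is hypothetical, and treating an empty union as a perfect score is an assumed convention, not necessarily the evaluation protocol used in the paper.

```python
import numpy as np


def change_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU of the change class for binary maps (1 = changed, 0 = unchanged)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()    # pixels marked changed in both maps
    union = np.logical_or(pred, target).sum()  # TP + FP + FN
    # Assumed convention: if neither map contains any change, report 1.0.
    return float(tp / union) if union > 0 else 1.0


pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 0]])
print(change_iou(pred, target))  # 2 / 4 -> 0.5
```

Unchanged pixels do not enter the score at all, which is why change-class IoU is a stricter measure than pixel accuracy on datasets where most pixels are unchanged.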

Abstract

Change detection (CD) is a fundamental task in remote sensing (RS) that aims to detect semantic changes in the same geographical region between images captured at different times. Existing convolutional neural network (CNN)-based approaches often struggle to capture long-range dependencies, whereas recent transformer-based methods tend to be dominated by global representations, which may limit their ability to capture subtle change regions given the complexity of objects in the scene. To address these limitations, we propose an effective Siamese-based framework that encodes the semantic changes occurring in bi-temporal RS images. The core of our design is a change encoder that leverages local and global representations of multi-scale features to capture both subtle and large changes and precisely estimate the change regions. Our experimental study on two challenging CD datasets demonstrates the merits of our approach, which achieves state-of-the-art performance.
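
A Siamese backbone processes both images with one shared set of weights, so that pre- and post-change features are embedded in the same feature space and can be compared scale by scale. The toy NumPy sketch below illustrates only this weight sharing and the four-level feature pyramid. The stages (2x2 average pooling followed by a 1x1 projection and ReLU) and the names `make_backbone` and `extract_features` are hypothetical stand-ins for the ResNet backbone, and the strides used here (2 to 16) are not those of a real ResNet.

```python
import numpy as np

rng = np.random.default_rng(0)


def avg_pool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 on a (C, H, W) array (H and W assumed even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def make_backbone(in_ch=3, widths=(8, 16, 32, 64)):
    """Hypothetical toy backbone: one channel-mixing matrix per stage."""
    weights, c = [], in_ch
    for width in widths:
        weights.append(rng.standard_normal((width, c)) / np.sqrt(c))
        c = width
    return weights


def extract_features(image, weights):
    """Return one feature map per stage, each at half the previous resolution."""
    feats, x = [], image
    for W in weights:
        x = avg_pool2x2(x)                                    # halve spatial size
        x = np.maximum(np.einsum("oc,chw->ohw", W, x), 0.0)   # 1x1 projection + ReLU
        feats.append(x)
    return feats


backbone = make_backbone()                       # a single set of weights ...
x_pre = rng.standard_normal((3, 64, 64))
x_post = rng.standard_normal((3, 64, 64))
feats_pre = extract_features(x_pre, backbone)    # ... shared by both branches
feats_post = extract_features(x_post, backbone)
print([f.shape for f in feats_pre])
# [(8, 32, 32), (16, 16, 16), (32, 8, 8), (64, 4, 4)]
```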

Paper Structure

This paper contains 12 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: (a) The overall architecture of our proposed CD framework, referred to as ChangeBind. The model takes a pair of bi-temporal images and extracts multi-scale features through a Siamese ResNet backbone. The multi-scale features ($X_{pre}^i$ and $X_{post}^i$, where $i \in \{1,2,3,4\}$) are fed to the change encoder, which highlights the semantic change regions. A decoder then upsamples the encoded change features and predicts the change map $M$. (b) The structure of the change encoder, which takes features $X_{pre}^i$ and $X_{post}^i$ and uses difference modules to encode change regions. (c) The design of the difference module, which takes the concatenated features of a single scale level and uses convolution to obtain convolutional change encodings (CCE) and multi-head self-attention (MHSA) to obtain attentional change encodings (ACE). The CCE and ACE representations are merged and projected by a convolution within the difference module. The outputs of the difference modules at higher scale levels are upsampled and combined to obtain the encoded change representations $\bar{X}$, which are finally input to the decoder to obtain the change prediction mask.
  • Figure 2: Qualitative results on the LEVIR-CD (top row) and CDD-CD (bottom row) datasets. We compare against the five best existing change detection methods in the literature whose codebases are publicly available. The highlighted region shows that our method detects change regions more accurately than FC-Siam-diff daudt2018fully, STANet chen2020spatial, DTCDSCN liu2020building, BIT chen2021remote_bit, and ChangeFormer bandara2022transformer.
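
Figure 1(c) describes the difference module only at block level. The NumPy sketch below shows one possible reading of it: concatenate the bi-temporal features of one scale, compute a local CCE with a convolution and a global ACE with MHSA over all spatial tokens, merge, and project; then upsample the coarser outputs and combine them. Several details are assumptions, not the authors' design: 1x1 convolutions (the kernel size is not given in the caption), concatenation as the CCE/ACE merge, nearest-neighbour upsampling with summation as the multi-scale combination, and random weights. The names `difference_module`, `change_encoder`, and `init_params` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, W):
    """1x1 convolution: (C_in, H, W) map, weight (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", W, x)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mhsa(x, Wq, Wk, Wv, num_heads):
    """Multi-head self-attention over the H*W spatial tokens of a (C_in, H, W) map."""
    c_in, h, w = x.shape
    tokens = x.reshape(c_in, h * w).T                 # (N, C_in), N = H*W
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv   # (N, D) each
    d_head = q.shape[1] // num_heads
    out = np.empty_like(v)
    for i in range(num_heads):
        s = slice(i * d_head, (i + 1) * d_head)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(d_head))  # (N, N): global context
        out[:, s] = attn @ v[:, s]
    return out.T.reshape(-1, h, w)                    # back to (D, H, W)


def init_params(c_in, d):
    """Hypothetical random weights for one difference module; c_in = channels per image."""
    def w(*shape):
        return 0.1 * rng.standard_normal(shape)
    return {"conv": w(d, 2 * c_in),                                       # CCE branch
            "q": w(2 * c_in, d), "k": w(2 * c_in, d), "v": w(2 * c_in, d),  # ACE branch
            "proj": w(d, 2 * d)}                                          # merge projection


def difference_module(f_pre, f_post, p, num_heads=2):
    x = np.concatenate([f_pre, f_post], axis=0)        # (2C, H, W) bi-temporal features
    cce = np.maximum(conv1x1(x, p["conv"]), 0.0)       # CCE: local cues
    ace = mhsa(x, p["q"], p["k"], p["v"], num_heads)   # ACE: global cues
    return conv1x1(np.concatenate([cce, ace], axis=0), p["proj"])  # merge + project


def upsample_nearest(x, factor):
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def change_encoder(feats_pre, feats_post, params):
    outs = [difference_module(a, b, p) for a, b, p in zip(feats_pre, feats_post, params)]
    h = outs[0].shape[1]
    # Upsample coarser (higher-level) outputs to the finest resolution and sum them.
    return sum(upsample_nearest(o, h // o.shape[1]) for o in outs)


D = 16
channels, sizes = (8, 16, 32, 64), (32, 16, 8, 4)
feats_pre = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
feats_post = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
params = [init_params(c, D) for c in channels]
x_bar = change_encoder(feats_pre, feats_post, params)
print(x_bar.shape)  # (16, 32, 32)
```

Concatenating CCE and ACE before the projection lets the final convolution learn how to weight local versus global evidence; summation or gating would be equally plausible readings of "merged", and the caption alone does not settle which one the authors use.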