MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification

Weikang Yu; Xiaokang Zhang; Samiran Das; Xiao Xiang Zhu; Pedram Ghamisi

TL;DR

MaskCD addresses the challenge of precise object-level change detection in very high-resolution remote sensing imagery by reframing the task as mask classification rather than pixel-wise labeling. The method combines a hierarchical Transformer-based Siamese encoder, a Cross-Level Change Representation Perceiver with DeformMHSA, and a Masked Attention-based DETR decoder to generate and classify multiple change masks end-to-end. Experimental results on five diverse RS-CD datasets show consistent improvements over state-of-the-art methods in F1 and mIoU, with higher precision and better object integrity. The approach offers a practical, scalable pipeline for accurate change mapping in complex scenes and provides publicly available code.

Abstract

Change detection (CD) from remote sensing (RS) images using deep learning has been widely investigated in the literature. It is typically regarded as a pixel-wise labeling task that aims to classify each pixel as changed or unchanged. Although per-pixel classification networks with encoder-decoder structures have been dominant, they still suffer from imprecise boundaries and incomplete object delineation in various scenes. For high-resolution RS images, partly or totally changed objects deserve more attention than individual pixels. Therefore, we revisit the CD task from the mask prediction and classification perspective and propose MaskCD to detect changed areas by adaptively generating categorized masks from input image pairs. Specifically, it utilizes a cross-level change representation perceiver (CLCRP) to learn multiscale change-aware representations and capture spatiotemporal relations from encoded features by exploiting deformable multihead self-attention (DeformMHSA). Subsequently, a masked-attention-based detection transformers (MA-DETR) decoder is developed to accurately locate and identify changed objects based on masked attention and self-attention mechanisms. It reconstructs the desired changed objects by decoding the pixel-wise representations into learnable mask proposals and making final predictions from these candidates. Experimental results on five benchmark datasets demonstrate that the proposed approach outperforms other state-of-the-art models. Code and pretrained models are available online (https://github.com/EricYu97/MaskCD).
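To make the mask classification idea concrete, the following minimal sketch shows one common way to turn a set of mask proposals and their class scores into a binary change map. It is not taken from the MaskCD code: the function name `change_map`, the two-way class layout `[changed, no_object]`, the 0.5 threshold, and the weighted-sum fusion rule are all assumptions made for illustration, and the fusion the authors actually use may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def change_map(mask_logits, class_logits, threshold=0.5):
    """Fuse N mask proposals into one binary change map (hypothetical helper).

    mask_logits:  N proposals, each an H x W nested list of per-pixel logits.
    class_logits: N pairs [changed, no_object] (assumed layout).
    """
    height = len(mask_logits[0])
    width = len(mask_logits[0][0])
    # Probability that each proposal belongs to the "changed" class.
    changed_prob = [softmax(pair)[0] for pair in class_logits]
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            # Each proposal votes with its mask probability at this pixel,
            # weighted by how likely the proposal is a changed object.
            score = sum(
                p * sigmoid(m[r][c]) for p, m in zip(changed_prob, mask_logits)
            )
            row.append(1 if score > threshold else 0)
        result.append(row)
    return result


# Two proposals on a 2 x 2 image: the first is confidently "changed",
# the second is confidently "no object", so only the first mask survives.
masks = [
    [[10.0, -10.0], [-10.0, -10.0]],
    [[-10.0, 10.0], [-10.0, -10.0]],
]
classes = [[5.0, -5.0], [-5.0, 5.0]]
print(change_map(masks, classes))
```

In this sketch a proposal that the classifier rejects contributes almost nothing to the final map, however confident its mask is. This illustrates how classifying whole masks, rather than individual pixels, can suppress isolated false alarms.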

Paper Structure (36 sections, 12 equations, 11 figures, 2 tables)

This paper contains 36 sections, 12 equations, 11 figures, 2 tables.

Introduction
Related Work
CNN-Based RS-CD Methods
Transformer-Based RS-CD Methods
Object-Based CD Methods
Mask-Based Vision Tasks
Methodology
Hierarchical Transformer-based Siamese Encoder
Cross-Level Change Representation Perceiver
Masked Attention-Based Detection Transformer Decoder
Masked Attention Block
Self-Attention Block
Mask Classification Module
Optimization
Bipartite Matching
...and 21 more sections

Figures (11)

Figure 1: Comparisons between (a) per-pixel classification-based and (b) mask classification-based CD methods.
Figure 2: Illustration of (a) self-attention and (b) masked attention mechanism.
Figure 3: Proposed MaskCD framework for bi-temporal CD.
Figure 4: Proposed Cross-Level Change Representation Perceiver.
Figure 5: Proposed transformer decoder and mask classification module for bi-temporal CD.
...and 6 more figures
