MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification
Weikang Yu, Xiaokang Zhang, Samiran Das, Xiao Xiang Zhu, Pedram Ghamisi
TL;DR
MaskCD addresses the challenge of precise object-level change detection in very-high-resolution (VHR) remote sensing (RS) imagery by reframing the task as mask classification rather than pixel-wise labeling. The method combines a hierarchical Transformer-based Siamese encoder, a Cross-Level Change Representation Perceiver (CLCRP) built on deformable multihead self-attention (DeformMHSA), and a Masked-Attention-based DETR (MA-DETR) decoder that generates and classifies multiple change masks end-to-end. Experiments on five diverse RS change detection (RS-CD) datasets show consistent improvements over state-of-the-art methods in F1 and mIoU, with higher precision and better object integrity. The approach offers a practical, scalable pipeline for accurate change mapping in complex scenes, and its code is publicly available.
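To make the mask-classification view concrete, the following is a minimal sketch of how a set of classified mask proposals can be merged into a per-pixel change map. It assumes a MaskFormer-style inference rule (a pixel's score for a class is the sum over queries of that query's class probability times its mask probability) and a hypothetical extra "no object" class that is dropped at inference. The function names and toy inputs are illustrative and are not taken from the MaskCD code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def semantic_inference(class_logits: Sequence[Sequence[float]],
                       mask_logits: Sequence[Sequence[float]]) -> List[int]:
    """Merge Q classified mask proposals into one label per pixel.

    class_logits: Q rows of K + 1 logits; the last entry is an assumed
                  "no object" class that is discarded at inference.
    mask_logits:  Q rows of H * W mask logits (flattened image).
    Returns H * W labels in [0, K), e.g. 0 = unchanged, 1 = changed.
    """
    num_classes = len(class_logits[0]) - 1
    num_pixels = len(mask_logits[0])
    scores = [[0.0] * num_classes for _ in range(num_pixels)]
    for cls_q, mask_q in zip(class_logits, mask_logits):
        probs = softmax(cls_q)[:num_classes]  # drop the "no object" probability
        for p, logit in enumerate(mask_q):
            m = sigmoid(logit)
            for c in range(num_classes):
                scores[p][c] += probs[c] * m
    # Per-pixel argmax over the accumulated class scores.
    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]


# Toy example: a 2x2 image flattened to 4 pixels, two mask proposals.
class_logits = [[0.0, 4.0, 0.0],   # query 0 is confident about "changed"
                [4.0, 0.0, 0.0]]   # query 1 is confident about "unchanged"
mask_logits = [[5.0, 5.0, -5.0, -5.0],   # query 0 covers pixels 0 and 1
               [-5.0, -5.0, 5.0, 5.0]]   # query 1 covers pixels 2 and 3
print(semantic_inference(class_logits, mask_logits))  # [1, 1, 0, 0]
```

Summing over all queries, rather than keeping only the single best mask, lets overlapping proposals reinforce each other; with the toy inputs, the two pixels covered by the "changed" query are labeled changed and the remaining two unchanged.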
Abstract
Change detection (CD) from remote sensing (RS) images using deep learning has been widely investigated in the literature. It is typically regarded as a pixel-wise labeling task that aims to classify each pixel as changed or unchanged. Although per-pixel classification networks with encoder-decoder structures have been dominant, they still suffer from imprecise boundaries and incomplete object delineation in various scenes. For high-resolution RS images, partially or entirely changed objects deserve more attention than individual pixels. Therefore, we revisit the CD task from the perspective of mask prediction and classification and propose MaskCD, which detects changed areas by adaptively generating categorized masks from input image pairs. Specifically, it uses a cross-level change representation perceiver (CLCRP) to learn multiscale change-aware representations and capture spatiotemporal relations from encoded features by exploiting deformable multihead self-attention (DeformMHSA). Subsequently, a masked-attention-based detection transformer (MA-DETR) decoder is developed to accurately locate and identify changed objects based on masked attention and self-attention mechanisms. It reconstructs the desired changed objects by decoding the pixel-wise representations into learnable mask proposals and making final predictions from these candidates. Experimental results on five benchmark datasets demonstrate that the proposed approach outperforms other state-of-the-art models. Code and pretrained models are available online (https://github.com/EricYu97/MaskCD).
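The masked attention in the MA-DETR decoder can be illustrated with the following sketch. It assumes a Mask2Former-style formulation: each query attends only to pixels where the mask predicted by the previous decoder layer is foreground, and falls back to full attention if that mask is empty. The single-head, pure-Python form, the 0.5 probability threshold, the fallback rule and all names are assumptions for illustration, not details stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def masked_attention(query: Sequence[float],
                     keys: Sequence[Sequence[float]],
                     values: Sequence[Sequence[float]],
                     prev_mask_logits: Sequence[float],
                     threshold: float = 0.5) -> Tuple[List[float], List[float]]:
    """Single-head cross-attention of one query over N pixel features.

    Pixels whose previous-layer mask probability is below `threshold`
    (an assumed value) get a score of -inf, so they receive zero weight.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    keep = [sigmoid(m) >= threshold for m in prev_mask_logits]
    if not any(keep):
        keep = [True] * len(keys)  # empty mask: fall back to full attention
    masked = [s if k else float("-inf") for s, k in zip(scores, keep)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return out, weights


# Toy example: 3 pixels, only pixel 0 lies inside the previous mask.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [20.0, 0.0]]
out, weights = masked_attention(query, keys, values, [2.0, -3.0, -1.0])
print(weights)  # [1.0, 0.0, 0.0]
print(out)      # [10.0, 0.0]
```

Without the mask, pixel 2 (which has the same key as pixel 0) would receive the same weight as pixel 0 and pull the output away from 10.0; restricting attention to the previous prediction keeps each query focused on its own candidate object region.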
