Multi-Receptive Field Ensemble with Cross-Entropy Masking for Class Imbalance in Remote Sensing Change Detection
Humza Naveed, Xina Zeng, Mitch Bryson, Nagita Mehrseresht
TL;DR
This paper tackles remote sensing change detection (RSCD) under multi-scale changes and severe class imbalance. It combines a Siamese FastSAM encoder with a multi-receptive field ensemble for spatio-temporal feature learning. Alongside spatial-temporal feature enhancement (STFE) branches, it introduces a decoder ensemble and a multi-scale decoder fusion with attention (MSDFA) module that fuses information across scales, followed by a classification head. The whole model is trained with a novel cross-entropy masking (CEM) loss that drops easy background pixels during optimization. On four RSCD datasets, the proposed SAM-ECEM method achieves state-of-the-art performance. It improves F1-score by 2.97% on the challenging S2Looking dataset and also gains on Levir-CD, WHU-CD, and CLCD, which demonstrates strong generalization. By balancing local detail with global context and addressing class imbalance, the approach offers practical benefits for RSCD. Code is available at https://github.com/humza909/SAM-ECEM.
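To make the idea of "dropping easy background pixels" concrete, here is a minimal sketch of a masked binary cross-entropy. This summary does not state the paper's exact rule for selecting which pixels to mask. The sketch therefore assumes one plausible rule: a no-change (background) pixel counts as "easy" when its predicted change probability falls below a threshold `tau`. The names `bce`, `cem_loss`, and `tau` are hypothetical and are not taken from the SAM-ECEM code. It uses plain Python lists so it runs without dependencies; a real training loop would operate on tensors.

```python
import math
from typing import List


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one pixel; p is the predicted change probability."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def cem_loss(probs: List[float], labels: List[int], tau: float = 0.1) -> float:
    """Hypothetical cross-entropy masking: mean BCE over pixels kept by the mask.

    labels: 1 = change, 0 = no change (background).
    ASSUMPTION: a background pixel is 'easy' and masked out when its predicted
    change probability is below tau. Change pixels are always kept.
    """
    kept = [bce(p, y) for p, y in zip(probs, labels) if not (y == 0 and p < tau)]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    probs = [0.9, 0.05, 0.6]  # one change pixel, one easy and one hard background pixel
    labels = [1, 0, 0]
    print("masked BCE:", cem_loss(probs, labels, tau=0.1))
    print("plain BCE: ", cem_loss(probs, labels, tau=0.0))  # tau=0 masks nothing
```

Setting `tau=0` recovers ordinary mean cross-entropy, which makes the effect easy to compare. Removing the easy background pixel raises the average loss, so the hard background pixel and the rare change pixel carry more of the gradient signal.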
Abstract
Remote sensing change detection (RSCD) is a complex task in which changes often appear at different scales and orientations. Convolutional neural networks (CNNs) are good at capturing local spatial patterns but cannot model global semantics because of their limited receptive fields. Transformers, in contrast, can model long-range dependencies but are data-hungry, and RSCD datasets are not large enough to train them effectively. To address this, this paper presents a new RSCD architecture that adapts the Segment Anything Model (SAM) vision foundation model. Features from the SAM encoder pass through a multi-receptive field ensemble that captures both local and global change patterns. We propose an ensemble of spatial-temporal feature enhancement (STFE) modules to capture cross-temporal relations. It also includes a decoder that reconstructs change patterns and a multi-scale decoder fusion with attention (MSDFA) module that fuses multi-scale decoder information and highlights key change patterns. Each branch of an ensemble operates on a separate receptive field, capturing details from fine to coarse levels. Additionally, we propose a novel cross-entropy masking (CEM) loss to handle class imbalance in RSCD datasets. Our method outperforms state-of-the-art (SOTA) methods on four change detection datasets: Levir-CD, WHU-CD, CLCD, and S2Looking. On the challenging S2Looking dataset, it achieves a 2.97% improvement in F1-score. The code is available at: https://github.com/humza909/SAM-ECEM
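The phrase "each branch operates on a separate receptive field" can be illustrated with a toy one-dimensional version. The sketch below uses parallel branches of stacked dilated convolutions with different dilation rates, so each branch sees a wider context than the one before it. It then fuses their outputs with softmax attention weights. The dilation settings in `BRANCHES` are hypothetical and are not taken from the paper. `attention_fuse` is a generic softmax weighting used as a stand-in, not the paper's MSDFA module. All function names are illustrative. For stride-1 layers with kernel size k and dilations d_l, the receptive field is 1 + Σ (k − 1)·d_l.

```python
import math
from typing import Dict, List, Sequence


def receptive_field(kernel: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 convs: 1 + sum((kernel - 1) * d)."""
    return 1 + sum((kernel - 1) * d for d in dilations)


def dilated_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Zero-padded 'same' 1-D dilated convolution (odd kernel size, stride 1)."""
    k = len(weights)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + (j - k // 2) * dilation  # input position read by tap j
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def run_branch(x: List[float], weights: List[float], dilations: Sequence[int]) -> List[float]:
    """One ensemble branch: apply the same kernel at each dilation in turn."""
    for d in dilations:
        x = dilated_conv1d(x, weights, d)
    return x


def attention_fuse(outputs: List[List[float]], scores: List[float]) -> List[float]:
    """Weighted sum of branch outputs with weights = softmax(scores)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(outputs[0]))]


# Hypothetical fine-to-coarse branch settings (kernel size 3), not from the paper.
BRANCHES: Dict[str, List[int]] = {"fine": [1, 1], "mid": [1, 2], "coarse": [2, 4]}

if __name__ == "__main__":
    for name, dils in BRANCHES.items():
        print(name, "receptive field:", receptive_field(3, dils))
    signal = [0.0] * 6 + [1.0] + [0.0] * 6  # impulse at index 6
    kernel = [1.0, 1.0, 1.0]
    outs = [run_branch(signal, kernel, d) for d in BRANCHES.values()]
    print("fused:", attention_fuse(outs, scores=[0.0, 0.0, 0.0]))
```

Feeding an impulse through each branch makes the receptive field visible. The nonzero response spans 5, 7, and 13 positions for the fine, mid, and coarse branches respectively. The fusion step lets a learned score decide how much each scale contributes at the output.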
