Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing

Buddhi Wijenayake; Athulya Ratnayake; Praveen Sumanasekara; Roshan Godaliyadda; Parakrama Ekanayake; Vijitha Herath; Nichula Wasalathilaka

TL;DR

Mamba-FCS tackles semantic change detection (SCD) in remote sensing by combining a Visual State Space Model backbone with a joint spatio-frequency fusion module, a change-guided attention (CGA) mechanism, and a SeK-inspired loss, so that the binary change detection (BCD) and SCD tasks are optimized together. The approach introduces FFT-based frequency cues, a frequency-aware fusion block, and a CGA module that propagates change information into the semantic decoders, enabling mutual reinforcement between BCD and SCD. Empirical results on the SECOND and Landsat-SCD datasets demonstrate state-of-the-art performance across OA, $F_{scd}$, mIoU, and SeK, with notable improvements on rare transitions and boundary delineation. The work also shows that the linear-complexity VMamba backbone sustains high performance at scalable computational cost, making it well suited to large-scale, high-resolution SCD deployments in remote sensing.
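For readers unfamiliar with the SeK (Separated Kappa) score listed among the metrics, the sketch below shows how it is commonly computed from a confusion matrix. It assumes the definition widely used with the SECOND benchmark: index 0 is the no-change class, the no-change/no-change cell is zeroed before computing Cohen's kappa, and the result is scaled by exp(IoU_fg - 1), where IoU_fg is the IoU of the binary "changed" class. The function names `kappa` and `sek` are illustrative, and this is the evaluation metric only, not the SeK loss proposed in the paper.

```python
import numpy as np

def kappa(hist):
    """Cohen's kappa of a square confusion matrix (rows: truth, cols: prediction)."""
    total = hist.sum()
    if total == 0:
        return 0.0
    po = np.trace(hist) / total                                   # observed agreement
    pe = float(hist.sum(axis=1) @ hist.sum(axis=0)) / total ** 2  # chance agreement
    return 0.0 if pe == 1 else (po - pe) / (1 - pe)

def sek(hist):
    """SeK under the assumed definition; class 0 is 'no change'."""
    hist = np.asarray(hist, dtype=np.float64)
    hist_n0 = hist.copy()
    hist_n0[0, 0] = 0                       # drop the dominant no-change agreement
    k = kappa(hist_n0)
    tp = hist[1:, 1:].sum()                 # changed in both truth and prediction
    errors = hist[0, 1:].sum() + hist[1:, 0].sum()
    iou_fg = tp / (tp + errors) if (tp + errors) > 0 else 0.0
    return k * np.exp(iou_fg - 1)
```

Zeroing the no-change cell is what makes the score sensitive to class imbalance: a model cannot score well simply by predicting the dominant unchanged class.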

Abstract

Semantic Change Detection (SCD) from remote sensing imagery requires models that balance extensive spatial context, computational efficiency, and sensitivity to class-imbalanced land-cover transitions. Convolutional Neural Networks excel at local feature extraction but lack global context, while Transformers provide global modeling at high computational cost. Recent Mamba architectures based on state-space models offer a compelling alternative through linear complexity and efficient long-range modeling. In this study, we introduce Mamba-FCS, an SCD framework built upon a Visual State Space Model backbone that incorporates three components: a Joint Spatio-Frequency Fusion block that uses log-amplitude frequency-domain features to enhance edge clarity and suppress illumination artifacts; a Change-Guided Attention (CGA) module that explicitly links the naturally intertwined Binary Change Detection (BCD) and SCD tasks; and a Separated Kappa (SeK) loss tailored for class-imbalanced performance optimization. Extensive evaluation on the SECOND and Landsat-SCD datasets shows that Mamba-FCS achieves state-of-the-art metrics: 88.62% Overall Accuracy, 65.78% $F_{scd}$, and 25.50% SeK on SECOND, and 96.25% Overall Accuracy, 89.27% $F_{scd}$, and 60.26% SeK on Landsat-SCD. Ablation analyses confirm the distinct contribution of each novel component, and qualitative assessments highlight significant improvements in SCD. Our results underline the substantial potential of Mamba architectures, enhanced by the proposed techniques, and set a new benchmark for effective and scalable semantic change detection in remote sensing applications. The complete source code, configuration files, and pre-trained models will be made publicly available upon publication.
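To illustrate why log-amplitude frequency features can help with illumination differences, the following minimal sketch computes the log-amplitude of a 2-D FFT with NumPy. The function name `log_amplitude`, the `eps` stabilizer, and the random test image are assumptions made for illustration; the abstract does not specify the exact formulation used inside the fusion block.

```python
import numpy as np

def log_amplitude(feat, eps=1e-8):
    """Log-amplitude of the centered 2-D spectrum of a (C, H, W) array."""
    spec = np.fft.fft2(feat, axes=(-2, -1))        # per-channel 2-D FFT
    spec = np.fft.fftshift(spec, axes=(-2, -1))    # move the DC term to the center
    return np.log(np.abs(spec) + eps)              # eps avoids log(0)

# A global brightness gain multiplies every amplitude by the same factor,
# so in log-amplitude space it becomes a constant additive offset.
rng = np.random.default_rng(0)
img = rng.random((3, 8, 8))
offset = log_amplitude(2.0 * img) - log_amplitude(img)
```

In this toy case every entry of `offset` is approximately log(2), which shows that a uniform gain change reduces to a constant shift that later layers can easily ignore. Real illumination changes are rarely a single global gain, so this is only an intuition for the idea, not a description of the paper's module.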
