MSSFC-Net: Enhancing Building Interpretation with Multi-Scale Spatial-Spectral Feature Collaboration
Dehua Huo, Weida Zhan, Jinxin Guo, Depeng Zhu, Yu Chen, YiChun Jiang, Yueyi Han, Deng Han, Jin Li
TL;DR
This work tackles the joint problem of building extraction and change detection in remote sensing imagery by introducing MSSFC-Net, a transformer-based dual-task framework that jointly models spatial-spectral features and temporal differences. It proposes three key components: a Dual-branch Multi-scale Feature Extraction module with Spatial-Spectral Feature Collaboration (DMFE-SSFC) for efficient multi-scale spatial-spectral feature learning without extra parameters, a Multi-scale Differential Fusion Module (MDFM) for robust multi-scale fusion of dual-temporal features, and a segmentation head with task-specific queries that produces the outputs of both tasks in a unified way. Experiments on WHU, LEVIR-CD, and BANDON show state-of-the-art or competitive performance on both tasks and provide evidence of cross-task synergy arising from shared representations and hierarchical feature interactions. The approach delineates buildings and localizes changes with higher accuracy and completeness; ablation studies confirm that SSFC, MDFM, and DMFE each play an essential role, and the authors point to lightweight variants as a direction for future work.
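The summary does not say how SSFC couples spatial and spectral information without adding parameters. Purely as an illustration, the sketch below assumes a CBAM-like factorization with the learned layers removed: a spectral (per-channel) gate computed from each channel's spatial mean and a spatial (per-pixel) gate computed from the cross-channel mean, both passed through a sigmoid and multiplied onto the feature map. The function name `spatial_spectral_collaboration` and this gating scheme are hypothetical; they are not the authors' SSFC.

```python
import math
from typing import List

# Feature map indexed as feat[channel][row][col] (C x H x W).
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_spectral_collaboration(feat: FeatureMap) -> FeatureMap:
    """Hypothetical parameter-free spatial-spectral gating (not the paper's SSFC).

    Each activation is rescaled by a spectral gate (one per channel) and a
    spatial gate (one per pixel); no learnable weights are introduced.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Spectral gate: sigmoid of each channel's mean over all pixels.
    spectral = [sigmoid(sum(sum(row) for row in feat[k]) / (h * w)) for k in range(c)]
    # Spatial gate: sigmoid of each pixel's mean over all channels.
    spatial = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    # Collaboration: both gates scale the input multiplicatively.
    return [
        [[feat[k][i][j] * spectral[k] * spatial[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


# Usage: a 3-channel, 2x2 map of ones is scaled by sigmoid(1) ** 2 everywhere.
ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
gated = spatial_spectral_collaboration(ones)
print(round(gated[0][0][0], 4))  # 0.5344
```

Because both gates are derived from statistics of the input rather than from learned weights, this form adds no parameters. A practical version would operate on batched tensors in a deep learning framework and would likely sit inside each branch of a multi-scale backbone.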
Abstract
Building interpretation from remote sensing imagery primarily involves two fundamental tasks: building extraction and change detection. However, most existing methods address these tasks independently, overlooking their inherent correlation and failing to exploit shared feature representations for mutual enhancement. Furthermore, the diverse spectral, spatial, and scale characteristics of buildings make it difficult to jointly model multi-scale spatial-spectral features and to balance precision and recall. Limited synergy between spatial and spectral representations often reduces detection accuracy and leaves changes incompletely localized. To address these challenges, we propose a Multi-Scale Spatial-Spectral Feature Cooperative Dual-Task Network (MSSFC-Net) for joint building extraction and change detection in remote sensing images. The framework integrates both tasks within a unified architecture, leveraging their complementary nature to extract building and change features simultaneously. Specifically, a Dual-branch Multi-scale Feature Extraction module (DMFE) with Spatial-Spectral Feature Collaboration (SSFC) is designed to enhance multi-scale representation learning, capturing both shallow texture details and deep semantic information and thereby improving building extraction. For temporal feature aggregation, we introduce a Multi-scale Differential Fusion Module (MDFM) that explicitly models the interaction between differential and dual-temporal features, strengthening the network's ability to detect both large-area changes and subtle structural variations in buildings. Extensive experiments on three benchmark datasets demonstrate that MSSFC-Net achieves superior performance on both building extraction and change detection, improving detection accuracy while maintaining completeness.
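The abstract does not give the equations of MDFM. The sketch below is a simplified, hypothetical stand-in that shows the general idea of multi-scale differential fusion on single-channel maps: at each scale it combines the absolute difference d = |T1 - T2| with a difference-gated bi-temporal mean, d + alpha * d * (T1 + T2) / 2, then upsamples every scale to full resolution and averages them. The 2x2 average-pooling pyramid, the weight `alpha`, and all function names are assumptions made for illustration, not the authors' design.

```python
from typing import List

# Single-channel feature map indexed as x[row][col].
Map2D = List[List[float]]


def avg_pool2(x: Map2D) -> Map2D:
    """Halve resolution with non-overlapping 2x2 average pooling (even H, W assumed)."""
    return [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1] + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(len(x[0]) // 2)
        ]
        for i in range(len(x) // 2)
    ]


def upsample_nearest(x: Map2D, factor: int) -> Map2D:
    """Enlarge a map by an integer factor using nearest-neighbour replication."""
    return [
        [x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
        for i in range(len(x) * factor)
    ]


def fuse_one_scale(t1: Map2D, t2: Map2D, alpha: float) -> Map2D:
    """Difference term plus a difference-gated bi-temporal context term."""
    out = []
    for r1, r2 in zip(t1, t2):
        row = []
        for a, b in zip(r1, r2):
            d = abs(a - b)  # differential feature
            m = (a + b) / 2.0  # bi-temporal mean
            row.append(d + alpha * d * m)  # explicit interaction between d and m
        out.append(row)
    return out


def multi_scale_differential_fusion(t1: Map2D, t2: Map2D, num_scales: int = 3, alpha: float = 0.5) -> Map2D:
    """Fuse at num_scales pyramid levels and average them at full resolution.

    H and W must be divisible by 2 ** (num_scales - 1).
    """
    h, w = len(t1), len(t1[0])
    out = [[0.0] * w for _ in range(h)]
    a, b = t1, t2
    for s in range(num_scales):
        up = upsample_nearest(fuse_one_scale(a, b, alpha), 2 ** s)
        for i in range(h):
            for j in range(w):
                out[i][j] += up[i][j] / num_scales
        if s < num_scales - 1:
            a, b = avg_pool2(a), avg_pool2(b)
    return out


# Usage: one pixel changes between the two dates of a 4x4 scene.
before = [[0.0] * 4 for _ in range(4)]
after = [[0.0] * 4 for _ in range(4)]
after[0][0] = 1.0
fused = multi_scale_differential_fusion(before, after)
# The changed pixel responds most strongly; coarser scales spread a weaker response nearby.
print(fused[0][0] > fused[1][1] > fused[3][3] > 0.0)  # True
```

In this toy version the coarse levels let a change influence a wider neighbourhood, loosely mirroring the goal of capturing large-area changes, while the full-resolution level keeps the sharp response needed for subtle variations. A learned module would replace the fixed `alpha` and the single channel with convolutions over multi-channel tensors.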
