A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection
Chenyao Zhou, Haotian Zhang, Han Guo, Zhengxia Zou, Zhenwei Shi
TL;DR
The paper addresses semantic change detection by replacing entangled triple-branch networks with a late-stage, disentangled fusion framework (LSAFNet) built from two semantic segmentation branches and one binary change branch. It introduces a semantic fusion module (SFM) with local-global attentional aggregation (LGAA) and local-global context enhancement (LGCE) to refine cross-temporal features, and a Change Detection Decoder to bridge the temporal branches. The approach achieves state-of-the-art results on the SECOND and Landsat-SCD datasets, with ablations showing gains from LGAA and LGCE in both the segmentation and change-detection tasks. The disentangled design also makes it easier to integrate pretrained foundation models, supporting fine-grained semantic change mapping for geospatial applications.
Abstract
Semantic change detection is an important task in geoscience and Earth observation. By producing a semantic change map for each temporal phase, both the land-use/land-cover categories and the change information can be interpreted. Recently, several multi-task learning-based semantic change detection methods have been proposed that decompose the task into semantic segmentation and binary change detection subtasks. However, previous works combine the three branches in an entangled manner, which may not be optimal and makes it hard to adopt foundation models. In addition, the lack of explicit refinement of bitemporal features during fusion may lower accuracy. In this letter, we propose a novel late-stage bitemporal feature fusion network to address these issues. Specifically, we propose a local-global attentional aggregation module to strengthen feature fusion and a local-global context enhancement module to highlight pivotal semantics. Comprehensive experiments are conducted on two public datasets, SECOND and Landsat-SCD. Quantitative and qualitative results show that the proposed model achieves new state-of-the-art performance on both datasets.
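The decomposition into two semantic segmentation outputs and one binary change output can be illustrated with a minimal sketch of how the three predictions might be combined into one semantic change map per temporal phase. This is not the network described in the letter; it only shows a common way to compose the outputs, under stated assumptions. The function name `compose_semantic_change_maps`, the 0.5 threshold, and the convention that label 0 means "unchanged" are all assumptions made for illustration.

```python
import numpy as np


def compose_semantic_change_maps(sem_scores_t1, sem_scores_t2, change_prob, threshold=0.5):
    """Combine two semantic predictions with a binary change prediction.

    sem_scores_t1, sem_scores_t2: arrays of shape (C, H, W) with per-class scores.
    change_prob: array of shape (H, W) with a per-pixel change probability.
    Returns two (H, W) integer maps. Label 0 marks unchanged pixels (assumed
    convention); labels 1..C are the predicted class index plus one.
    """
    # Binary change mask from the change branch (threshold is an assumption).
    changed = change_prob > threshold
    # Per-pixel class labels from each semantic branch, shifted so 0 stays free.
    labels_t1 = sem_scores_t1.argmax(axis=0) + 1
    labels_t2 = sem_scores_t2.argmax(axis=0) + 1
    # Keep semantic labels only where a change is predicted.
    return labels_t1 * changed, labels_t2 * changed


# Toy example: 2 classes on a 2x2 image.
scores_t1 = np.array([[[0.9, 0.2], [0.1, 0.8]],
                      [[0.1, 0.8], [0.9, 0.2]]])
scores_t2 = np.array([[[0.9, 0.7], [0.6, 0.1]],
                      [[0.1, 0.3], [0.4, 0.9]]])
change = np.array([[0.1, 0.9],
                   [0.8, 0.3]])

map_t1, map_t2 = compose_semantic_change_maps(scores_t1, scores_t2, change)
print(map_t1)
print(map_t2)
```

In this toy case only the two off-diagonal pixels exceed the threshold, so only those pixels carry a class label in each output map; all other pixels are set to 0.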
