A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection

Chenyao Zhou; Haotian Zhang; Han Guo; Zhengxia Zou; Zhenwei Shi

TL;DR

The paper addresses semantic change detection by replacing entangled triple-branch networks with a late-stage, disentangled fusion framework (LSAFNet) that uses two semantic segmentation branches and one binary change detection branch. A Semantic Fusion Module (SFM) built around local-global attentional aggregation (LGAA) strengthens the fusion of bitemporal features, a local-global context enhancement (LGCE) module in the semantic segmentation decoder highlights pivotal semantics, and a Change Detection Decoder bridges the two temporal branches. The approach reports state-of-the-art results on the SECOND and Landsat-SCD datasets, with ablations attributing gains in both segmentation and change detection to LGAA and LGCE. The disentangled design is also intended to ease integration with pretrained foundation models.

Abstract

Semantic change detection is an important task in geoscience and Earth observation. By producing a semantic change map for each temporal phase, both the land-use/land-cover categories and the change information can be interpreted. Recently, several multi-task-learning-based semantic change detection methods have been proposed that decompose the task into semantic segmentation and binary change detection subtasks. However, previous works combine the three branches in an entangled manner, which may be suboptimal and makes it hard to adopt foundation models. Moreover, the lack of explicit refinement of bitemporal features during fusion may reduce accuracy. In this letter, we propose a novel late-stage bitemporal feature fusion network to address these issues. Specifically, we propose a local-global attentional aggregation module to strengthen feature fusion and a local-global context enhancement module to highlight pivotal semantics. Comprehensive experiments are conducted on two public datasets, SECOND and Landsat-SCD. Quantitative and qualitative results show that the proposed model achieves new state-of-the-art performance on both datasets.
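
The decomposition into semantic segmentation and binary change detection can be illustrated with a small sketch of how the two kinds of output might be combined into one semantic change map per temporal phase. This is an assumption for illustration, not the authors' confirmed post-processing: the function name `semantic_change_maps`, the reserved label `NO_CHANGE = 0`, and the toy class ids are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]

# Hypothetical label reserved for unchanged pixels (an assumption, not from the paper).
NO_CHANGE = 0


def semantic_change_maps(seg_t1: Grid, seg_t2: Grid, change_mask: Grid) -> Tuple[Grid, Grid]:
    """Keep each phase's class label only where the binary mask marks a change."""

    def apply_mask(seg: Grid) -> Grid:
        return [
            [cls if changed else NO_CHANGE for cls, changed in zip(seg_row, mask_row)]
            for seg_row, mask_row in zip(seg, change_mask)
        ]

    return apply_mask(seg_t1), apply_mask(seg_t2)


# Toy 2x2 example: class ids 1..5 stand in for land-cover categories.
seg_t1 = [[1, 1], [2, 3]]   # segmentation of the first temporal phase
seg_t2 = [[1, 4], [2, 5]]   # segmentation of the second temporal phase
change = [[0, 1], [0, 1]]   # binary change mask (1 = changed)

scm_t1, scm_t2 = semantic_change_maps(seg_t1, seg_t2, change)
print(scm_t1)  # [[0, 1], [0, 3]]
print(scm_t2)  # [[0, 4], [0, 5]]
```

Under this assumed convention, the binary change branch decides where a change occurred, while the two segmentation branches decide which category is present before and after it.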

Paper Structure (13 sections, 18 equations, 4 figures, 3 tables)

Introduction
Methods
Overall Architecture
Semantic Fusion Module
Semantic Segmentation Decoder
Change Detection Decoder
Loss Function
Experiments
Datasets and Evaluation Metrics
Implementation Details
Comparison and Analysis
Ablation Study
Conclusion

Figures (4)

Figure 1: Architecture comparison between previous works and our proposed model. (a) Previous works merge the bitemporal semantic segmentation (SS) branches from the encoders. (b) Our proposed network fuses the SS decoded features to perform binary change detection (BCD).
Figure 2: Architectures of our proposed LSAFNet and its components. (a) Flowchart of LSAFNet. (b) Architecture of the CD Decoder. (c) Architecture of the SFM and detailed structure of LGAA. (d) Architecture of the SS Decoder and detailed structure of LGCE.
Figure 3: Qualitative comparisons of the results on the SECOND dataset. The first two rows and the last two rows show two different bitemporal image pairs.
Figure 4: Qualitative comparisons of the results on the Landsat-SCD dataset. The first two rows and the last two rows show two different bitemporal image pairs.
