A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection

Chenyao Zhou, Haotian Zhang, Han Guo, Zhengxia Zou, Zhenwei Shi

TL;DR

The paper addresses semantic change detection by moving away from entangled triple-branch networks toward a late-stage, disentangled fusion framework (LSAFNet) that uses two semantic segmentation branches and one binary change branch. It introduces a semantic fusion module (SFM) with local-global attentional aggregation (LGAA) to refine the fusion of cross-temporal features, local-global context enhancement (LGCE) to highlight pivotal semantics, and a Change Detection Decoder to bridge the temporal branches. The approach achieves state-of-the-art results on the SECOND and Landsat-SCD datasets, with ablations showing gains from LGAA and LGCE in both the segmentation and change-detection tasks. The disentangled design also eases integration with pretrained foundation models for fine-grained semantic change mapping in geospatial applications.

Abstract

Semantic change detection is an important task in geoscience and earth observation. By producing a semantic change map for each temporal phase, both the land use/land cover categories and the change information can be interpreted. Recently, several multi-task-learning-based semantic change detection methods have been proposed that decompose the task into semantic segmentation and binary change detection subtasks. However, previous works arrange their three branches in an entangled manner, which may not be optimal and makes it hard to adopt foundation models. Moreover, the lack of explicit refinement of bitemporal features during fusion may lower accuracy. In this letter, we propose a novel late-stage bitemporal feature fusion network to address these issues. Specifically, we propose a local-global attentional aggregation module to strengthen feature fusion and a local-global context enhancement module to highlight pivotal semantics. Comprehensive experiments are conducted on two public datasets, SECOND and Landsat-SCD. Quantitative and qualitative results show that our proposed model achieves new state-of-the-art performance on both datasets.
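
The decomposition into semantic segmentation and binary change detection can be illustrated with a small sketch of how the two kinds of output are typically combined into one semantic change map per date. This is a minimal illustration in plain Python, not the authors' implementation: the function name `semantic_change_maps`, the nested-list grid representation, and the use of label 0 for unchanged pixels are all assumptions made for the example.

```python
from typing import List, Tuple

Grid = List[List[int]]

NO_CHANGE = 0  # assumed label reserved for unchanged pixels


def semantic_change_maps(seg_t1: Grid, seg_t2: Grid, change_mask: Grid) -> Tuple[Grid, Grid]:
    """Mask each per-date semantic map with the shared binary change mask.

    seg_t1, seg_t2: per-pixel class labels (1..K) predicted for each date.
    change_mask:    per-pixel binary labels (1 = changed, 0 = unchanged).
    """
    def apply_mask(seg: Grid) -> Grid:
        # Keep the class label where change is detected, else mark as unchanged.
        return [
            [cls if changed else NO_CHANGE for cls, changed in zip(seg_row, mask_row)]
            for seg_row, mask_row in zip(seg, change_mask)
        ]

    return apply_mask(seg_t1), apply_mask(seg_t2)


# Toy 2x2 example with hypothetical class labels 1..3.
seg_t1 = [[1, 1], [2, 3]]
seg_t2 = [[1, 2], [2, 1]]
change_mask = [[0, 1], [0, 1]]

scm_t1, scm_t2 = semantic_change_maps(seg_t1, seg_t2, change_mask)
```

In this toy case only the right-hand column is flagged as changed, so each semantic change map keeps its class labels there and is set to the unchanged label elsewhere.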

Paper Structure (13 sections, 18 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Architecture comparison between previous works and our proposed model. (a) Previous works merge the bitemporal SS branches at the encoders. (b) Our proposed network fuses the SS decoded features to achieve BCD.
  • Figure 2: Architectures of our proposed LSAFNet and its components. (a) Flowchart of LSAFNet. (b) Architecture of the CD Decoder. (c) Architecture of the SFM and detailed structure of LGAA. (d) Architecture of the SS Decoder and detailed structure of LGCE.
  • Figure 3: Qualitative comparisons of the results on the SECOND dataset. The first two rows and the last two rows show two different bitemporal image pairs.
  • Figure 4: Qualitative comparisons of the results on the Landsat-SCD dataset. The first two rows and the last two rows show two different bitemporal image pairs.