Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Lei Ding; Jing Zhang; Kai Zhang; Haitao Guo; Bing Liu; Lorenzo Bruzzone

TL;DR

The paper tackles semantic change detection in bi-temporal remote sensing imagery by addressing the challenge of learning semantic changes with limited samples and ensuring consistency across time. It introduces SCanNet, a hybrid CNN–Transformer framework that first extracts temporal semantic and change features with a Triple Encoder-Decoder (TED), then models deep spatio-temporal semantic–change dependencies using a Cross-Shaped Window Transformer head (SCanFormer). A semantic-learning scheme with temporal-consistency constraints employs semantic supervision on changes, pseudo-labels for unchanged areas, and a bi-temporal consistency loss to align predictions, achieving state-of-the-art results on SECOND and Landsat-SCD. Ablation studies confirm the benefits of the TED architecture, the semantic-learning losses, and the SCanFormer module. Overall, the approach advances SCD by explicitly modeling semantic–change correlations over space and time, improving both detection accuracy and semantic consistency of bi-temporal results.
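As an illustration of the bi-temporal consistency idea, the sketch below penalizes disagreement between the two dates' class predictions where nothing changed, and agreement where a change occurred. The cosine-similarity form, the function names, and the array shapes are all assumptions made for this example; the summary does not give the paper's actual loss formulation.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def consistency_loss(logits_t1, logits_t2, change_mask, eps=1e-8):
    """Hypothetical bi-temporal consistency term (not the paper's exact loss).

    logits_t1, logits_t2: arrays of shape (C, H, W), one per acquisition date.
    change_mask: boolean array of shape (H, W), True where the pixel changed.
    """
    p1 = softmax(logits_t1)
    p2 = softmax(logits_t2)
    # Per-pixel cosine similarity between the two class-probability vectors.
    # Probabilities are non-negative, so the similarity lies in [0, 1].
    dot = (p1 * p2).sum(axis=0)
    norms = np.linalg.norm(p1, axis=0) * np.linalg.norm(p2, axis=0)
    cos = dot / (norms + eps)
    # Unchanged pixels: penalize dissimilarity. Changed pixels: penalize similarity.
    per_pixel = np.where(change_mask, cos, 1.0 - cos)
    return float(per_pixel.mean())
```

With identical predictions at both dates, this term is close to zero when every pixel is marked unchanged and close to one when every pixel is marked changed, which is the behavior a consistency constraint of this kind is meant to have.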

Abstract

Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and their semantic categories (before and after the changes) in Remote Sensing Images (RSIs). This is more meaningful than Binary Change Detection (BCD) since it enables detailed change analysis in the observed areas. Previous works established triple-branch Convolutional Neural Network (CNN) architectures as the paradigm for SCD. However, it remains challenging to exploit semantic information with a limited number of change samples. In this work, we investigate jointly modeling spatio-temporal dependencies to improve the accuracy of SCD. First, we propose a Semantic Change Transformer (SCanFormer) to explicitly model the 'from-to' semantic transitions between the bi-temporal RSIs. Then, we introduce a semantic learning scheme that leverages spatio-temporal constraints, which are coherent with the SCD task, to guide the learning of semantic changes. The resulting network (SCanNet) significantly outperforms the baseline method in terms of both the detection of critical semantic changes and the semantic consistency of the obtained bi-temporal results. It achieves state-of-the-art (SOTA) accuracy on two benchmark datasets for SCD.
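To make the task definition concrete, the following minimal sketch derives a binary change mask and a table of 'from-to' transitions from two per-pixel semantic label maps. It is a simplification for illustration only: the function name and the integer class encoding are assumptions, and deriving the change mask by comparing labels is a shortcut, not the prediction pipeline described in the paper.

```python
import numpy as np

def semantic_changes(sem_t1, sem_t2, num_classes):
    """Illustrative helper (hypothetical, not from the paper).

    sem_t1, sem_t2: integer label maps of shape (H, W) for the two dates.
    Returns a boolean change mask and a (num_classes, num_classes) matrix
    whose entry [i, j] counts pixels that went from class i to class j.
    """
    sem_t1 = np.asarray(sem_t1)
    sem_t2 = np.asarray(sem_t2)
    # A pixel counts as changed when its class differs between the dates.
    change = sem_t1 != sem_t2
    transitions = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Accumulate one count per changed pixel at (class before, class after).
    np.add.at(transitions, (sem_t1[change], sem_t2[change]), 1)
    return change, transitions

# Toy 2x2 example with three hypothetical classes (0, 1, 2).
before = np.array([[0, 0], [1, 2]])
after = np.array([[0, 1], [1, 0]])
mask, table = semantic_changes(before, after, num_classes=3)
```

In this toy example two pixels change: one from class 0 to class 1 and one from class 2 to class 0. BCD would report only the mask, whereas SCD also reports the transition categories.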

Paper Structure (17 sections, 14 equations, 9 figures, 3 tables)

Introduction
Related Work
Binary Change Detection
Semantic Change Detection
Vision Transformer
Proposed SCanNet for SCD
CNN Architecture for SCD
SCanFormer: 'Semantic-Change' dependency modeling with Transformer
Semantic Learning with Temporal Consistency Constraints
Implementation Details
Dataset Description and Experimental Settings
Dataset
Evaluation Metrics
Experimental Results
Ablation Study
...and 2 more sections

Figures (9)

Figure 1: Illustration of the SCD task.
Figure 2: Comparison of SCD frameworks: (a) SSCD-l [ding2022bi] and (b) the proposed Triple Encoder-Decoder (TED) network. The dashed arrows represent skip connections.
Figure 3: Architecture of the proposed SCanNet (Semantic Change Network) for SCD.
Figure 4: Using temporal consistency as prior constraint to exploit semantic information in (a) no-change areas and (b) change areas, respectively.
Figure 5: Generation of the pseudo labels.
...and 4 more figures
