TaCo: Capturing Spatio-Temporal Semantic Consistency in Remote Sensing Change Detection

Han Guo, Chenyang Liu, Haotian Zhang, Bowen Chen, Zhengxia Zou, Zhenwei Shi

TL;DR

TaCo addresses the limitation of mask-supervised remote sensing change detection by treating change as a semantic transition between bi-temporal states. It introduces a Text-guided Transition Generator and a spatio-temporal semantic joint constraint to enforce temporal semantic consistency while preserving spatial localization, with no inference-time overhead. The method leverages textual priors from dataset categories via a SoftMoE-based Adaptive Semantic Integration and cross-modal fusion to produce cross-temporal transition features. Experiments on six public datasets demonstrate consistent state-of-the-art performance for both semantic and binary change detection, along with detailed analyses of the proposed constraints.

Abstract

Remote sensing change detection (RSCD) aims to identify surface changes across bi-temporal satellite images. Most previous methods rely solely on mask supervision, which effectively guides spatial localization but only weakly constrains the temporal semantic transitions. Consequently, they often produce spatially coherent predictions that still contain semantic inconsistencies. To address this limitation, we propose TaCo, a spatio-temporal semantically consistent network that enriches the existing mask-supervised framework with a spatio-temporal semantic joint constraint. TaCo conceptualizes change as a semantic transition between bi-temporal states, in which one temporal feature representation can be derived from the other via dedicated transition features. To realize this, we introduce a Text-guided Transition Generator that integrates textual semantics with bi-temporal visual features to construct the cross-temporal transition features. In addition, we propose a spatio-temporal semantic joint constraint consisting of bi-temporal reconstruction constraints and a transition constraint: the former enforce alignment between reconstructed and original features, while the latter enhances discrimination of changes. Because the constraint is applied only during training, this design yields substantial performance gains without any additional computational overhead at inference. Extensive experiments on six public datasets, spanning both binary and semantic change detection tasks, demonstrate that TaCo consistently achieves state-of-the-art (SOTA) performance.
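To make the idea of "deriving one temporal feature from the other via transition features" concrete, the following is a minimal sketch and not the authors' formulation. It assumes, purely for illustration, that the transition is additive (the later feature is approximated by the earlier feature plus a transition vector), that the reconstruction constraints are mean squared errors, and that the transition constraint is a margin-based penalty on the norm of the transition vector. The function names `mse` and `joint_constraint` and the `margin` parameter are hypothetical.

```python
from math import sqrt


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_constraint(f1, f2, delta, changed, margin=1.0):
    """Return (reconstruction_loss, transition_loss) for one location.

    f1, f2  : feature vectors at time 1 and time 2
    delta   : transition feature (assumed additive; an illustration only)
    changed : ground-truth change label for this location
    margin  : hypothetical margin for the transition penalty
    """
    # Assumed bi-temporal reconstruction: each state is rebuilt from the other.
    f2_hat = [x + d for x, d in zip(f1, delta)]
    f1_hat = [x - d for x, d in zip(f2, delta)]
    rec = mse(f2_hat, f2) + mse(f1_hat, f1)

    # Assumed transition penalty: a small norm where nothing changed,
    # a norm of at least `margin` where a change occurred.
    norm = sqrt(sum(d * d for d in delta))
    trans = max(0.0, margin - norm) ** 2 if changed else norm ** 2
    return rec, trans


# A changed location whose transition vector explains the difference exactly.
print(joint_constraint([0.0, 0.0], [3.0, 4.0], [3.0, 4.0], changed=True))
# A changed location with a zero transition vector is penalized by both terms.
print(joint_constraint([0.0, 0.0], [3.0, 4.0], [0.0, 0.0], changed=True))
```

Under these assumptions, both terms depend only on training-time quantities, which is consistent with the abstract's statement that the constraint adds no cost at inference.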

Paper Structure

This paper contains 13 sections, 10 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: (a) Existing methods suffer from semantic misclassification and pseudo-changes. (b) Our method introduces a spatio-temporal semantic joint constraint during training.
  • Figure 2: Overview of the proposed TaCo. (a) Structure and inference pipeline based on the siamese encoder–decoder. (b) Spatio-temporal semantic joint constraint on high-level features via reconstruction and transition losses. (c) Text-guided Transition Generator that fuses class-level text embeddings with stage-4 visual tokens to construct transition features $\Delta_i$.
  • Figure 3: Visualization results on the SECOND dataset.
  • Figure 4: Visualization results on the WHU-CD dataset.
  • Figure 5: Visualization results of the differential feature maps on the SECOND dataset.
  • ...and 7 more figures