SeFi-CD: A Semantic First Change Detection Paradigm That Can Detect Any Change You Want

Ling Zhao, Zhenyang Huang, Dongsheng Kuang, Chengli Peng, Jun Gan, Haifeng Li

TL;DR

This work reframes change detection (CD) by arguing that the semantics of interest, rather than visual differences, should be the primary image factor. It introduces the SeFi-CD paradigm and the AUWCD framework, which detect arbitrary change regions of interest (CRoIs) from user prompts without retraining. AUWCD uses a Semantic Align Module built on vision-language models and a CRoI Segment Module built on foundational segmentation models to locate semantically defined regions in each temporal image, and a CD Module then compares them to compute the final change maps. On the SECOND dataset, the approach achieves an average $F1$ improvement of $5.01\%$, and up to $13.17\%$, over current supervised baselines. These results support the shift toward semantic-first CD and highlight the value of prompt-based, multimodal guidance for scalable, flexible change detection in remote sensing.

Abstract

Existing change detection (CD) methods can be summarized as the visual-first change detection (ViFi-CD) paradigm, which first extracts change features from visual differences and then assigns them specific semantic information. However, CD essentially depends on change regions of interest (CRoIs): the CD results are directly determined by the semantic changes of interest, so the primary image factor is the semantics of interest rather than visual appearance. The ViFi-CD paradigm can only assign specific semantics of interest to specific change features extracted from visual differences, which inevitably omits potential CRoIs and cannot adapt to CD tasks with different CRoIs. In other words, ViFi-CD methods cannot detect changes in other CRoIs without retraining the model or significantly modifying the method. This paper introduces a new CD paradigm, the semantic-first CD (SeFi-CD) paradigm. The core idea of SeFi-CD is to first perceive the dynamic semantics of interest and then visually search for change features related to those semantics. Based on the SeFi-CD paradigm, we designed Anything You Want Change Detection (AUWCD). Experiments on public datasets demonstrate that AUWCD outperforms current state-of-the-art CD methods, achieving an average F1 score 5.01% higher than that of advanced supervised baselines on the SECOND dataset, with a maximum increase of 13.17%. The proposed SeFi-CD offers a novel CD perspective and approach.

Paper Structure (50 sections, 19 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Comparison diagram of the ViFi-CD and SeFi-CD paradigms. The ViFi-CD paradigm first extracts specific change features visually and then assigns them specific semantic information, leading to methods that can only detect specific types of CRoIs. In contrast, the SeFi-CD paradigm first determines the semantic information of changes based on user prompts and then searches for corresponding and dynamic change features in the image, allowing it to detect CRoIs of any semantic type.
  • Figure 2: Overall AUWCD framework. First, the Semantic Align Module models the task of interest as text prompts and uses vision-language models (VLMs) to turn these text prompts into visual prompts, dynamically perceiving the semantic information of the changes of interest. Second, the CRoI Segment Module uses the generated visual prompts and foundational segmentation models (FSMs) to extract fine masks of the CRoIs in the images from different temporal phases. Finally, the CD Module compares the segmentation results of the images from different temporal phases to derive the CD results.
  • Figure 3: Illustration of the Semantic Align Module (CLIP Surgery as an example). (a) shows the image encoding process. (b) shows the text encoding process. (c) shows similarity map generation. (d) shows point prompt generation based on similarity maps.
  • Figure 4: CRoI Segment Module (SAM as an example).
  • Figure 5: Multitemporal images that were selected from the BCDD dataset.
  • ...and 5 more figures
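
The captions for Figures 2 and 3 describe two steps that are easy to illustrate in isolation: turning a text-image similarity map into point prompts, and comparing the CRoI masks from two dates. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names `point_prompts` and `change_map`, the threshold and point-count parameters, and the use of a pixel-wise XOR as the comparison rule are all hypothetical choices made for illustration; the summary above does not specify how AUWCD selects points or compares masks.

```python
from typing import List, Tuple

Grid = List[List[float]]  # similarity map, one score per pixel
Mask = List[List[int]]    # binary CRoI mask, 1 = pixel belongs to the CRoI


def point_prompts(similarity: Grid, threshold: float = 0.8,
                  max_points: int = 3) -> List[Tuple[int, int]]:
    """Return up to max_points (row, col) positions with similarity >= threshold.

    Assumption: points are ranked by descending similarity; ties are broken
    by row, then column, so the result is deterministic.
    """
    scored = [
        (value, r, c)
        for r, row in enumerate(similarity)
        for c, value in enumerate(row)
        if value >= threshold
    ]
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(r, c) for _, r, c in scored[:max_points]]


def change_map(mask_t1: Mask, mask_t2: Mask) -> Mask:
    """Mark pixels whose CRoI membership differs between the two dates.

    Assumption: the comparison is a pixel-wise XOR of the two binary masks.
    """
    same_rows = len(mask_t1) == len(mask_t2)
    if not same_rows or any(len(a) != len(b) for a, b in zip(mask_t1, mask_t2)):
        raise ValueError("masks must have the same shape")
    return [
        [int(bool(a) != bool(b)) for a, b in zip(row1, row2)]
        for row1, row2 in zip(mask_t1, mask_t2)
    ]


# Toy 2x2 example: two pixels pass the similarity threshold.
similarity = [[0.1, 0.9],
              [0.85, 0.2]]
prompts = point_prompts(similarity)

# Toy masks, standing in for the segmentation model's output at each date.
changes = change_map([[1, 0], [1, 1]],
                     [[1, 1], [0, 1]])
```

In a full pipeline, the points from `point_prompts` would be passed to a promptable segmentation model to produce the per-date masks; here the masks are written by hand so that the sketch stays self-contained.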