AdaptOVCD: Training-Free Open-Vocabulary Remote Sensing Change Detection via Adaptive Information Fusion

Mingyu Dou, Shi Qiu, Ming Hu, Yifan Chen, Huping Ye, Xiaohan Liao, Zhe Sun

TL;DR

AdaptOVCD tackles open-world remote sensing change detection without annotations by integrating three pre-trained models through a dual-dimensional, multi-level fusion framework. It decomposes OVCD into instance segmentation, feature comparison, and semantic identification, and introduces ARA, ACT, and ACF to align radiometry at the data level, set adaptive change thresholds at the feature level, and filter low-confidence predictions at the decision level, respectively. The framework achieves strong zero-shot performance across nine scenarios and reaches 84.89% of the fully supervised upper bound in cross-dataset tests, demonstrating robust generalization. This training-free approach offers a practical, prompt-driven solution for detecting arbitrary land-change categories in overhead imagery and paves the way for scalable OVCD data collection through zero-shot inference.

Abstract

Remote sensing change detection plays a pivotal role in domains such as environmental monitoring, urban planning, and disaster assessment. However, existing methods typically rely on predefined categories and large-scale pixel-level annotations, which limit their generalization and applicability in open-world scenarios. To address these limitations, this paper proposes AdaptOVCD, a training-free Open-Vocabulary Change Detection (OVCD) architecture based on dual-dimensional multi-level information fusion. The framework integrates multi-level information fusion across data, feature, and decision levels vertically while incorporating targeted adaptive designs horizontally, achieving deep synergy among heterogeneous pre-trained models to effectively mitigate error propagation. Specifically, (1) at the data level, Adaptive Radiometric Alignment (ARA) fuses radiometric statistics with original texture features and synergizes with SAM-HQ to achieve radiometrically consistent segmentation; (2) at the feature level, Adaptive Change Thresholding (ACT) combines global difference distributions with edge structure priors and leverages DINOv3 to achieve robust change detection; (3) at the decision level, Adaptive Confidence Filtering (ACF) integrates semantic confidence with spatial constraints and collaborates with DGTRS-CLIP to achieve high-confidence semantic identification. Comprehensive evaluations across nine scenarios demonstrate that AdaptOVCD detects arbitrary category changes in a zero-shot manner, significantly outperforming existing training-free methods. Meanwhile, it achieves 84.89% of the fully supervised performance upper bound in cross-dataset evaluations and exhibits superior generalization capabilities. The code is available at https://github.com/Dmygithub/AdaptOVCD.

Paper Structure

This paper contains 23 sections, 10 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of change detection paradigms. (a) Traditional supervised methods including BCD and SCD rely on large-scale pixel-level annotations and time-consuming training, restricting their applicability to predefined categories and suffering from limited cross-dataset generalization. (b) The proposed AdaptOVCD framework enables training-free zero-shot detection through natural language prompts, eliminating the need for annotated data while achieving robust generalization across diverse remote sensing scenarios.
  • Figure 2: Overview of the proposed AdaptOVCD framework for training-free open-vocabulary change detection. The pipeline implements dual-dimensional multi-level information fusion: vertically constructing a data-feature-decision cascade, and horizontally conducting targeted adaptive designs at each level. (a) Instance Segmentation: ARA performs radiometric alignment, followed by SAM-HQ generating class-agnostic masks $S_a$ and $S_b$ from bi-temporal images $I_a$ and $I_b$. (b) Feature Comparison: DINOv3 extracts dense semantic features and computes region-level representations $O_a$ and $O_b$ via mask pooling, while ACT determines adaptive thresholds to identify change candidates $C$. (c) Semantic Identification: DGTRS-CLIP performs open-vocabulary classification using text prototypes $\mathcal{T}$, and ACF applies confidence-based filtering to produce the final change mask $M$.
  • Figure 3: Qualitative results of AdaptOVCD on building change detection datasets including LEVIR-CD, WHU-CD, DSIFN, and the building category of SECOND. Each row displays a representative sample with varying building scales and densities. Columns from left to right: pre-change image $I_a$, post-change image $I_b$, ground truth annotation, and AdaptOVCD prediction. Color legend: Building change.
  • Figure 4: Open-vocabulary change detection results on the SECOND dataset across six semantic categories. Each row demonstrates detection capability for a specific land-cover change type driven by text prompts. Columns from left to right: pre-change image $I_a$, post-change image $I_b$, ground truth annotation, and AdaptOVCD prediction. Color legend: Building, Low Vegetation, Non-vegetated Ground, Playground, Tree, Water.
  • Figure 5: Intermediate process visualization illustrating the progressive filtering mechanism of AdaptOVCD on two representative LEVIR-CD samples. The top row shows a true positive case where building changes are successfully detected through the three-stage pipeline. The bottom row demonstrates effective false positive suppression, where spurious detections are progressively eliminated. Processing stages are color-coded: light orange for Instance Segmentation ($S_a$, $S_b$), light green for Feature Comparison (candidates $C$), and light purple for Semantic Identification, with DGTRS-CLIP in yellow and ACF in blue. Symbols: ✓ accepted, $\times$ rejected by DGTRS-CLIP, blue regions filtered by ACF due to low confidence.
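
The Feature Comparison stage described in the Figure 2 caption (mask pooling of dense features, followed by thresholding to select change candidates) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the same mask is pooled over both feature maps as a simplifying assumption, and the mean-plus-k-standard-deviations cut-off is a plain stand-in for ACT, which the abstract says also uses edge structure priors.

```python
import numpy as np


def mask_pool(features, mask):
    """Average the feature vectors of all pixels inside a boolean mask.

    features: (H, W, D) dense feature map (e.g. from a frozen backbone).
    mask:     (H, W) boolean region mask.
    Returns a (D,) region-level feature vector.
    """
    return features[mask].mean(axis=0)


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity between two feature vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v) + eps
    return 1.0 - float(np.dot(u, v) / denom)


def change_candidates(feat_a, feat_b, masks, k=1.0):
    """Score each region by its bi-temporal feature distance.

    Assumption: the cut-off mean + k * std over all region distances is a
    simple illustrative threshold, NOT the paper's ACT rule.
    Returns (distances, threshold, boolean candidate flags).
    """
    dists = np.array(
        [cosine_distance(mask_pool(feat_a, m), mask_pool(feat_b, m)) for m in masks]
    )
    threshold = dists.mean() + k * dists.std()
    return dists, threshold, dists > threshold
```

Given two feature maps and a list of class-agnostic masks, `change_candidates` returns one distance per region and flags the regions whose distance stands out from the rest; in the pipeline described above, such flagged regions would then be passed on to semantic identification.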