CoRegOVCD: Consistency-Regularized Open-Vocabulary Change Detection

Weidong Tang, Hanbin Sun, Zihan Li, Yikai Wang, Feifan Zhang

Abstract

Remote sensing change detection (CD) aims to identify where land-cover semantics change across time, but most existing methods still assume a fixed label space and therefore cannot answer arbitrary user-defined queries. Open-vocabulary change detection (OVCD) instead asks for the change mask of a queried concept. In the fully training-free setting, however, dense concept responses are difficult to compare directly across dates: appearance variation, weak cross-concept competition, and the spatial continuity of many land-cover categories often produce noisy, fragmented, and semantically unreliable change evidence. We propose Consistency-Regularized Open-Vocabulary Change Detection (CoRegOVCD), a training-free dense inference framework that reformulates concept-specific change as calibrated posterior discrepancy. Competitive Posterior Calibration (CPC) and the Semantic Posterior Delta (SPD) convert raw concept responses into competition-aware queried-concept posteriors and quantify their cross-temporal discrepancy, making semantic change evidence more comparable without explicit instance matching. Geometry-Token Consistency Gate (GeoGate) and Regional Consensus Discrepancy (RCD) further suppress unsupported responses and improve spatial coherence through geometry-aware structural verification and regional consensus. Across four benchmarks spanning building-oriented and multi-class settings, CoRegOVCD consistently improves over the strongest previous training-free baseline by 2.24 to 4.98 F1$_C$ points and reaches a six-class average of 47.50% F1$_C$ on SECOND.
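
To make the idea of "calibrated posterior discrepancy" concrete, the following is a minimal sketch, not the paper's formulation. It assumes that CPC can be approximated by a temperature-scaled softmax across concept responses and that SPD is the absolute difference between the queried-concept posteriors at the two dates. The function names, the `temperature` parameter, and the `(K, H, W)` score layout are all illustrative assumptions.

```python
import numpy as np


def calibrated_posterior(scores, query_index, temperature=1.0):
    """Return the (H, W) posterior of the queried concept.

    scores: array of shape (K, H, W) with raw responses for K competing concepts.
    ASSUMPTION: competition between concepts is modelled as a per-pixel softmax;
    the paper's actual CPC may differ.
    """
    logits = np.asarray(scores, dtype=np.float64) / temperature
    # Subtract the per-pixel maximum for numerical stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)
    return probs[query_index]


def posterior_delta(scores_t1, scores_t2, query_index, temperature=1.0):
    """Return an (H, W) change signal for the queried concept.

    ASSUMPTION: the discrepancy is the absolute difference of the two
    calibrated posteriors; the paper's actual SPD may differ.
    """
    p1 = calibrated_posterior(scores_t1, query_index, temperature)
    p2 = calibrated_posterior(scores_t2, query_index, temperature)
    return np.abs(p1 - p2)
```

Under these assumptions, a pixel whose raw response to the queried concept rises between dates only produces a large delta if the response also rises relative to the competing concepts, which is one way to read "competition-aware".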

Paper Structure

This paper contains 19 sections, 18 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Paradigm comparison between previous training-free OVCD methods and CoRegOVCD. (a) Previous methods typically rely on explicit mask or instance representations, semantic cues, and cross-temporal matching before producing the final change mask. (b) CoRegOVCD instead performs dense posterior-based change inference through dense score construction, CPC, SPD, GeoGate, RCD, and lightweight final mask inference.
  • Figure 2: Overview of CoRegOVCD. Given bi-temporal images and a queried concept, the framework first constructs dense concept confidence scores from prompt-conditioned Segment Anything Model 3 (SAM 3) outputs and calibrates them with Competitive Posterior Calibration (CPC). It then computes the Semantic Posterior Delta (SPD) as the semantic change signal. For structural verification, Geometry-Token Consistency Gate (GeoGate) employs a Geometric Encoder instantiated with Depth Anything 3 (DA3) to extract geometry tokens and derive the gate map $G$. Finally, Regional Consensus Discrepancy (RCD) fuses SPD and the gate map $G$ and imposes SLIC-based regional consensus, after which a lightweight final mask inference stage converts the resulting score map into the final change mask.
  • Figure 3: Open-vocabulary query substitution on SECOND. Each panel reports F1$_C$ (%) for semantically related query words of one class. Bold bars denote the default query, and the hatched bar denotes a prompt set.
  • Figure 4: Efficiency and accuracy trade-off among training-free OVCD methods. (a) Latency per image pair and peak memory on an NVIDIA A800-SXM4-80GB device, measured after a one-pair warm-up. (b) Accuracy-speed trade-off in terms of throughput and the corresponding DSIFN F1$_C$ values used in the efficiency comparison.
  • Figure 5: Qualitative comparison on SECOND with DynamicEarth (MCI/IMC), AdaptOVCD, and CoRegOVCD across six semantic categories. CoRegOVCD yields more complete masks with fewer unsupported responses.
  • ...and 1 more figure
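
The last stage described in the Figure 2 caption, fusing the semantic change signal with the gate map $G$ and imposing regional consensus, can be sketched as follows. This is a hedged illustration rather than the paper's method: it assumes that fusion is an element-wise product and that regional consensus is the mean fused score inside each superpixel. The function name `regional_consensus` and the `labels` argument are hypothetical; `labels` stands for a non-negative integer superpixel map, which could come from a SLIC routine such as scikit-image's `slic`.

```python
import numpy as np


def regional_consensus(spd, gate, labels):
    """Return an (H, W) score map smoothed by per-region averaging.

    spd:    (H, W) semantic change signal.
    gate:   (H, W) gate map G with values in [0, 1].
    labels: (H, W) non-negative integer superpixel ids.
    ASSUMPTIONS: fusion is an element-wise product and consensus is the
    regional mean; the paper's actual RCD may differ.
    """
    fused = np.asarray(spd, dtype=np.float64) * np.asarray(gate, dtype=np.float64)
    labels = np.asarray(labels)
    flat = labels.ravel()
    # Sum and count of fused scores per superpixel id.
    sums = np.bincount(flat, weights=fused.ravel())
    counts = np.bincount(flat)
    # Guard against unused ids (for example when labels start at 1).
    means = sums / np.maximum(counts, 1)
    # Broadcast each region's mean back to its pixels.
    return means[labels]
```

A binary change mask could then be obtained by thresholding the returned map, for example `regional_consensus(spd, gate, labels) >= 0.5`, where the threshold value is again only a placeholder.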