Zero-Shot Scene Change Detection
Kyusik Cho, Dong Yeop Kim, Euntai Kim
TL;DR
The paper tackles scene change detection (SCD) without any training data by reusing a pre-trained tracking model to compare a reference image with a query image, which reframes SCD as a tracking problem. It introduces two training-free mechanisms, an adaptive content threshold and style bridging layers, to address the content gap and the style gap between the two images, respectively, and extends the approach to video to exploit temporal information. In experiments on ChangeSim, VL-CMU-CD, and PCD, the method shows consistent cross-domain performance and results competitive with trained baselines, while incurring no training-data labeling cost. This makes it practical for real-world deployment, where style variation and labeling costs hinder supervised SCD methods, and it applies to both images and video.
Abstract
We present a novel, training-free approach to scene change detection. Our method leverages tracking models, which inherently perform change detection between consecutive video frames by identifying common objects and detecting new or missing ones. Specifically, we exploit this change detection effect by feeding the tracking model a reference image and a query image instead of consecutive frames. Furthermore, we focus on the content gap and the style gap between the two input images, and address them by proposing an adaptive content threshold and style bridging layers, respectively. Finally, we extend our approach to video, leveraging rich temporal information to enhance scene change detection performance. We compare our approach with baselines in various experiments. While existing training-based baselines tend to specialize only in the domain they were trained on, our method shows consistent performance across various domains, demonstrating the competitiveness of our approach.
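The reframing described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a tracker has already propagated each object mask from one image to the other, and it treats an object as changed when its tracked mask shrinks below a fixed area ratio. The names `lost_objects` and `detect_changes`, the mask representation, the two-direction tracking, and the fixed `min_area_ratio` are all illustrative assumptions; in particular, the fixed ratio is only a stand-in for the adaptive content threshold, whose definition is not given here.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]   # (row, col)
Mask = Set[Pixel]         # pixels belonging to one object


def lost_objects(source_masks: Dict[int, Mask],
                 tracked_masks: Dict[int, Mask],
                 min_area_ratio: float = 0.5) -> Set[int]:
    """Return ids of objects that the (hypothetical) tracker failed to follow.

    source_masks:  object id -> mask in the image where tracking starts.
    tracked_masks: object id -> mask the tracker produced in the other image
                   (absent or empty if the object was not found).
    min_area_ratio: fixed, assumed threshold on tracked area / source area.
    """
    lost = set()
    for obj_id, source in source_masks.items():
        if not source:
            continue  # skip empty masks to avoid division by zero
        tracked = tracked_masks.get(obj_id, set())
        if len(tracked) / len(source) < min_area_ratio:
            lost.add(obj_id)
    return lost


def detect_changes(ref_masks: Dict[int, Mask], ref_to_query: Dict[int, Mask],
                   query_masks: Dict[int, Mask], query_to_ref: Dict[int, Mask],
                   min_area_ratio: float = 0.5) -> Tuple[Set[int], Set[int]]:
    """Objects lost going reference -> query are 'missing';
    objects lost going query -> reference are 'new'."""
    missing = lost_objects(ref_masks, ref_to_query, min_area_ratio)
    new = lost_objects(query_masks, query_to_ref, min_area_ratio)
    return missing, new


def square(top: int, left: int, size: int) -> Mask:
    """Helper that builds a square mask for the example below."""
    return {(r, c) for r in range(top, top + size)
            for c in range(left, left + size)}


# Reference image has objects 1 and 2; only object 1 is found in the query.
ref_masks = {1: square(0, 0, 2), 2: square(5, 5, 2)}
ref_to_query = {1: square(0, 1, 2)}
# Query image has objects 1 and 10; object 10 has no counterpart in the reference.
query_masks = {1: square(0, 1, 2), 10: square(8, 8, 2)}
query_to_ref = {1: square(0, 0, 2)}

missing, new = detect_changes(ref_masks, ref_to_query, query_masks, query_to_ref)
print(missing, new)
```

In this example object 2 is reported as missing and object 10 as new, while object 1, which only shifted by one pixel, is treated as unchanged because the sketch compares mask areas rather than positions.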
