SceneDiff: A Benchmark and Method for Multiview Object Change Detection

Yuqun Wu; Chih-hao Lin; Henry Che; Aditi Tiwari; Chuhang Zou; Shenlong Wang; Derek Hoiem

TL;DR

The paper introduces SceneDiff, a training-free framework for multiview object change detection, and the SceneDiff Benchmark, the first multiview change detection dataset with dense instance-level annotations across diverse scenes and viewpoints. SceneDiff uses pretrained models for 3D reconstruction (π³), segmentation (SAM), and semantic features (DINOv3) to align the two temporal captures in 3D, then detects changes via region-level scoring and cross-frame instance association. The approach yields large improvements over existing baselines on both multiview and two-view benchmarks and is demonstrated in a robotic tidying application. Limitations include ambiguity in cluttered scenes, sensitivity to the quality of the geometry reconstruction, and a focus on object-level changes; future work aims to handle semantic state changes and deformations.

Abstract

We investigate the problem of identifying objects that have been added, removed, or moved between a pair of captures (images or videos) of the same scene at different times. Detecting such changes is important for many applications, such as robotic tidying or construction progress and safety monitoring. A major challenge is that varying viewpoints can cause objects to falsely appear changed. We introduce SceneDiff Benchmark, the first multiview change detection benchmark with object instance annotations, comprising 350 diverse video pairs with thousands of changed objects. We also introduce the SceneDiff method, a new training-free approach for multiview object change detection that leverages pretrained 3D, segmentation, and image encoding models to make robust predictions across multiple benchmarks. Our method aligns the captures in 3D, extracts object regions, and compares spatial and semantic region features to detect changes. Experiments on multiview and two-view benchmarks demonstrate that our method outperforms existing approaches by large margins (94% and 37.4% relative AP improvements, respectively). The benchmark and code will be publicly released.
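
The comparison step described above can be illustrated with a small sketch. This is not the paper's scoring function. It is a minimal example that assumes both captures are already expressed as 3D points in a shared frame, that each point in the second capture carries a feature vector, and that each region in the first capture has one pooled feature. The names `region_change_score` and `radius`, and the choice to combine the two cues with `max`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1D vectors, with a small epsilon for safety."""
    a = a / (np.linalg.norm(a) + 1e-8)
    b = b / (np.linalg.norm(b) + 1e-8)
    return float(a @ b)


def region_change_score(region_points, region_feature,
                        other_points, other_features, radius=0.05):
    """Score one region of capture A against capture B (hypothetical helper).

    region_points:  (N, 3) points of the region, in the shared 3D frame
    region_feature: (D,)   pooled semantic feature of the region
    other_points:   (M, 3) points of capture B, in the same frame
    other_features: (M, D) per-point semantic features of capture B
    radius:         assumed distance threshold for a geometric match
    Returns a score in [0, 1]; higher means more likely changed.
    """
    # Pairwise distances from every region point to every point of capture B.
    dists = np.linalg.norm(
        region_points[:, None, :] - other_points[None, :, :], axis=2
    )
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(len(region_points)), nearest]

    # Spatial cue: fraction of region points with no nearby point in capture B.
    unmatched = nearest_dist > radius
    geometric = float(unmatched.mean())

    # Semantic cue: feature dissimilarity against the matched neighbours.
    matched_feats = other_features[nearest[~unmatched]]
    if len(matched_feats) == 0:
        semantic = 1.0  # nothing to compare against, treat as fully different
    else:
        semantic = 1.0 - cosine_similarity(
            region_feature, matched_feats.mean(axis=0)
        )

    # Assumption: either cue alone is enough to flag a change.
    return max(geometric, semantic)
```

Under these assumptions, a region whose points and feature reappear in the second capture scores near 0. A region with no nearby geometry scores 1, as does a region that occupies the same space with a dissimilar feature, which would correspond to one object replaced by another.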
