Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection
Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura, Asako Kanezaki
TL;DR
This work tackles the ill-posedness of detecting small, semantically nondistinctive object changes in ground-view scenes by introducing Degree of Ill-posedness (DoI) for GVCD and a zero-shot DoI estimation framework. It combines a base change detector with an object-search pipeline that leverages large multimodal models (including SAM, Grounding DINO, LLaVA with Alpha CLIP) to generate open-vocabulary object masks and linguistic labels, then integrates these with a DoI-based decision rule to refine changes. The key contribution is a novel, training-free (zero-shot) DoI estimator that improves state-of-the-art change detectors across diverse real-world datasets, particularly in cluttered environments, while highlighting limitations in regions where the baseline detector already underperforms. The results demonstrate a practical path toward active vision: using DoI to trigger targeted inspections and plan next-best-views, potentially enhancing robotic navigation and object-tracking capabilities in indoor environments.
Abstract
In everyday indoor navigation, robots often need to detect non-distinctive small-change objects (e.g., stationery, lost items, and junk) to maintain domain knowledge. This is most relevant to ground-view change detection (GVCD), a recently emerging research area in the field of computer vision. However, existing techniques rely on high-quality class-specific object priors to regularize a change detector model, an approach that cannot be applied to semantically non-distinctive small objects. To address this ill-posedness, in this study, we explore the concept of degree-of-ill-posedness (DoI) from the new perspective of GVCD, aiming to improve both passive and active vision. This novel DoI problem is highly domain-dependent, and manually collecting fine-grained annotated training data is expensive. To regularize this problem, we apply the concept of self-supervised learning to achieve an efficient DoI estimation scheme and investigate its generalization to diverse datasets. Specifically, we tackle the challenging issue of obtaining self-supervision cues for semantically non-distinctive unseen small objects and show that novel "oversegmentation cues" from open-vocabulary semantic segmentation can be effectively exploited. When applied to diverse real-world datasets, the proposed DoI model boosts state-of-the-art change detection models and shows stable and consistent improvements.
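The abstract does not specify how oversegmentation cues are converted into a DoI score or how that score enters the change decision. The sketch below is therefore only an illustration under stated assumptions, not the authors' method: it assumes that DoI can be approximated by the local density of distinct segments in a segmentation label map, and that the change threshold is raised where DoI is high. The function names `segment_density` and `refine_changes`, the window size, and the `base_threshold` and `penalty` values are all hypothetical.

```python
import numpy as np


def segment_density(labels: np.ndarray, window: int = 3) -> np.ndarray:
    """Assumed DoI proxy: fraction of extra distinct segment ids per local window.

    Returns values in [0, 1]; 0 means the window lies inside a single segment,
    1 means every pixel in the window belongs to a different segment.
    """
    h, w = labels.shape
    r = window // 2
    padded = np.pad(labels, r, mode="edge")  # replicate border labels
    density = np.zeros((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            density[y, x] = len(np.unique(patch))
    return (density - 1.0) / (window * window - 1.0)


def refine_changes(score: np.ndarray, doi: np.ndarray,
                   base_threshold: float = 0.5,
                   penalty: float = 0.4) -> np.ndarray:
    """Hypothetical decision rule: demand stronger evidence where DoI is high."""
    threshold = base_threshold + penalty * doi
    return score > threshold


# Toy example: a 4x4 label map in which every pixel is its own segment.
labels = np.arange(16).reshape(4, 4)
doi = segment_density(labels)
change_score = np.full((4, 4), 0.7)  # stand-in for a base detector's output
change_mask = refine_changes(change_score, doi)
```

In this toy example the heavily oversegmented interior pixels are rejected while the less cluttered corner pixels are kept. Whether high DoI should suppress detections, as assumed here, or instead trigger an additional inspection is a design choice that the abstract leaves open.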
