Unseen Object Reasoning with Shared Appearance Cues
Paridhi Singh, Arun Kumar
TL;DR
The paper tackles open world recognition by representing objects as constellations of mid-level appearance cues derived from patch-level ViT features. A semantic prior over clusters, $S \in \mathbb{R}^{G \times K}$, is learned from known classes and used to compute a semantic vector for a test image $I^t$, $P_{I^t} = \frac{1}{M^2} \sum_m S(D_k^m)$, which supports inferring how similar an unseen object is to known categories and their superclasses. The authors report that a finite set of mid-level cues suffices to model both seen and unseen objects on CIFAR100 and ImageNet 64×64, and that superclass reasoning is possible without full supervision. The approach targets real-world recognition settings, where it allows continued reasoning about novel objects without exhaustively labeling every category.
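The semantic vector formula can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that $G$ is the number of known classes, $K$ the number of appearance clusters, that the image is split into an $M \times M$ grid of patches, and that $D_k^m$ is the index of the cluster assigned to patch $m$. Nearest-centroid assignment, the column normalization of $S$, and the function names `assign_patches_to_clusters` and `semantic_vector` are hypothetical choices made for the example.

```python
import numpy as np


def assign_patches_to_clusters(patch_features, centroids):
    """Return, for each patch, the index of its nearest centroid.

    Nearest-centroid assignment is an assumption of this sketch.
    patch_features: (M*M, dim) array, centroids: (K, dim) array.
    """
    # Squared Euclidean distance between every patch and every centroid.
    diffs = patch_features[:, None, :] - centroids[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return sq_dists.argmin(axis=1)


def semantic_vector(S, cluster_ids):
    """Compute P = (1 / M^2) * sum_m S[:, D^m].

    S: (G, K) semantic prior; cluster_ids: (M*M,) cluster index per patch.
    Averaging the selected columns of S gives one score per known class.
    """
    return S[:, cluster_ids].mean(axis=1)


# Toy setup: 3 known classes, 4 clusters, a 2x2 patch grid, 5-dim features.
G, K, M, dim = 3, 4, 2, 5
rng = np.random.default_rng(0)

centroids = rng.normal(size=(K, dim))   # stand-in for learned cluster centers
S = rng.random(size=(G, K))
S /= S.sum(axis=0, keepdims=True)       # assumption: each column sums to 1

patches = rng.normal(size=(M * M, dim))  # stand-in for patch-level ViT features
cluster_ids = assign_patches_to_clusters(patches, centroids)
P = semantic_vector(S, cluster_ids)

print("cluster per patch:", cluster_ids)
print("semantic vector P:", P)
```

Under the column-normalization assumption, `P` is itself a distribution over the known classes, so its entries can be read as how strongly the test image's appearance cues point to each known category.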
Abstract
This paper introduces an innovative approach to open world recognition (OWR), where we leverage knowledge acquired from known objects to address the recognition of previously unseen objects. The traditional method of object modeling relies on supervised learning with strict closed-set assumptions, presupposing that objects encountered during inference are already known at the training phase. However, this assumption proves inadequate for real-world scenarios due to the impracticality of accounting for the immense diversity of objects. Our hypothesis posits that object appearances can be represented as collections of "shareable" mid-level features, arranged in constellations to form object instances. By adopting this framework, we can efficiently dissect and represent both known and unknown objects in terms of their appearance cues. Our paper introduces a straightforward yet elegant method for modeling novel or unseen objects, utilizing established appearance cues and accounting for inherent uncertainties. This representation not only enables the detection of out-of-distribution objects or novel categories among unseen objects but also facilitates a deeper level of reasoning, empowering the identification of the superclass to which an unknown instance belongs. This novel approach holds promise for advancing open world recognition in diverse applications.
