UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection

Haomiao Liu, Hao Xu, Chuhuai Yue, Bo Ma

TL;DR

Unknown Object Detection (UOD) requires learning a class-agnostic objectness score to locate objects from both known and unknown categories. The authors introduce UN-DETR, a Transformer-based detector with an Instance Presence Score (IPS) predictor that is jointly supervised from the positional and categorical latent spaces. A one-to-many assignment strategy adds supervision for IPS, and Unbiased Query Selection supplies better initial object queries to the decoder. IPS-guided post-processing and unsupervised pretraining, which provides an objectness prior, further improve the separation of known and unknown objects. The approach reports state-of-the-art results on COCO-OOD, COCO-Mixed, and VOC while preserving standard known-object detection performance. The training objective is $L_{\text{UN-DETR}} = \lambda_1 L_{\mathrm{IPS}} + \lambda_2 L_{\mathrm{cls}} + \lambda_3 L_{\mathrm{bbox}}$ with $L_{\mathrm{IPS}} = L_{\mathrm{IPS}}^{H} + L_{\mathrm{IPS}}^{L}$, and the joint score $P_o(e_i) = \alpha e_{\mathrm{bbox}}^{o} + \beta e_{\mathrm{cls}}^{o}$ combines a box-derived term and a class-derived term with weights $\alpha$ and $\beta$. Here $L_{\mathrm{IPS}}^{H}$ uses $\mathrm{GIoU}$ and a threshold $\tau$, and $L_{\mathrm{IPS}}^{L}$ uses a constant $C$.
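To make the arithmetic in the summary concrete, the following is a minimal Python sketch, not the authors' implementation. It shows only what the formulas above state: the weighted sum for $P_o(e_i)$, the composition of the total loss from its terms, and the standard GIoU measure for two axis-aligned boxes. The function names, the `(x1, y1, x2, y2)` box format, and all numeric values are assumptions chosen for illustration. How $\tau$ and $C$ enter $L_{\mathrm{IPS}}^{H}$ and $L_{\mathrm{IPS}}^{L}$ is not specified in this summary, so it is deliberately left out.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def giou(a: Box, b: Box) -> float:
    """Standard Generalized IoU of two axis-aligned boxes, in (-1, 1]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def joint_objectness(e_bbox: float, e_cls: float, alpha: float, beta: float) -> float:
    """Weighted sum P_o = alpha * e_bbox + beta * e_cls from the summary."""
    return alpha * e_bbox + beta * e_cls


def total_loss(l_ips_h: float, l_ips_l: float, l_cls: float, l_bbox: float,
               lambdas: Tuple[float, float, float]) -> float:
    """L = lambda_1 * (L_IPS^H + L_IPS^L) + lambda_2 * L_cls + lambda_3 * L_bbox."""
    lam1, lam2, lam3 = lambdas
    l_ips = l_ips_h + l_ips_l
    return lam1 * l_ips + lam2 * l_cls + lam3 * l_bbox


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(giou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
    print(joint_objectness(e_bbox=0.8, e_cls=0.4, alpha=0.5, beta=0.5))
    print(total_loss(1.0, 0.5, 2.0, 3.0, lambdas=(2.0, 1.0, 0.5)))
```

Because GIoU can be negative for disjoint boxes, any threshold $\tau$ applied to it has to be interpreted on the $(-1, 1]$ range rather than the $[0, 1]$ range of plain IoU.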

Abstract

Unknown Object Detection (UOD) aims to identify objects of unseen categories, in contrast to the traditional detection paradigm, which is limited by the closed-world assumption. A key component of UOD is learning a generalized representation, i.e., objectness for both known and unknown categories, to distinguish and localize objects from the background in a class-agnostic manner. However, previous methods obtain supervision signals for learning objectness in isolation, from either localization or classification information, leading to poor performance for UOD. To address this issue, we propose a transformer-based UOD framework, UN-DETR. Building on this framework, we design an Instance Presence Score (IPS) to represent the probability of an object's presence. To make the two sources of information complementary, IPS is learned under joint supervision, integrating attributes that represent general objectness from the positional and the categorical latent spaces as supervision signals. To enhance IPS learning, we introduce a one-to-many assignment strategy that incorporates more supervision. We then propose Unbiased Query Selection to provide high-quality initial query vectors for the decoder. Additionally, we propose an IPS-guided post-processing strategy to filter redundant boxes and correct classification predictions for known and unknown objects. Finally, we pretrain the entire UN-DETR in an unsupervised manner to obtain an objectness prior. UN-DETR is comprehensively evaluated on multiple UOD and known-object detection benchmarks, demonstrating its effectiveness and achieving state-of-the-art performance.

Paper Structure

This paper contains 35 sections, 13 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Joint supervision for objectness learning
  • Figure 2: The overall architecture of UN-DETR
  • Figure 3: Visualizations of discriminability scores
  • Figure 4: Visualization of classification scores and IPS for encoder features
  • Figure 5: Example results on COCO-OOD (first two rows) and COCO-Mixed (last two rows) datasets. Detections are overlaid on known (yellow) and unknown (blue) objects.
  • ...and 6 more figures