Semi-supervised Open-World Object Detection

Sahal Shaji Mullappilly, Abhishek Singh Gehlot, Rao Muhammad Anwer, Fahad Shahbaz Khan, Hisham Cholakkal

TL;DR

SS-OWOD is a semi-supervised open-world object detection setting that reduces the need for exhaustive human annotations during incremental learning. The proposed detector, SS-OWFormer, uses feature-aligned transformer object queries and an object query-guided pseudo-labeling scheme to leverage unlabeled data for both known and unknown classes, and extends to satellite imagery with oriented boxes. With only partial labeling, the approach outperforms prior OWOD methods across natural and remote sensing datasets (MS COCO, PASCAL VOC, Objects365, DOTA) and new satellite OWOD splits, especially in unknown recall, while maintaining known-class accuracy. By combining a detached prior model, a mapping network, and cross-view feature alignment, the paper demonstrates practical gains for realistic deployment scenarios where full annotations are infeasible.

Abstract

The conventional open-world object detection (OWOD) problem setting first distinguishes known and unknown classes and then incrementally learns the unknown objects when they are introduced with labels in subsequent tasks. However, the current OWOD formulation relies heavily on an external human oracle for knowledge input during the incremental learning stages. Such reliance at run-time makes this formulation less realistic in a real-world deployment. To address this, we introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD), that reduces the annotation cost by casting the incremental learning stages of OWOD in a semi-supervised manner. We demonstrate that the performance of the state-of-the-art OWOD detector deteriorates dramatically in the proposed SS-OWOD setting. Therefore, we introduce a novel SS-OWOD detector, named SS-OWFormer, that utilizes a feature-alignment scheme to better align the object query representations between the original and augmented images, leveraging the large amount of unlabeled data alongside the few labeled samples. We further introduce a pseudo-labeling scheme for unknown detection that exploits the inherent capability of decoder object queries to capture object-specific information. We demonstrate the effectiveness of our SS-OWOD problem setting and approach for remote sensing object detection, proposing carefully curated splits and baseline performance evaluations. Our experiments on four datasets (MS COCO, PASCAL, Objects365 and DOTA) demonstrate the effectiveness of our approach. Our source code, models and splits are available at https://github.com/sahalshajim/SS-OWFormer
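
The feature-alignment idea in the abstract can be illustrated with a minimal sketch. It assumes a single linear layer as the mapping network and a cosine-based alignment loss; neither choice is confirmed by the text above, and the names `linear_map` and `alignment_loss` are hypothetical. Which image view is fed to which model is also an assumption made for illustration.

```python
import math

def linear_map(z, weight, bias):
    """Hypothetical mapping network: one linear layer applied to a query embedding."""
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weight, bias)]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def alignment_loss(current_queries, detached_queries, weight, bias):
    """Mean (1 - cosine similarity) between mapped current queries and detached queries.

    Assumed form of the loss, chosen only to illustrate the alignment idea.
    """
    losses = []
    for z_cur, z_det in zip(current_queries, detached_queries):
        mapped = linear_map(z_cur, weight, bias)
        losses.append(1.0 - cosine_similarity(mapped, z_det))
    return sum(losses) / len(losses)

# Toy example with 2-dimensional query embeddings and an identity mapping.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
z_augmented = [[1.0, 0.0], [0.0, 2.0]]   # current model, augmented view (assumed)
z_original = [[2.0, 0.0], [0.0, 1.0]]    # detached model, original view (assumed)
z_misaligned = [[0.0, 1.0], [1.0, 0.0]]  # embeddings orthogonal to z_augmented

loss_aligned = alignment_loss(z_augmented, z_original, identity, zero_bias)
loss_misaligned = alignment_loss(z_augmented, z_misaligned, identity, zero_bias)
```

In this toy setup the loss is close to zero when the two sets of embeddings point in the same directions and close to one when they are orthogonal. Because the detached copy is frozen, gradients in a real training loop would only update the current model and the mapping network.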

Paper Structure

This paper contains 23 sections, 3 equations, 10 figures, and 12 tables.

Figures (10)

  • Figure 1: Comparison of our SS-OWOD with other closely related object detection problem settings.
  • Figure 2: Comparison of our object query-guided pseudo-labeling with the feature averaging used in the OW-DETR baseline. The baseline framework performs channel averaging over single-scale features from the backbone, spatially crops them at the predicted bounding box positions, and selects the top-k to obtain pseudo-labels. In contrast, our approach leverages object-specific information from multi-scale encoder features and decoder object queries. We modulate the decoder object queries with multi-scale encoder feature maps and perform multi-scale box pooling at predicted box locations to obtain objectness scores, then select the top-k bounding box proposals as pseudo-labels.
  • Figure 3: Overall architecture of our Semi-Supervised Open-World object detection Transformer (SS-OWFormer) framework. It comprises a backbone network, a transformer-based deformable encoder-decoder, object query-guided pseudo-labeling, a box prediction head, and novelty classification and objectness branches. The focus of our design is: (i) the introduction of an object query-guided pseudo-labeling (orange box in the bottom row) that captures information from both the transformer encoder and decoder for pseudo-labeling unknown objects. Object queries from the decoder are modulated with the multi-scale encoder features to obtain multi-scale spatial maps, which are pooled at predicted box locations to obtain confidence scores for the unknown pseudo-labeling. (ii) The introduction of a novel semi-supervised learning pipeline ($\rightarrow$) for leveraging unlabeled data during incremental learning of a new set of object classes. In our semi-supervised incremental learning setting, the SS-OWFormer (current model) is trained along with its detached (frozen) copy (blue box in the top row) together with a mapping network ($\mathcal{G}$). The mapping network ($\mathcal{G}$) projects the object queries from the current network to the detached network. Moreover, we use original and augmented images for the alignment of object query embeddings ($z$).
  • Figure 4: Qualitative results showing the detection performance on MS COCO examples. Objects detected as unknown in the top row are learned in subsequent tasks and marked as known categories, as shown in the bottom row.
  • Figure 5: Qualitative results on satellite images with oriented bounding boxes. Oriented bounding boxes in blue depict unknown detections of the roundabout, soccer field, and storage tank categories in the respective images, while other colors mark the known categories of small-vehicle, swimming pool, and ship.
  • ...and 5 more figures
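
The object query-guided pseudo-labeling described in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration rather than the authors' implementation: modulation is assumed to be a per-location dot product between a query and the feature vector, box pooling is assumed to be a plain average inside the box, scales are combined by averaging, and the function names are hypothetical. Filtering out proposals that overlap known-class ground truth is omitted.

```python
import math

def modulate(query, feature_map):
    """Dot product of one object query with every location of an H x W x C feature map."""
    return [[sum(q * f for q, f in zip(query, cell)) for cell in row] for row in feature_map]

def box_pool(spatial_map, box):
    """Average a spatial map inside a box given as normalized (x0, y0, x1, y1)."""
    height, width = len(spatial_map), len(spatial_map[0])
    x0, y0, x1, y1 = box
    c0 = min(int(x0 * width), width - 1)
    c1 = max(c0 + 1, min(width, math.ceil(x1 * width)))
    r0 = min(int(y0 * height), height - 1)
    r1 = max(r0 + 1, min(height, math.ceil(y1 * height)))
    values = [spatial_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)

def objectness_scores(queries, boxes, multi_scale_features):
    """Score each (query, predicted box) pair by pooling its modulated map at every scale."""
    scores = []
    for query, box in zip(queries, boxes):
        per_scale = [box_pool(modulate(query, fmap), box) for fmap in multi_scale_features]
        scores.append(sum(per_scale) / len(per_scale))
    return scores

def top_k_pseudo_labels(scores, boxes, k):
    """Keep the k boxes with the highest objectness score as unknown pseudo-labels."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in ranked[:k]]

# Toy features at two scales with 2 channels: channel 0 is active in the left
# half of the image and channel 1 in the right half.
coarse = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(2)]                  # 2 x 2 x 2
fine = [[[1.0, 0.0]] * 2 + [[0.0, 1.0]] * 2 for _ in range(4)]         # 4 x 4 x 2

queries = [[1.0, 0.0], [1.0, 0.0]]                     # both queries respond to channel 0
boxes = [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)]   # left-half box, right-half box

scores = objectness_scores(queries, boxes, [coarse, fine])
pseudo_labels = top_k_pseudo_labels(scores, boxes, k=1)
```

In this toy case the left-half box scores higher because its query matches the features inside the box at both scales, so it is the one kept as a pseudo-label. A practical implementation would more likely use batched tensor operations and an ROI pooling operator instead of nested lists.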