Towards Generalized Few-Shot Open-Set Object Detection

Binyi Su, Hua Zhang, Jingzhi Li, Zhong Zhou

TL;DR

This work defines generalized few-shot open-set object detection (G-FOOD) and addresses the critical challenge of avoiding overfitting to scarce known-class data while reliably rejecting unknown objects. It introduces FOOD, a Faster R-CNN-based detector augmented with a Class Weight Sparsification Classifier (CWSC) and an Unknown Decoupling Learner (UDL), enabling threshold-free and generation-free unknown rejection without requiring pseudo-unknown samples. The approach achieves consistent improvements in unknown-object detection across VOC10-5-5, VOC-COCO, and LVIS benchmarks, while maintaining or enhancing closed-set performance for base and novel classes. These contributions establish a practical framework for robust open-set detection in few-shot and long-tail settings, with potential impact on safety-critical applications like autonomous systems and medical imaging; future work includes integrating prompt-learning and IoU-aware unknown optimization.

Abstract

Open-set object detection (OSOD) aims to detect the known categories and reject unknown objects in a dynamic world, and it has attracted significant attention. However, previous approaches only consider this problem in data-abundant conditions and neglect few-shot scenes. In this paper, we seek a solution for generalized few-shot open-set object detection (G-FOOD), which aims to avoid detecting unknown classes as known classes with a high confidence score while maintaining the performance of few-shot detection. The main challenge for this task is that the scarcity of training samples induces the model to overfit on the known classes, resulting in poor open-set performance. We propose a new G-FOOD algorithm to tackle this issue, named Few-shOt Open-set Detector (FOOD), which contains a novel class weight sparsification classifier (CWSC) and a novel unknown decoupling learner (UDL). To prevent overfitting, CWSC randomly sparsifies parts of the normalized weights used for the logit prediction of all classes, which decreases the co-adaptability between a class and its neighbors. Alongside, UDL decouples the training of the unknown class and enables the model to form a compact unknown decision boundary. Thus, unknown objects can be identified with a confidence probability without any threshold, prototype, or generation. We compare our method with several state-of-the-art OSOD methods in few-shot scenes and observe that our method improves the F-score of unknown classes by 4.80%-9.08% across all shots in the VOC-COCO dataset settings. The source code is available at https://github.com/binyisu/food.
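
The weight sparsification idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sparsified_cosine_logits`, the cosine-similarity form of the logits, the `scale` factor, and the element-wise masking without rescaling are all assumptions made for illustration. Only the idea of randomly zeroing parts of the normalized class weights, with a sparsity probability (written $\widehat{p}$ in Figure 3), comes from the paper.

```python
import numpy as np


def sparsified_cosine_logits(features, weights, p_hat=0.5, scale=20.0,
                             rng=None, training=True):
    """Hypothetical sketch of class weight sparsification.

    features: (N, D) array of region features.
    weights:  (C, D) array, one weight vector per class.
    p_hat:    probability of zeroing each normalized weight entry (assumed
              to play the role of the sparsity probability in Figure 3).
    scale:    assumed temperature applied to the cosine similarities.
    """
    eps = 1e-12
    # L2-normalize features and class weights so logits are cosine similarities.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # Keep each normalized weight entry with probability 1 - p_hat.
        keep_mask = rng.random(w.shape) >= p_hat
        w = w * keep_mask
    return scale * (f @ w.T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8))   # 4 proposals, 8-dim features
    W = rng.normal(size=(3, 8))       # 3 classes
    print(sparsified_cosine_logits(feats, W, p_hat=0.5, rng=rng).shape)
```

At inference time (`training=False`) the sketch uses the dense normalized weights, in the same way dropout is typically disabled at test time; whether FOOD does the same is not stated in the text above.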

Paper Structure

This paper contains 33 sections, 23 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Visualization of different tasks: closed-set object detection (CSOD), few-shot object detection (FSOD), open-set object detection (OSOD), and generalized few-shot open-set object detection (G-FOOD). In the CSOD and FSOD tasks, unknown objects are ignored or incorrectly classified into the set of known classes. The OSOD task can reject unknown classes, but it usually requires data-abundant known classes for training [FSOSR2021]. Our G-FOOD task can identify both data-abundant and data-hungry known objects while rejecting unknown objects based on limited training data, which provides a better open-scene understanding paradigm.
  • Figure 2: The framework of our FOOD for generalized few-shot open-set object detection. Compared to the standard Faster R-CNN, FOOD plugs in a novel class weight sparsification classifier (CWSC) and a novel unknown decoupling learner (UDL). We sparsify the normalized weights for the class logit prediction and simultaneously optimize a binary sigmoid classifier and a multi-class softmax classifier in the classification head. Our method rejects unknowns in few-shot scenes without pseudo-unknown sample generation, prototypes, or thresholds.
  • Figure 3: Effect of different sparsity probabilities $\widehat{p}$, scope factors $\delta_1=\delta_2$ and sampling ratios $N_{pos}:N_{neg}$ on the 10-shot VOC-COCO dataset setting.
  • Figure 4: The relationship between iteration and performance metrics ($mAP_B$, $mAP_N$, and $AP_U$) for our G-FOOD method. It is hard to select a fine-tuning stopping iteration that balances the performance of base classes, novel classes, and the unknown class.
  • Figure 5: Visualization results (10-shot VOC-COCO setting). We visualize the bounding boxes with a score larger than 0.1. Our FOOD can detect more unknown objects than other methods. The red box marks a failure case: several giraffes (novel class) are misidentified as the unknown class.