Boosting Few-Shot Open-Set Object Detection via Prompt Learning and Robust Decision Boundary

Zhaowei Wu, Binyi Su, Qichuan Geng, Hua Zhang, Zhong Zhou

TL;DR

The paper tackles few-shot open-set object detection by leveraging vision-language prompts to harness textual information for unknown rejection. It introduces a three-component framework—Attribution-Gradient based Pseudo-unknown Mining (AGPM) to mine high-uncertainty proposals, Conditional Evidence Decoupling (CED) to decouple known and unknown evidence via Evidential Deep Learning, and Abnormal Distribution Calibration (ADC) to regularize local features and shape a robust unknown boundary—built atop RegionCLIP with a two-stage optimization. The method demonstrates state-of-the-art performance on unknown-object metrics across VOC10-5-5 and VOC-COCO, achieving notable gains in $R_U$ and $AR_U$ while preserving competitive known-class accuracy. By integrating prompt-based learning with evidential reasoning and local distribution calibration, it offers a practical and robust solution for open-world detection under scarce unknown data.

Abstract

Few-shot Open-set Object Detection (FOOD) poses a challenge in many open-world scenarios. It aims to train an open-set detector that detects known objects while rejecting unknowns, using only scarce training samples. Existing FOOD methods suffer from limited visual information and often exhibit an ambiguous decision boundary between known and unknown classes. To address these limitations, we propose the first prompt-based few-shot open-set object detection framework, which exploits additional textual information and delves into constructing a robust decision boundary for unknown rejection. Specifically, since no training data are available for unknown classes, we select pseudo-unknown samples with Attribution-Gradient based Pseudo-unknown Mining (AGPM), which leverages the discrepancy in attribution gradients to quantify uncertainty. Subsequently, we propose Conditional Evidence Decoupling (CED) to decouple and extract distinct knowledge from the selected pseudo-unknown samples by eliminating opposing evidence. This optimization process enhances the discrimination between known and unknown classes. To further regularize the model and form a robust decision boundary for unknown rejection, we introduce Abnormal Distribution Calibration (ADC) to calibrate the output probability distribution of local abnormal features in pseudo-unknown samples. Our method achieves superior performance over previous state-of-the-art approaches, improving the average recall of the unknown class by 7.24% across all shots in the VOC10-5-5 dataset settings and by 1.38% in the VOC-COCO dataset settings. Our source code is available at https://gitee.com/VR_NAVE/ced-food.
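Both AGPM and CED build on Evidential Deep Learning, in which per-class logits are mapped to non-negative Dirichlet evidence and a scalar "vacuity" uncertainty that is high when a proposal carries little evidence for any known class. The snippet below is an illustrative NumPy sketch of that standard EDL mapping, not the paper's implementation; the function name and the softplus choice for the evidence activation are assumptions for illustration.

```python
import numpy as np

def dirichlet_uncertainty(logits):
    """Map per-class logits to Dirichlet belief masses and a vacuity score.

    Standard EDL formulation: evidence e_k = softplus(logit_k) >= 0,
    Dirichlet concentration alpha_k = e_k + 1, total strength S = sum(alpha),
    belief b_k = e_k / S, and uncertainty u = K / S (high when evidence is low).
    """
    evidence = np.logaddexp(0.0, logits)   # softplus keeps evidence non-negative
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    strength = alpha.sum()                 # total evidence S
    k = logits.shape[0]                    # number of known classes
    belief = evidence / strength           # per-class belief mass
    uncertainty = k / strength             # vacuity: mass not assigned to any class
    return belief, uncertainty

# A proposal with strong evidence for one class is low-uncertainty; a proposal
# with weak evidence everywhere is high-uncertainty -- the latter is the kind
# of region that pseudo-unknown mining would select.
confident = np.array([8.0, -4.0, -4.0])
ambiguous = np.array([0.1, 0.1, 0.1])
_, u_conf = dirichlet_uncertainty(confident)
_, u_amb = dirichlet_uncertainty(ambiguous)
```

Since `S >= K` always holds, the uncertainty `u = K / S` lies in `(0, 1]`, which makes it a convenient score for ranking region proposals by how "unknown" they look.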

Paper Structure

This paper contains 18 sections, 16 equations, 11 figures, and 9 tables.

Figures (11)

  • Figure 1: The challenge of unknown classes in the image-text joint space and our solution. (a) There are still numerous unknown objects beyond the predefined vocabulary in real-world scenarios. (b) Our intuition is that region proposals with high uncertainty (yellow border) consist of features from both known and unknown classes. (c) Our method decouples and learns distinct information from pseudo-unknown samples to construct a discriminative decision boundary.
  • Figure 2: The detector misidentifies the zebra as a horse (left). The detector misses the Czech hedgehog (middle). The detector successfully detects the known object and rejects the unknown one (right).
  • Figure 3: The overview architecture of our method, which is a two-stage detector with (a) Attribution-Gradient based Pseudo-unknown Mining, (b) Conditional Evidence Decoupling For Unknown Optimization, (c) Abnormal Distribution Calibration For Robust Decision Boundary.
  • Figure 4: Distribution of global aggregation attribution gradients across known, background, and unknown classes. Proposals are sampled from 500 randomly selected images in the VOC10-5-5 base training/testing sets and the VOC-COCO testing set, excluding the VOC-COCO training set as it contains only base class labels.
  • Figure 5: Scatter plots of known, background, and unknown classes on the VOC07+12trainval dataset. Each point represents a local feature $\mathbf{Z}_{xy}$ from intermediate output $\mathbf{Z}$. Proposals of unknown classes show twice as many gradient outliers (threshold $>$ 0.0002) as known and background classes, with 100 selected proposals per plot.
  • ...and 6 more figures