LEDetection: A Simple Framework for Semi-Supervised Few-Shot Object Detection

Phi Vu Tran

TL;DR

The paper tackles the challenge of semi-supervised few-shot object detection under simultaneous base and novel label scarcity. It introduces SoftER Teacher, an extension of Soft Teacher that uses entropy-based region-proposal consistency to exploit unlabeled data and strengthen both base and novel detection. The authors propose the Label-Efficient Detection framework and a two-stage training pipeline, demonstrating that SoftER Teacher outperforms strong supervised baselines with far fewer base labels and exhibits reduced base forgetting. They also present the LEDetection benchmark to quantify unlabeled data utility and reveal a link between semi-supervised detector strength and few-shot efficiency, suggesting that stronger SSOD models yield more label-efficient FSOD.

Abstract

Few-shot object detection (FSOD) is a challenging problem aimed at detecting novel concepts from few exemplars. Existing approaches to FSOD all assume abundant base labels to adapt to novel objects. This paper studies the new task of semi-supervised FSOD by considering a realistic scenario in which both base and novel labels are simultaneously scarce. We explore the utility of unlabeled data within our proposed label-efficient detection framework and discover its remarkable ability to boost semi-supervised FSOD by way of region proposals. Motivated by this finding, we introduce SoftER Teacher, a robust detector combining pseudo-labeling with consistency learning on region proposals, to harness unlabeled data for improved FSOD without relying on abundant labels. Rigorous experiments show that SoftER Teacher surpasses the novel performance of a strong supervised detector using only 10% of required base labels, without catastrophic forgetting observed in prior approaches. Our work also sheds light on a potential relationship between semi-supervised and few-shot detection suggesting that a stronger semi-supervised detector leads to a more effective few-shot detector.

Paper Structure (45 sections, 6 equations, 11 figures, 14 tables)

INTRODUCTION
RELATED WORK
Semi-Supervised Detection
Few-Shot Detection
Semi-Supervised Few-Shot Detection
APPROACH
What Makes for Effective FSOD?
Discussion
Semi-Supervised Base Pre-Training
Soft Teacher
SoftER Teacher
Semi-Supervised Few-Shot Fine-Tuning
EXPERIMENTS
Datasets
Implementation Details
...and 30 more sections

Figures (11)

Figure 1: The evaluation of generalized FSOD is characterized by the trade-off between novel performance and base forgetting. We leverage unlabeled data to optimize for semi-supervised FSOD on both base $+$ novel classes (top right). Our approach significantly expands base class AP, $39.3 \rightarrow 44.4$, while incurring less than 9% in base degradation (vs. 19% for LVC) and also improving on novel detection (nAP). Our SoftER Teacher is the best model on the Overall AP metric, leading the next best Retentive R-CNN by $+2.0$ AP.
Figure 2: We present the Label-Efficient Detection framework to harness supplementary unlabeled data for generalized semi-supervised few-shot detection. At the core of the framework is our proposed SoftER Teacher with Entropy Regression for improved semi-supervised representation learning (upper right). Extensive comparative experiments show that SoftER Teacher is also a more label-efficient few-shot detector (lower right).
Figure 3: We analyze the effectiveness of the RPN as a function of base labels. (a) Unlabeled data provide a convincing boost in proposal quality, closing the gap between the Base and Full detectors, which should lead to better discovery of novel categories during fine-tuning. (b-c) In low-label regimes, unlabeled data can help produce diverse proposals (green boxes) on novel unseen objects {boat, bus, car, dog}, whereas the vanilla supervised FRCN-Base fails to capture comparable foreground objects with only one red box. Best viewed digitally.
Figure 4: Visualizations of student-teacher proposals with confidence scores $\ge 0.99$. As illustrated by the arrow, a pair of student-teacher proposals is related by a transformation matrix $M$, which is used to align proposals between student and teacher images for enforcing box classification similarity and localization consistency.
Figure 4: Ablation experiments quantifying the effectiveness of each component in our semi-supervised approach using 1% of COCO labels. The first row corresponds to the Soft Teacher baseline and the last row is our SoftER Teacher configuration.
...and 6 more figures
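
The caption describing student-teacher proposals says that a pair of proposals is related by a transformation matrix $M$, which aligns boxes between the student and teacher images. The following is a minimal sketch of that kind of alignment, assuming $M$ is a 3x3 homogeneous matrix and boxes are stored as [x1, y1, x2, y2]. The names `transform_point` and `align_box` and the horizontal-flip example are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Box = List[float]                   # [x1, y1, x2, y2] (assumed box format)
Matrix = Sequence[Sequence[float]]  # 3x3 homogeneous transform (assumed form of M)


def transform_point(M: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Apply a 3x3 homogeneous transform to a single 2D point."""
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return xh / w, yh / w


def align_box(M: Matrix, box: Box) -> Box:
    """Map a box through M and return the enclosing axis-aligned box.

    All four corners are transformed so that flips, scaling, and
    rotations still yield a valid [x1, y1, x2, y2] box.
    """
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    points = [transform_point(M, x, y) for x, y in corners]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical example: a horizontal flip of a 100-pixel-wide image (x -> 100 - x).
flip = [[-1, 0, 100],
        [0, 1, 0],
        [0, 0, 1]]
aligned = align_box(flip, [10, 20, 30, 40])
print(aligned)
```

Once boxes from one view are mapped into the other view's coordinates this way, per-proposal class scores and box offsets can be compared pairwise, which is the kind of similarity and consistency check the caption refers to.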
