Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection

Xinhao Zhong, Siyu Jiao, Yao Zhao, Yunchao Wei

TL;DR

Open-set semi-supervised object detection (OSSOD) is a practical challenge: the unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) objects, so OOD objects risk being misclassified as ID classes. The paper introduces CFL-Detector, a collaborative framework that jointly optimizes a feature-space contrastive loss $\mathcal{L}_{fc}$ and a logits-space uncertainty loss $\mathcal{L}_{uc}$ within a two-stage teacher–student training scheme, using a memory pool of embeddings and uncertainty weighting to separate ID from OOD. The resulting detector marks OOD objects as 'unknown' while preserving ID accuracy, and it reports state-of-the-art results on the COCO Open-CLS, COCO Open-SUP, and VOC-COCO benchmarks. The method may incur a slight decrease in ID performance because of its emphasis on OOD separation, a trade-off left to balance in future work.

Abstract

Current Semi-Supervised Object Detection (SSOD) methods enhance detector performance by leveraging large amounts of unlabeled data, assuming that both labeled and unlabeled data share the same label space. However, in open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes. Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes. To alleviate this issue, we propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector). Specifically, we introduce a feature-level clustering method that uses a contrastive loss to sharpen class boundaries in the feature space and highlight class differences. Additionally, by optimizing a logits-level uncertainty classification loss, the model improves its ability to distinguish between ID and OOD classes. Extensive experiments demonstrate that our method achieves state-of-the-art performance compared to existing methods.
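The feature-level clustering idea in the abstract can be illustrated with a generic supervised contrastive loss. This is a minimal sketch and not the paper's exact $\mathcal{L}_{fc}$: the function name `supervised_contrastive_loss`, the temperature value, and the convention of giving OOD embeddings their own label (here `-1`) are all assumptions made for illustration.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch (or pool) of embeddings.

    Hypothetical helper for illustration; not the paper's implementation.
    features: (n, d) float array, labels: (n,) int array.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)

    # Compare embeddings by cosine similarity, scaled by the temperature.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature

    # A sample is never its own positive or negative.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Row-wise log-softmax over all other samples (numerically stabilized).
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives share a label with the anchor.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive partner

    # Average log-probability of the positives, per anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

# Two tight clusters: one ID class (label 0) and assumed OOD embeddings (label -1).
pool = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_clustered = supervised_contrastive_loss(pool, np.array([0, 0, -1, -1]))
loss_mixed = supervised_contrastive_loss(pool, np.array([0, -1, 0, -1]))
```

When the labels agree with the geometric clusters, `loss_clustered` is lower than `loss_mixed`, which is the behavior a feature-space contrastive objective is meant to reward.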

Paper Structure

This paper contains 19 sections, 7 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of our CFL-Detector. Training Stage 1: We begin with fully supervised pre-training on labeled data, where the pre-trained model serves as the teacher net. Training Stage 2: The teacher net's parameters are frozen with respect to gradient updates, and the student net is trained on unlabeled data using pseudo-labels from the teacher net. To keep the pseudo-labels reliable, the teacher net is updated with an exponential moving average (EMA) of the student net's parameters. Inference: our CFL-Detector detects both the ID classes ("person" and "sheep") and the OOD class ("dog").
  • Figure 2: Details of the feature contrastive loss ($\mathcal{L}_{fc}$) and uncertainty classification loss ($\mathcal{L}_{uc}$). Left: $\mathcal{L}_{fc}$ is computed over the feature pool, clustering features of the ID classes while separating the OOD class. Thresholds continuously update the feature pool to maintain both diversity and accuracy. Right: $\mathcal{L}_{uc}$ modifies the cross-entropy calculation to assign higher confidence to OOD objects, boosting the classifier's ability to distinguish OOD.
  • Figure 3: (a) Visualization of pseudo-labels from UT (top) and ours (bottom). Our method reduces OOD interference and improves bounding-box quality. (b) Qualitative comparison between UT (top) and ours (bottom). UT misclassifies OOD objects as ID classes with high confidence, whereas our method labels them as "unknown".
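
The Stage 2 teacher update described in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes parameters are stored as a dictionary of NumPy arrays; the function name `ema_update` and the decay value are hypothetical and are not taken from the paper.

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Return new teacher parameters as an EMA of the student's parameters.

    Hypothetical helper: the teacher takes no gradient step; it only
    tracks a smoothed copy of the student.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

# Toy parameters: the teacher starts at 0, the student sits at 1.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}

# One update with an assumed decay of 0.9 moves the teacher 10% of the way.
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which smooths out noisy student updates before they affect the pseudo-labels.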