Negative Prototypes Guided Contrastive Learning for WSOD

Yu Zhang, Chuang Zhu, Guoqing Yang, Siqi Chen

TL;DR

This work tackles weakly supervised object detection by addressing instance ambiguity and partial detection through a novel Negative Prototypes Guided Contrastive Learning (NPGC) framework. It introduces a global feature bank that stores both positive prototypes and negative prototypes, enabling a contrastive training regime that pulls same-class proposals together while pushing different-class and misclassified ones apart. A pseudo label sampling module leverages inter-image prototype information to mine missing instances and suppress overfitted, discriminative-part detections. On VOC07 and VOC12, NPGC achieves state-of-the-art mean average precision, demonstrating that negative prototypes and cross-image prototype relationships significantly enhance WSOD performance and generalization.

Abstract

Weakly Supervised Object Detection (WSOD) with only image-level annotations has recently attracted wide attention. Many existing methods ignore the inter-image relationships of instances that share similar characteristics yet can certainly be determined not to belong to the same category. To make full use of the weak labels, we propose the Negative Prototypes Guided Contrastive learning (NPGC) architecture. First, we define a negative prototype as the highest-confidence proposal misclassified as a category that does not appear in the image label. Unlike methods that utilize only category-positive features, we construct an online-updated global feature bank to store both positive and negative prototypes. We also propose a pseudo label sampling module that mines reliable instances and discards easily misclassified ones based on their feature similarity to the corresponding prototypes in the global feature bank. Finally, we follow the contrastive learning paradigm to optimize each proposal's feature representation, pulling same-class samples closer and pushing different-class samples apart in the embedding space. Extensive experiments on the VOC07 and VOC12 datasets show that the proposed method achieves state-of-the-art performance.
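To make the contrastive objective concrete, the following is a minimal sketch (not the authors' implementation) of a prototype-guided contrastive loss. It assumes one positive and one negative prototype per class stored in a feature bank; each proposal is attracted to the positive prototype of its pseudo class and repelled from the other classes' positive prototypes and from its own class's negative prototype. All names, shapes, and the single-prototype-per-class layout are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def npgc_contrastive_loss(proposal_feats, labels, pos_bank, neg_bank, tau=0.1):
    """Illustrative prototype-guided contrastive loss (hypothetical sketch).

    proposal_feats: (N, D) proposal embeddings
    labels:         (N,) pseudo class labels for the proposals
    pos_bank:       (C, D) positive prototypes, one per class
    neg_bank:       (C, D) negative prototypes, one per class
    """
    z = F.normalize(proposal_feats, dim=1)
    pos = F.normalize(pos_bank, dim=1)
    neg = F.normalize(neg_bank, dim=1)

    # Cosine similarity (temperature-scaled) to every prototype.
    sim_pos = z @ pos.t() / tau                      # (N, C)
    sim_neg = z @ neg.t() / tau                      # (N, C)

    # Logits: all C positive prototypes, plus the negative prototype of
    # each proposal's own class appended as an extra "wrong" option.
    own_neg = sim_neg.gather(1, labels[:, None])     # (N, 1)
    logits = torch.cat([sim_pos, own_neg], dim=1)    # (N, C+1)

    # Cross-entropy toward the own-class positive prototype pulls
    # same-class features together and pushes the rest apart.
    return F.cross_entropy(logits, labels)
```

In this sketch the softmax denominator plays the role of the "push apart" term: raising the similarity to the own-class positive prototype necessarily lowers the relative similarity to the competing prototypes, including the own-class negative one.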

Paper Structure

This paper contains 23 sections, 9 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of negative prototypes. The green box in each of the three top-right images refers to the ground-truth bounding box of the category "Horse", "Dog" and "Sheep", respectively. The yellow boxes refer to proposals misclassified as the category "Cow". Clearly, "Cow" does not appear in any of the three images, yet some proposals are still mistakenly detected as "Cow". We consider such proposals as negative prototypes for the category "Cow". We then extract the feature representations of these proposals and store them in the negative prototypes bank.
  • Figure 2: Comparison of classic contrastive learning (left) and our contrastive learning (right). We propose the concept of negative prototypes (proposals misclassified as a category that shares similar characteristics with the current category but can certainly be determined not to belong to it) and construct a global feature bank to store both positive and negative prototypes.
  • Figure 3: Overall architecture of the proposed method. NPGC consists of four major components: a feature extractor, a MIL branch, a contrastive branch, and an online instance refinement branch. We construct a global feature bank to store both positive and negative prototypes, and apply contrastive learning to pull together the samples of a positive pair and push apart the samples of a negative pair. A pseudo label sampling module mines missing instances and suppresses overfitted instances.
  • Figure 4: Qualitative results on VOC 2007 test set. The left columns show the results from OICR whereas the right columns show the results from our method.
  • Figure 5: More detection results on VOC 2007 test set. Boxes in light green represent ground-truth boxes, and boxes in other colors represent the predicted bounding boxes and the confidence scores.
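The pseudo label sampling module described above can be sketched as a simple similarity test against the feature bank: a proposal is kept for a class only if it lies closer to that class's positive prototype than to its negative prototype. This is an illustrative reading of the mechanism, not the paper's exact rule; all function names and the `keep_thresh` margin are assumptions.

```python
import torch
import torch.nn.functional as F

def sample_pseudo_labels(proposal_feats, scores, pos_bank, neg_bank,
                         keep_thresh=0.5):
    """Hypothetical pseudo label sampling sketch.

    Keeps a proposal for its top-scoring class c only if its feature is
    more similar to the positive prototype of c than to the negative
    prototype of c (and above an absolute threshold).

    proposal_feats: (N, D) proposal embeddings
    scores:         (N, C) classification scores
    pos_bank:       (C, D) positive prototypes
    neg_bank:       (C, D) negative prototypes
    """
    z = F.normalize(proposal_feats, dim=1)
    cls = scores.argmax(dim=1)                               # tentative class
    sim_pos = (z * F.normalize(pos_bank, dim=1)[cls]).sum(dim=1)
    sim_neg = (z * F.normalize(neg_bank, dim=1)[cls]).sum(dim=1)
    # Discard easily misclassified proposals; keep reliable ones.
    keep = (sim_pos > sim_neg) & (sim_pos > keep_thresh)
    return cls, keep
```

Proposals rejected by `keep` would be treated as unreliable pseudo labels, which is one plausible way to suppress detections that overfit to discriminative object parts.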