
Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels

Changrui Chen, Jungong Han, Kurt Debattista

TL;DR

The paper tackles label scarcity in dense vision tasks by proposing Virtual Category (VC) learning. VC learning assigns a virtual category to confusing pseudo-labeled samples so that they can be used safely in optimization without a correct ground-truth label. It introduces a Potential Category (PC) set per sample and uses a VC-based loss (e.g., $L_{\text{VC-CE}}$, $L_{\text{VC-MSE}}$) with a virtual weight, generated via a transformer or from the teacher feature, to backpropagate meaningful gradients and thereby mitigate confirmation bias. The approach has both theoretical and practical benefits: it provides an upper bound on inter-class information sharing and improves the embedding space. It also delivers strong empirical gains on semantic segmentation (Pascal VOC, Cityscapes) and object detection (MS COCO, Pascal VOC) under extremely limited labels, and it applies to non-dense tasks such as image classification on miniImageNet. Overall, VC learning offers a principled, generalizable way to exploit confusing samples in semi-supervised dense prediction, outperforming state-of-the-art baselines across tasks and label regimes.
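The loss described above can be illustrated for a single sample. This is a minimal sketch, and it rests on two assumptions that this summary does not state. First, the PC set is taken to be the teacher's top-k classes; the helper `potential_category_set` and its `k` are hypothetical. Second, following our reading of the idea, the logits of the potential categories are dropped from the softmax while the virtual category serves as the target. Under that reading, the loss only separates the sample from classes it is unlikely to belong to.

```python
import math
from typing import Sequence, Set


def potential_category_set(teacher_probs: Sequence[float], k: int = 2) -> Set[int]:
    """Hypothetical PC-set rule: the k classes the teacher scores highest."""
    ranked = sorted(range(len(teacher_probs)), key=lambda c: teacher_probs[c], reverse=True)
    return set(ranked[:k])


def vc_cross_entropy(class_logits: Sequence[float], virtual_logit: float,
                     pc_set: Set[int]) -> float:
    """Cross-entropy with the virtual category as the target (assumed form).

    Logits of the potential categories are left out of the softmax, so the
    loss cannot push the sample away from a class that may be its true one;
    it only separates the sample from the non-potential classes.
    """
    logits = [z for c, z in enumerate(class_logits) if c not in pc_set] + [virtual_logit]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - virtual_logit


if __name__ == "__main__":
    teacher_probs = [0.45, 0.40, 0.10, 0.05]  # toy scores, e.g. "dog" vs "bear" confusion
    pc = potential_category_set(teacher_probs)
    print(pc, vc_cross_entropy([2.0, 1.8, 0.5, -1.0], virtual_logit=2.0, pc_set=pc))
```

Under this reading, changing the logits of classes 0 and 1 (the potential categories) leaves the loss unchanged. This is one way to see why a possibly wrong pseudo-label is not reinforced.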

Abstract

Due to the costliness of labelled data in real-world applications, semi-supervised learning, underpinned by pseudo labelling, is an appealing solution. However, handling confusing samples is nontrivial: discarding valuable confusing samples would compromise the model generalisation while using them for training would exacerbate the issue of confirmation bias caused by the resulting inevitable mislabelling. To solve this problem, this paper proposes to use confusing samples proactively without label correction. Specifically, a Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation even without a concrete label. This provides an upper bound for inter-class information sharing capacity, which eventually leads to a better embedding space. Extensive experiments on two mainstream dense prediction tasks, semantic segmentation and object detection, demonstrate that the proposed VC learning significantly surpasses the state-of-the-art, especially when only very few labels are available. Our intriguing findings highlight the usage of VC learning in dense vision tasks.

Paper Structure (30 sections, 14 equations, 12 figures, 18 tables, 1 algorithm)

Figures (12)

  • Figure 1: a): The mAP of a semi-supervised detector liu2021unbiased with a preset confidence score filtering on 1% labelled MS COCO lin2014microsoft. The mAP decreases with every strategy for dealing with confusing samples (e.g., the bear-like dog on the right) except VC learning. The additional filtering mechanism (add.) is the temporal stability verification proposed in this paper. Stricter filtering (stricter.) raises the threshold in the score filtering from 0.7 to 0.8. b): The mIoU of a semi-supervised semantic segmentor on 1/128 labelled Pascal VOC.
  • Figure 2: Illustration of the basic idea of the virtual category in the manifold of the optimisation space. The peak indicates a high testing error value.
  • Figure 3: The pipeline of the proposed VC learning when dealing with a confusing sample in semi-supervised one-pixel classification. $T$ is the teacher model. $S$ represents the student model. When training the student classifier with a confusing sample, the weight matrix $W^{s}$ of the student classifier is extended by a virtual weight $w^v$, which is transformed from the corresponding teacher feature vector $\hat{f}$.
  • Figure 4: Explanation of data sample preparation for different tasks. In object detection, we use ROI Align he2017mask to extract the feature vector of each region of interest as a data sample, whereas in semantic segmentation VC learning operates pixel-wise.
  • Figure 5: Explanation of VC learning in the feature space. The circles are the embeddings of the category cluster centres. The hollow diamond is the embedding of the training sample. The cross-hatch diamond is the embedding of the virtual category.
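Figures 3 to 5 together describe one VC update on a single pixel. The sketch below combines them in plain Python and depends on several assumptions that are not in the captions.

- The transform from the teacher feature $\hat{f}$ to the virtual weight $w^v$ is taken to be plain L2 normalisation in the hypothetical `virtual_weight`. The caption only says that $w^v$ is transformed from $\hat{f}$.
- The rows of $W^{s}$ belonging to the sample's Potential Category set are dropped from the softmax.
- All weights and feature maps are toy values.

The gradient line reflects the geometry of Figure 5. A descent step pulls the sample embedding towards the virtual category and pushes it away from the remaining category centres, in proportion to their softmax probabilities.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = List[float]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def pixel_samples(feature_map: FeatureMap) -> List[Vector]:
    """Turn a C x H x W feature map into H*W per-pixel C-dim samples (row-major)."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def virtual_weight(teacher_feature: Sequence[float]) -> Vector:
    """Hypothetical transform from the teacher feature f_hat to w_v: L2 normalisation."""
    norm = math.sqrt(dot(teacher_feature, teacher_feature)) or 1.0
    return [x / norm for x in teacher_feature]


def vc_loss_and_grad(f: Sequence[float], W_s: Sequence[Sequence[float]],
                     w_v: Sequence[float], pc_set: Set[int]) -> Tuple[float, Vector]:
    """VC cross-entropy of one sample and its gradient w.r.t. the student feature f.

    Rows of W_s for classes in pc_set are dropped; the virtual row w_v is
    appended and used as the target. With logits z_j = <w_j, f> and softmax
    probabilities p_j, the gradient is sum_j p_j * w_j - w_v.
    """
    rows = [list(w) for c, w in enumerate(W_s) if c not in pc_set] + [list(w_v)]
    logits = [dot(w, f) for w in rows]
    m = max(logits)  # stabilise the exponentials
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    loss = m + math.log(total) - logits[-1]
    probs = [e / total for e in exps]
    grad = [sum(p * w[d] for p, w in zip(probs, rows)) - w_v[d] for d in range(len(f))]
    return loss, grad


if __name__ == "__main__":
    W_s = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy student classifier, 3 classes
    student_map = [[[0.6, 0.0]], [[0.8, 1.0]]]  # C=2, H=1, W=2
    teacher_map = [[[0.3, 0.0]], [[0.4, 2.0]]]
    f = pixel_samples(student_map)[0]  # student feature of pixel (0, 0)
    w_v = virtual_weight(pixel_samples(teacher_map)[0])
    loss, grad = vc_loss_and_grad(f, W_s, w_v, pc_set={0, 1})
    f_new = [x - 0.5 * g for x, g in zip(f, grad)]  # one gradient-descent step
    new_loss, _ = vc_loss_and_grad(f_new, W_s, w_v, pc_set={0, 1})
    print(f"loss {loss:.4f} -> {new_loss:.4f}")
```

In a real segmentor, the per-pixel flattening would be a tensor reshape; for a PyTorch `C x H x W` tensor `x`, this is `x.flatten(1).T`. In detection, each ROI Align feature vector would play the role of `f`.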