Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels
Changrui Chen, Jungong Han, Kurt Debattista
TL;DR
The paper tackles label scarcity in dense vision tasks by proposing Virtual Category (VC) learning, which assigns a virtual category to confusing pseudo-labeled samples so that they can contribute to optimization without a correct ground-truth label. It builds a Potential Category (PC) set for each sample and uses VC-based losses (e.g., $L_{\text{VC-CE}}$ and $L_{\text{VC-MSE}}$) in which a virtual weight, derived from a transformer or a teacher feature, lets each confusing sample backpropagate meaningful gradients, thereby mitigating confirmation bias. The approach offers both theoretical and practical benefits: it provides an upper bound on inter-class information sharing and yields a better embedding space, with strong empirical gains on semantic segmentation (Pascal VOC, Cityscapes) and object detection (MS COCO, Pascal VOC) under extremely limited labels. It also applies beyond dense prediction, e.g. to classification on miniImageNet. Overall, VC learning offers a principled, generalizable way to exploit confusing samples in semi-supervised dense prediction and outperforms state-of-the-art baselines across tasks and label regimes.
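To make the idea of a VC-based cross-entropy more concrete, here is a minimal Python sketch for a single confusing sample. It is not the authors' implementation; it rests on assumptions that the summary does not state: the virtual category's logit is the dot product of the student feature with a virtual weight (for example, a teacher feature of the same sample), the virtual category is the training target, and classes in the PC set are dropped from the softmax denominator so that no plausibly correct class is pushed down. The function `vc_cross_entropy` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Set


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def vc_cross_entropy(
    feature: Sequence[float],
    class_weights: List[Sequence[float]],
    potential: Set[int],
    virtual_weight: Sequence[float],
    scale: float = 1.0,
) -> float:
    # Hypothetical VC-style cross-entropy for one confusing sample.
    # The virtual category is the target; its weight is ASSUMED to come
    # from elsewhere (e.g. a teacher feature for the same sample).
    vc_logit = scale * _dot(feature, virtual_weight)
    # Classes in the PC set may be the true class, so they are left out;
    # only classes outside the PC set act as negatives.
    neg_logits = [
        scale * _dot(feature, w)
        for c, w in enumerate(class_weights)
        if c not in potential
    ]
    logits = [vc_logit] + neg_logits
    # Numerically stable log-sum-exp over {virtual category} plus negatives.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log softmax probability of the virtual category.
    return log_sum - vc_logit


if __name__ == "__main__":
    f = [1.0, 0.0]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    teacher_feat = [1.0, 0.0]
    # Class 0 is a potential category, so only class 1 acts as a negative.
    print(vc_cross_entropy(f, weights, {0}, teacher_feat))
```

In this sketch, excluding the PC classes means the loss never pushes the feature away from a class that might be correct; only classes the teacher has ruled out serve as negatives, which is one way to read "safe optimization without a concrete label".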
Abstract
Because labelled data are costly in real-world applications, semi-supervised learning underpinned by pseudo labelling is an appealing solution. However, handling confusing samples is nontrivial: discarding valuable confusing samples compromises model generalisation, while using them for training exacerbates confirmation bias, because some of them will inevitably be mislabelled. To solve this problem, this paper proposes to use confusing samples proactively without label correction. Specifically, a Virtual Category (VC) is assigned to each confusing sample so that it can safely contribute to model optimisation even without a concrete label. This provides an upper bound on inter-class information sharing capacity, which eventually leads to a better embedding space. Extensive experiments on two mainstream dense prediction tasks, semantic segmentation and object detection, demonstrate that the proposed VC learning significantly surpasses the state of the art, especially when very few labels are available. Our findings highlight the value of VC learning for dense vision tasks.
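The abstract does not spell out how confusing samples are separated from confidently pseudo-labelled ones. The sketch below shows one hypothetical way to do it in Python, purely for illustration: the two confidence thresholds (`high`, `low`), the top-k rule for forming the Potential Category set, and the function name `split_pseudo_labels` are all assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def split_pseudo_labels(
    probs: List[Sequence[float]],
    high: float = 0.9,
    low: float = 0.5,
    top_k: int = 2,
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, Set[int]]]]:
    """Hypothetical partition of teacher predictions (thresholds are assumptions).

    Returns (confident, confusing):
      confident: (sample index, hard pseudo label) pairs
      confusing: (sample index, Potential Category set) pairs
    """
    confident: List[Tuple[int, int]] = []
    confusing: List[Tuple[int, Set[int]]] = []
    for i, p in enumerate(probs):
        # Rank classes by teacher probability, highest first.
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        top = p[ranked[0]]
        if top >= high:
            # Confident: keep a concrete pseudo label.
            confident.append((i, ranked[0]))
        elif top >= low:
            # Confusing: no concrete label; keep the top-k classes as the PC set.
            confusing.append((i, set(ranked[:top_k])))
        # Below `low`: not used in this sketch.
    return confident, confusing


if __name__ == "__main__":
    teacher_probs = [[0.95, 0.05, 0.0], [0.6, 0.3, 0.1], [0.4, 0.35, 0.25]]
    print(split_pseudo_labels(teacher_probs))
```

Under these assumptions, confident samples would be trained with ordinary hard pseudo labels, while each confusing sample would be trained through a VC-based loss using its PC set instead of being discarded or forced onto a possibly wrong label.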
