Bridging Weakly-Supervised Learning and VLM Distillation: Noisy Partial Label Learning for Efficient Downstream Adaptation
Qian-Wei Wang, Yuqiu Xie, Letian Zhang, Zimo Liu, Shu-Tao Xia
TL;DR
The paper addresses instance-dependent noise in labels produced by pre-trained vision-language models when learning downstream tasks under noisy partial labeling. It proposes Co-Reg, a collaborative consistency regularization framework with two networks performing co-pseudo-labeling, self-training, prototypical similarity alignment, and noisy contrastive learning to robustly recover ground-truth distributions from VLM annotations. Across six datasets and multiple VLM backbones, Co-Reg consistently outperforms state-of-the-art NPLL and KD baselines, including in semi-supervised settings with a few manually labeled examples, demonstrating annotation-free yet effective downstream adaptation. By uniting weakly-supervised learning with distillation-style knowledge transfer, the approach offers practical, scalable improvements for leveraging large vision-language models in real-world tasks without extensive manual labeling.
Abstract
In noisy partial label learning (NPLL), each training sample is associated with a set of candidate labels annotated by multiple noisy annotators. With the emergence of high-performance pre-trained vision-language models (VLMs) such as CLIP, LLaVA and GPT-4V, using these models in place of time-consuming manual annotation to achieve "manual-annotation-free" training for downstream tasks has become a promising research direction. This paper focuses on learning from noisy partial labels annotated by pre-trained VLMs and proposes a collaborative consistency regularization (Co-Reg) method. Unlike the symmetric noise primarily addressed in traditional noisy label learning, the noise generated by pre-trained models is instance-dependent and reflects the underlying patterns of the pre-trained models themselves, which makes learning significantly harder. To address this, we simultaneously train two neural networks that collaboratively purify the training labels through a "Co-Pseudo-Labeling" mechanism, while enforcing consistency regularization in both the label space and the feature representation space. Specifically, we construct multiple anti-overfitting mechanisms that efficiently mine latent information from noisy partially labeled samples, including alternating optimization of contrastive feature representations and pseudo-labels, and maintaining prototypical class vectors in the shared feature space.
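As a rough illustration of the co-pseudo-labeling idea, the sketch below fuses the class probabilities of two networks into one pseudo-label distribution per sample. It is not the paper's actual update rule, which the abstract does not specify. The function name `co_pseudo_label`, the averaging step, the temperature sharpening, and the `non_candidate_weight` parameter are all assumptions made for this example.

```python
import numpy as np

def co_pseudo_label(probs_a, probs_b, candidate_mask,
                    temperature=0.5, non_candidate_weight=0.0):
    """Fuse two networks' predictions into pseudo-label distributions.

    probs_a, probs_b: (n, c) softmax outputs of the two networks.
    candidate_mask:   (n, c) 0/1 matrix, 1 where a class is in the candidate set.
    temperature:      sharpening strength (assumed hyperparameter, 0 < T <= 1).
    non_candidate_weight: weight in [0, 1] kept on non-candidate classes.
        A value above 0 lets the true label be recovered even when the
        VLM-produced candidate set misses it (assumed design choice).
    """
    # Average the two views so neither network only confirms its own errors.
    fused = 0.5 * (probs_a + probs_b)
    # Down-weight (or remove) classes outside the candidate set.
    weights = candidate_mask + non_candidate_weight * (1.0 - candidate_mask)
    fused = fused * weights
    # Sharpen the distribution, then renormalize each row to sum to 1.
    fused = fused ** (1.0 / temperature)
    row_sum = fused.sum(axis=1, keepdims=True)
    # Fallback for rows with no remaining mass: uniform over the candidates.
    n_candidates = np.maximum(candidate_mask.sum(axis=1, keepdims=True), 1.0)
    uniform = candidate_mask / n_candidates
    return np.where(row_sum > 0, fused / np.maximum(row_sum, 1e-12), uniform)

# Hypothetical example: one sample, three classes, candidates are classes 0 and 2.
probs_a = np.array([[0.6, 0.3, 0.1]])
probs_b = np.array([[0.2, 0.7, 0.1]])
candidates = np.array([[1.0, 0.0, 1.0]])
pseudo = co_pseudo_label(probs_a, probs_b, candidates)
```

With `non_candidate_weight=0.0` the sketch reduces to the classical partial-label assumption that the true label always lies in the candidate set. In the noisy setting described above that assumption can fail, so a small positive weight is one plausible relaxation.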
