Human-Corrected Labels Learning: Enhancing Labels Quality via Human Correction of VLMs Discrepancies

Zhongnian Li, Lan Chen, Yixin Xu, Shi Xu, Xinzheng Xu

TL;DR

This work tackles noisy labels produced by vision-language models by introducing Human-Corrected Labels (HCLs), a framework that uses multiple VLMs to generate candidate labels and employs human correction only when predictions disagree. It develops a risk-consistent estimator and a conditional label distribution to train classifiers under weak supervision, seamlessly blending VLM priors with human corrections. Empirical results across six diverse datasets show that HCL outperforms existing weakly supervised methods and approaches fully supervised performance at a fraction of the labeling cost, with robustness to VLM choice and prompts. The approach offers a practical, scalable solution for reliable VLM-based data annotation in real-world settings.

Abstract

Vision-Language Models (VLMs), with their powerful content generation capabilities, have been successfully applied to data annotation. However, VLM-generated labels have two limitations: low quality (i.e., label noise) and the absence of an error correction mechanism. To enhance label quality, we propose Human-Corrected Labels (HCLs), a novel setting that enables efficient human correction of VLM-generated noisy labels. As shown in Figure 1(b), HCL strategically deploys human correction only for instances with VLM discrepancies, achieving both higher-quality annotations and reduced labor costs. Specifically, we theoretically derive a risk-consistent estimator that incorporates both human-corrected labels and VLM predictions to train classifiers. We further propose a conditional probability method to estimate the label distribution using a combination of VLM outputs and model predictions. Extensive experiments demonstrate that our approach achieves superior classification performance and is robust to label noise, validating the effectiveness of HCL in practical weak supervision scenarios. Code: https://github.com/Lilianach24/HCL.git
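
The routing idea in the abstract, where a human is consulted only when VLMs disagree, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_hcl_labels`, the use of exactly two VLMs, and the assumption that the annotator returns a class index are all hypothetical choices made for illustration. The risk-consistent estimator and the conditional label distribution are not shown.

```python
from typing import Callable, List, Sequence, Tuple


def build_hcl_labels(
    preds_a: Sequence[int],
    preds_b: Sequence[int],
    ask_human: Callable[[int], int],
) -> Tuple[List[int], List[int]]:
    """Keep labels where two VLMs agree; query a human where they differ.

    preds_a, preds_b: predicted class indices from two (hypothetical) VLMs.
    ask_human: callback taking an instance index and returning a class index
               (an assumed form of human feedback).
    Returns the label list and the indices that were sent for correction.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("both VLMs must label the same instances")

    labels: List[int] = []
    corrected: List[int] = []
    for i, (a, b) in enumerate(zip(preds_a, preds_b)):
        if a == b:
            # Agreement: accept the shared VLM label without human effort.
            labels.append(a)
        else:
            # Discrepancy: spend human effort on this instance only.
            labels.append(ask_human(i))
            corrected.append(i)
    return labels, corrected


# Toy usage: a lookup table stands in for the human annotator.
truth = [0, 1, 2, 3]
labels, corrected = build_hcl_labels(
    [0, 1, 5, 3], [0, 1, 2, 7], lambda i: truth[i]
)
```

In this toy run only instances 2 and 3 reach the annotator, so human cost scales with the disagreement rate rather than the dataset size. Agreement does not guarantee correctness, so labels accepted without review may still be noisy.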

Paper Structure

This paper contains 34 sections, 15 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: A comparison between traditional VLM annotation and Human-Corrected Labels (HCLs). The zero-shot results shown are obtained using CLIP with ViT-L/14. The example images and categories are taken from the Caltech-101 dataset. HCL deploys human correction only for instances with VLM discrepancies, achieving both higher-quality annotations and reduced labor costs.
  • Figure 2: Classification accuracy over training epochs (1–30) for each method on (a) CIFAR100, (b) Caltech-101, (c) Food-101, and (d) DTD.
  • Figure 3: Experimental results on the influence of the conditional probability hyperparameter $\lambda$. Experiments are performed on human-corrected HCL data.