Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning

Zijun Long; Lipeng Zhuang; George Killick; Richard McCreadie; Gerardo Aragon Camarasa; Paul Henderson

TL;DR

The paper investigates how human-labelling errors uniquely affect supervised contrastive learning (SCL) and shows that most incorrect learning signals arise from false positives, which the paper attributes to high visual similarity. It introduces SCL-RHE, a robust SCL objective that down-weights easy positives and uses sampling inspired by positive-unlabelled learning to emphasize positives that are consistent in their latent class, while also mitigating mislabelled negatives, all without extra computational overhead. Empirical results in both training-from-scratch and transfer-learning settings, on benchmarks such as CIFAR and ImageNet-1K, show that SCL-RHE achieves state-of-the-art accuracy and robustness to realistic labelling noise, even when test labels are corrected. Overall, the work offers a practical and efficient approach to robust representation learning under real-world human-labelling noise, with broad applicability to vision tasks.

Abstract

Human-annotated vision datasets inevitably contain a fraction of mislabelled examples. While the detrimental effects of such mislabelling on supervised learning are well researched, their influence on Supervised Contrastive Learning (SCL) remains largely unexplored. In this paper, we show that human-labelling errors not only differ significantly from synthetic label errors, but also pose unique challenges in SCL that differ from those in traditional supervised learning methods. Specifically, our results indicate that they adversely impact the learning process in ~99% of cases when they occur as false positive samples. Existing noise-mitigating methods primarily focus on synthetic label errors and tackle the unrealistic setting of very high synthetic noise rates (40-80%), but they often underperform on common image datasets due to overfitting. To address this issue, we introduce SCL-RHE, a novel SCL objective with robustness to human-labelling errors. SCL-RHE is designed to mitigate the effects of real-world mislabelled examples, which typically occur at much lower noise rates (<5%). We demonstrate that SCL-RHE consistently outperforms state-of-the-art representation learning and noise-mitigating methods across various vision benchmarks by offering improved resilience against human-labelling errors.

Paper Structure (20 sections, 12 equations, 2 figures, 4 tables)

This paper contains 20 sections, 12 equations, 2 figures, 4 tables.

Introduction
Related Work
Setup for Contrastive Learning
Uniqueness of Human-labelling Errors and Their Impact on SCL
Definitions.
The Differences Between Human-Labelling Errors and Synthetic Label Errors
Impacts of Human-Labelling Errors on SCL
SCL with Robustness to Human-Labelling Errors
Human-Labelling Errors in the SCL Objective
Mitigating Human-Labelling Errors
Mitigating label errors for negatives.
Overall objective for SCL with robustness to human-labelling errors.
Experiments
Training from Scratch
Performance on corrected test sets.
...and 5 more sections

Figures (2)

Figure 1: Comparison between impacts of labelling errors on different learning approaches. AL represents 'Assigned Label' and LL represents 'Latent Label'. Those marked red in AL represent human-labelling errors. It is important to note that as long as a pair shares the same latent label, there are no adverse impacts on positive pairs. Similarly, if the latent labels differ, negative pairs remain unaffected.
Figure 2: Panels (a) and (c) show the log-scaled distribution of cosine similarities for different pair types (true positive pairs, true negative pairs, and human-labelling errors) on the CIFAR-10 and ImageNet-1k datasets, respectively. Panels (b) and (d) show the analogous distributions for synthetic label errors.
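
The pair categories referred to in the two captions can be made concrete with a small sketch. This is only an illustration of the rule stated in Figure 1 (a positive pair is unaffected if its latent labels agree, a negative pair is unaffected if its latent labels differ); it is not the paper's code, and the function name `categorise_pairs` and the toy labels are assumptions introduced here.

```python
from itertools import combinations


def categorise_pairs(assigned, latent):
    """Group every unordered pair of samples by assigned vs latent labels.

    Hypothetical helper for illustration only:
    - same assigned label, same latent label      -> true positive
    - same assigned label, different latent label -> false positive
    - different assigned label, same latent label -> false negative
    - different assigned label, different latent  -> true negative
    """
    groups = {
        "true_positive": [],
        "false_positive": [],
        "false_negative": [],
        "true_negative": [],
    }
    for i, j in combinations(range(len(assigned)), 2):
        same_assigned = assigned[i] == assigned[j]
        same_latent = latent[i] == latent[j]
        if same_assigned and same_latent:
            groups["true_positive"].append((i, j))
        elif same_assigned:
            groups["false_positive"].append((i, j))
        elif same_latent:
            groups["false_negative"].append((i, j))
        else:
            groups["true_negative"].append((i, j))
    return groups


# Toy data (assumed): sample 1 is a dog that an annotator labelled as a cat.
assigned = ["cat", "cat", "dog", "dog"]
latent = ["cat", "dog", "dog", "dog"]
pairs = categorise_pairs(assigned, latent)
```

In this toy example the single mislabelled sample creates one false positive pair (with the true cat) and two false negative pairs (with the two correctly labelled dogs); all other pairs carry a correct learning signal.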
