Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
Zijun Long, Lipeng Zhuang, George Killick, Richard McCreadie, Gerardo Aragon Camarasa, Paul Henderson
TL;DR
The paper investigates how human-labelling errors uniquely affect supervised contrastive learning (SCL) and shows that most incorrect learning signals arise from false positives caused by high visual similarity. It introduces SCL-RHE, a robust SCL objective that down-weights easy positives and uses positive-unlabeled-inspired sampling to emphasize latent-class-consistent positives while mitigating the effect of mislabelled negatives, all without extra computational overhead. Empirical results in both training-from-scratch and transfer-learning scenarios, on benchmarks such as CIFAR and ImageNet-1K, demonstrate that SCL-RHE achieves state-of-the-art accuracy and robustness to realistic labelling noise, even when test labels are corrected. Overall, the work provides a practical and efficient solution for robust representation learning under real-world human-labelling noise, with broad applicability to vision tasks.
Abstract
Human-annotated vision datasets inevitably contain a fraction of human-mislabelled examples. While the detrimental effects of such mislabelling on supervised learning are well researched, their influence on Supervised Contrastive Learning (SCL) remains largely unexplored. In this paper, we show that human-labelling errors not only differ significantly from synthetic label errors, but also pose unique challenges in SCL that differ from those in traditional supervised learning methods. Specifically, our results indicate that in ~99% of the cases where they adversely impact the learning process, they occur as false positive samples. Existing noise-mitigating methods primarily focus on synthetic label errors and tackle the unrealistic setting of very high synthetic noise rates (40-80%), but they often underperform on common image datasets due to overfitting. To address this issue, we introduce SCL-RHE, a novel SCL objective that is robust to human-labelling errors. SCL-RHE is designed to mitigate the effects of real-world mislabelled examples, which typically occur at much lower noise rates (<5%). We demonstrate that SCL-RHE consistently outperforms state-of-the-art representation learning and noise-mitigating methods across various vision benchmarks by offering improved resilience against human-labelling errors.
