Context-Enriched Contrastive Loss: Enhancing Presentation of Inherent Sample Connections in Contrastive Learning Framework
Haojin Deng, Yimin Yang
TL;DR
ConTeX addresses two core issues in contrastive learning, information distortion from augmentations and slow convergence, by introducing a context-enriched loss with two complementary components: one leverages context-based contrasts and the other leverages self-positives. The method yields faster convergence and improved generalization, with strong fairness gains on bias-related tasks such as BiasedMNIST, UTKFace, and CelebA. Extensive experiments across CIFAR-10/100, ImageNet, and transfer settings show performance that is competitive with or superior to state-of-the-art contrastive losses, including SupCon, along with substantial bias mitigation. The work demonstrates ConTeX's potential for efficient, fair downstream training and provides theoretical insights via an upper-bound analysis and gradient-level justification.
Abstract
Contrastive learning has gained popularity and has pushed state-of-the-art performance across numerous large-scale benchmarks. In contrastive learning, the contrastive loss function plays a pivotal role in discerning similarities between samples produced by augmentation techniques such as rotation or cropping. However, this learning mechanism can also introduce information distortion from the augmented samples. This is because the trained model may come to rely heavily on information from samples with identical labels while neglecting positive pairs that originate from the same initial image, especially in large datasets. This paper proposes a context-enriched contrastive loss function that simultaneously improves learning effectiveness and addresses the information distortion by incorporating two convergence targets. The first component, which is notably sensitive to label contrast, differentiates between features of identical and distinct classes, which boosts contrastive training efficiency. The second component draws augmented samples from the same source image closer together and pushes all other samples away. We evaluate the proposed approach on image classification tasks using eight widely accepted large-scale recognition benchmark datasets: CIFAR10, CIFAR100, Caltech-101, Caltech-256, ImageNet, BiasedMNIST, UTKFace, and CelebA. The experimental results demonstrate that the proposed method achieves improvements over 16 state-of-the-art contrastive learning methods in terms of both generalization performance and learning convergence speed. Notably, our technique stands out on tasks involving systematic distortion: it demonstrates a 22.9% improvement over the original contrastive loss functions on the downstream BiasedMNIST dataset, highlighting its promise for more efficient and equitable downstream training.
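To make the two convergence targets in the abstract concrete, the following is a minimal NumPy sketch, not the paper's actual loss. It assumes a SupCon-style term for the label-sensitive component and an InfoNCE-style term for the same-source component, combined by a weighted sum. The function name `two_term_contrastive_loss`, the `source_ids` argument, the `temperature` and `weight` parameters, and the weighted-sum combination are all illustrative assumptions that do not come from the source.

```python
import numpy as np


def two_term_contrastive_loss(features, labels, source_ids,
                              temperature=0.1, weight=0.5):
    """Illustrative two-term contrastive loss (hypothetical, not the paper's formula).

    features:   (n, d) array of embeddings, one row per augmented view.
    labels:     (n,) array of class labels.
    source_ids: (n,) array identifying the source image of each view.
    Returns (total, label_term, source_term).
    """
    # L2-normalize so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each sample's similarity with itself from the softmax.
    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, sim)

    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives for term 1: other samples with the same class label.
    label_pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # Positives for term 2: other views of the same source image.
    source_pos = (source_ids[:, None] == source_ids[None, :]) & ~self_mask

    def mean_neg_log_prob(pos_mask):
        # Average log-probability over each anchor's positives,
        # skipping anchors that have no positive in the batch.
        counts = pos_mask.sum(axis=1)
        summed = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
        valid = counts > 0
        return float(-(summed[valid] / counts[valid]).mean())

    label_term = mean_neg_log_prob(label_pos)
    source_term = mean_neg_log_prob(source_pos)
    total = weight * label_term + (1.0 - weight) * source_term
    return total, label_term, source_term
```

In this sketch, the label term pulls together all samples that share a class, while the source term treats only views of the same image as positives and every other sample as a negative, which mirrors the two components described above. Each term assumes that at least one anchor in the batch has a positive; how the paper actually balances or formulates the two components is not stated in the abstract.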
