Multi-Label Contrastive Learning : A Comprehensive Study

Alexandre Audibert, Aurélien Gauffre, Massih-Reza Amini

TL;DR

This work investigates how contrastive learning can be applied effectively to multi-label classification. It analyzes gradient dynamics, proposes a gradient-regularized multi-label contrastive loss (REG) that leverages label prototypes, and reweights positives based on label overlap. In extensive experiments in CV and NLP, REG consistently outperforms prior contrastive losses on datasets with many labels and remains robust in low-data regimes, while traditional BCE-based losses may prevail when the label set is small. The study shows that the gains from contrastive methods arise not only from modeling label interactions but also from optimization properties, and it introduces a scalable, prototype-inclusive framework that achieves state-of-the-art results on several benchmarks. These findings have practical implications for designing efficient, scalable loss functions for complex, real-world multi-label problems.

Abstract

Multi-label classification, which involves assigning multiple labels to a single input, has emerged as a key area in both research and industry due to its wide-ranging applications. Designing effective loss functions is crucial for optimizing deep neural networks for this task, as they significantly influence model performance and efficiency. Traditional loss functions, which often maximize likelihood under the assumption of label independence, may struggle to capture complex label relationships. Recent research has turned to supervised contrastive learning, a method that aims to create a structured representation space by bringing similar instances closer together and pushing dissimilar ones apart. Although contrastive learning offers a promising approach, applying it to multi-label classification presents unique challenges, particularly in managing label interactions and data structure. In this paper, we conduct an in-depth study of contrastive learning loss for multi-label classification across diverse settings. These include datasets with both small and large numbers of labels, datasets with varying amounts of training data, and applications in both computer vision and natural language processing. Our empirical results indicate that the promising outcomes of contrastive learning are attributable not only to the consideration of label interactions but also to the robust optimization scheme of the contrastive loss. Furthermore, while the supervised contrastive loss function faces challenges with datasets containing a small number of labels and ranking-based metrics, it demonstrates excellent performance, particularly in terms of Macro-F1, on datasets with a large number of labels.
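To make the idea of "bringing similar instances closer together" concrete in the multi-label case, the following is a minimal sketch of a supervised contrastive loss in which each positive is weighted by the overlap of its label set with the anchor's. It is not the REG loss studied in the paper (it has no prototypes and no gradient regularization). The Jaccard weighting, the function names, and the temperature value are all assumptions chosen for illustration.

```python
import math

def jaccard(a, b):
    """Label overlap between two label sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def multilabel_supcon(embeddings, label_sets, temperature=0.1):
    """Overlap-weighted supervised contrastive loss (illustrative sketch).

    Each anchor is pulled toward every other sample in proportion to the
    Jaccard overlap of their label sets (an assumed weighting). Anchors
    with no overlapping sample are skipped.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in others}
        # Numerically stable log-sum-exp over all other samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        weights = {j: jaccard(label_sets[i], label_sets[j]) for j in others}
        total = sum(weights.values())
        if total == 0:
            continue
        # Weighted average of the negative log-probabilities of the positives.
        loss_i = -sum(w * (logits[j] - log_denom)
                      for j, w in weights.items()) / total
        losses.append(loss_i)
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch: samples 0 and 1 share label 0, samples 2 and 3 share label 1.
labels = [{0}, {0}, {1}, {1}]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(multilabel_supcon(clustered, labels))
```

In this sketch, a pair with identical label sets gets weight 1, a pair with disjoint label sets gets weight 0 and acts only as a negative, and partially overlapping pairs fall in between.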

Paper Structure

This paper contains 48 sections, 28 equations, 3 figures, 11 tables, and 2 algorithms.

Figures (3)

  • Figure 1: Visualization of gradient behavior in the multi-label contrastive loss: embeddings that share the yellow label and lie too close together generate gradients that cause unintended repulsion. Our proposed regularization adjusts these gradients, canceling the repulsion and ensuring proper attraction between similar examples.
  • Figure 2: Macro-F1 under limited data conditions as a function of training set size: Contrastive losses demonstrate higher resilience to data scarcity.
  • Figure 3: Performance across training epochs with random initialization: Contrastive losses display resilience to overfitting and benefit from extended training durations, whereas ZLPR shows signs of overfitting, particularly on the NUS-WIDE dataset.
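
Since the abstract and Figure 2 report results in terms of Macro-F1, the following sketch shows how Macro-F1 differs from Micro-F1 on multi-label predictions. The helper names are hypothetical, and assigning an F1 of 0 to a label with no true or predicted occurrences is an assumed convention; the paper's own evaluation code may differ.

```python
def f1(tp, fp, fn):
    """F1 from counts; returns 0.0 when the label never occurs (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred, num_labels):
    """Compute (Macro-F1, Micro-F1) from lists of true and predicted label sets."""
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for true_set, pred_set in zip(y_true, y_pred):
        for k in range(num_labels):
            if k in pred_set and k in true_set:
                tp[k] += 1
            elif k in pred_set:
                fp[k] += 1
            elif k in true_set:
                fn[k] += 1
    # Macro: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(tp[k], fp[k], fn[k]) for k in range(num_labels)) / num_labels
    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp), sum(fp), sum(fn))
    return macro, micro

# A model that always predicts the frequent label 0 and never the rare label 1.
y_true = [{0}, {0}, {0}, {1}]
y_pred = [{0}, {0}, {0}, {0}]
macro, micro = macro_micro_f1(y_true, y_pred, num_labels=2)
print(f"Macro-F1 = {macro:.3f}, Micro-F1 = {micro:.3f}")
```

Because Macro-F1 weights every label equally, a model that ignores rare labels is penalized much more heavily than under Micro-F1, which is why the metric is informative on datasets with a large number of labels.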