Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification

Alexandre Audibert; Aurélien Gauffre; Massih-Reza Amini

TL;DR

The paper tackles long-tailed multi-label text classification (MLTC) by examining how supervised contrastive learning shapes representation quality. It introduces ABALONE, a Multi-label Supervised Contrastive Loss ($\mathcal{L}_{MSC}$) that uses a memory queue and trainable label prototypes to create robust positives and balance attraction/repulsion across labels. Ablations and experiments on RCV1-v2, AAPD, and UK-LEX show that $\mathcal{L}_{MSC}$ yields superior Macro-F1 while maintaining competitive Micro-F1, and that fine-tuning after contrastive pretraining further improves performance and transferability. This work advances MLTC by showing how tailored supervised contrastive learning can enhance both representation space and downstream performance for long-tailed NLP tasks.

Abstract

Learning an effective representation in multi-label text classification (MLTC) is a significant challenge in NLP. This challenge arises from the inherent complexity of the task, which is shaped by two key factors: the intricate connections between labels and the widespread long-tailed distribution of the data. To overcome this issue, one potential approach involves integrating supervised contrastive learning with classical supervised loss functions. Although contrastive learning has shown remarkable performance in multi-class classification, its impact in the multi-label framework has not been thoroughly investigated. In this paper, we conduct an in-depth study of supervised contrastive learning and its influence on representation in the MLTC context. We emphasize the importance of considering long-tailed data distributions to build a robust representation space, which effectively addresses two critical challenges associated with contrastive learning that we identify: the "lack of positives" and the "attraction-repulsion imbalance". Building on this insight, we introduce a novel contrastive loss function for MLTC. It attains Micro-F1 scores that either match or surpass those obtained with other frequently employed loss functions, and demonstrates a significant improvement in Macro-F1 scores across three multi-label datasets.
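To make the distinction between the two reported metrics concrete, here is a minimal sketch in plain Python of how Micro-F1 and Macro-F1 diverge under a long-tailed label distribution. The toy data (one frequent "head" label, one rare "tail" label) and the helper names `f1` and `micro_macro_f1` are illustrative assumptions, not taken from the paper or its evaluation code.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary multi-label indicator rows."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro-F1: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro-F1: pool the counts over all labels, so frequent labels dominate.
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Hypothetical toy data: label 0 (head) is on all 10 documents,
# label 1 (tail) is on only the last 2.
y_true = [[1, 0]] * 8 + [[1, 1]] * 2
# A classifier that gets the head label right and never predicts the tail label.
y_pred = [[1, 0]] * 10

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"Micro-F1 = {micro:.3f}, Macro-F1 = {macro:.3f}")
```

In this toy case, ignoring the tail label entirely still gives a Micro-F1 of about 0.909, while Macro-F1 drops to 0.5. This is why gains on rare labels show up mainly in Macro-F1.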

Paper Structure (29 sections, 17 equations, 2 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Supervised Contrastive Learning
Multi-label Classification
Supervised Contrastive Learning for Multi-label Classification
ABALONE
Contrastive Baseline $\mathcal{L}_{Base}$
Motivation
Lack of Positive Instances
Attraction-Repulsion Imbalance
Multi-label Supervised Contrastive Loss
Experimental Setup
Datasets
Comparison Baselines
Baseline: Learning a good representation space
...and 14 more sections

Figures (2)

Figure 1: Illustration of how the "lack of positives" and "attraction-repulsion imbalance" problems are addressed by $\mathcal{L}_{Base}$ (classical contrastive loss for MLTC) and $\mathcal{L}_{MSC}$ (our proposed balanced Multi-label Supervised Contrastive loss). (a) Adding prototypes and a queue in $\mathcal{L}_{MSC}$ ensures consistent positive pairing and expands the diversity of positive and negative samples. (b) Reweighting negative pairs addresses the imbalance between head and tail labels. For clarity, only the attraction/repulsion acting on the middle sample is depicted, without the queue and prototypes. Blue (respectively yellow) corresponds to a label in the head (respectively tail) of the distribution.
Figure 2: Clustering quality metrics of different approaches across top classes retained.
