Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification
Alexandre Audibert, Aurélien Gauffre, Massih-Reza Amini
TL;DR
The paper tackles long-tailed multi-label text classification (MLTC) by examining how supervised contrastive learning shapes representation quality. It introduces ABALONE, a Multi-label Supervised Contrastive Loss ($\mathcal{L}_{MSC}$) that uses a memory queue and trainable label prototypes to create robust positives and balance attraction/repulsion across labels. Ablations and experiments on RCV1-v2, AAPD, and UK-LEX show that $\mathcal{L}_{MSC}$ yields higher Macro-F1 than commonly used loss functions while keeping Micro-F1 competitive, and that fine-tuning after contrastive pretraining further improves performance and transferability. The results indicate that a supervised contrastive loss tailored to long-tailed label distributions can improve both the representation space and downstream performance in MLTC.
Abstract
Learning an effective representation in multi-label text classification (MLTC) is a significant challenge in NLP. This challenge arises from the inherent complexity of the task, which is shaped by two key factors: the intricate connections between labels and the widespread long-tailed distribution of the data. One potential approach to this challenge is to integrate supervised contrastive learning with classical supervised loss functions. Although contrastive learning has shown remarkable performance in multi-class classification, its impact in the multi-label framework has not been thoroughly investigated. In this paper, we conduct an in-depth study of supervised contrastive learning and its influence on representation in the MLTC context. We emphasize the importance of considering long-tailed data distributions to build a robust representation space. Doing so addresses two critical challenges associated with contrastive learning that we identify: the "lack of positives" and the "attraction-repulsion imbalance". Building on this insight, we introduce a novel contrastive loss function for MLTC. The proposed loss attains Micro-F1 scores that either match or surpass those obtained with other frequently employed loss functions, and demonstrates a significant improvement in Macro-F1 scores across three multi-label datasets.
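To make the distinction between the two reported metrics concrete, the following is a minimal sketch of how Micro-F1 and Macro-F1 can diverge on long-tailed labels. It is not taken from the paper: the function names, the toy data (`Y_TRUE`, `Y_PRED`), and the convention that F1 is 0 when a label has no true or predicted positives are all assumptions made for illustration.

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Compute (Micro-F1, Macro-F1) for binary indicator matrices of equal shape."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels

    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Toy long-tailed setup (hypothetical): label 0 is frequent, label 1 is rare.
Y_TRUE = [[1, 0]] * 8 + [[0, 1]] * 2
# A classifier that handles the head label perfectly and never predicts the tail label.
Y_PRED = [[1, 0]] * 8 + [[0, 0]] * 2

micro, macro = micro_macro_f1(Y_TRUE, Y_PRED)
print(f"Micro-F1 = {micro:.3f}, Macro-F1 = {macro:.3f}")
```

In this toy case the classifier ignores the rare label entirely, yet Micro-F1 stays near 0.89 while Macro-F1 drops to 0.5. This is why Macro-F1 is the more informative metric when tail labels matter.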
