Class-Aware Contrastive Optimization for Imbalanced Text Classification

Grigorii Khvatskii, Nuno Moniz, Khoa Doan, Nitesh V Chawla

TL;DR

This paper shows that combining class-aware contrastive optimization with denoising autoencoders can successfully tackle imbalanced text classification tasks, outperforming other strong text classification models.

Abstract

The unique characteristics of text data make classification a complex problem. Advances in unsupervised and semi-supervised learning and in autoencoder architectures have addressed several challenges. However, these methods still struggle with imbalanced text classification tasks, a common scenario in real-world applications, and tend to produce embeddings with unfavorable properties, such as class overlap. In this paper, we show that leveraging class-aware contrastive optimization combined with denoising autoencoders can successfully tackle imbalanced text classification tasks, achieving better performance than the current state-of-the-art. Concretely, our proposal combines reconstruction loss with contrastive class separation in the embedding space, allowing a better balance between the truthfulness of the generated embeddings and the model's ability to separate different classes. Compared with an extensive set of traditional and state-of-the-art competing methods, our proposal demonstrates a notable increase in performance across a wide variety of text datasets.

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: CAROL Training Pipeline
  • Figure 2: Relationship between classifier F1 (1), different loss components (class separation and reconstruction) (2), and the value of $C$, averaged across datasets, for three different distance measures: Euclidean (a), Cosine (b), Chebyshev (c).
  • Figure 3: Comparison of TSDAE-CAROL embeddings with $C=0$, $C=0.5$ and $C=1.0$ (India Police dataset)

Theorems & Definitions (3)

  • Definition 1: Class Separation
  • Definition 2: Interclass Distance
  • Definition 3: Intraclass Distance
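
The exact definitions and the loss formula are not reproduced in this outline, so the following is only a minimal sketch of how the three quantities listed above and the weight $C$ from Figures 2 and 3 might fit together. Everything in it is an assumption made for illustration: the function names, the use of mean pairwise distances, the ratio form of class separation, and the convex combination `(1 - c) * reconstruction + c * separation` are hypothetical and may differ from the paper's formulation. The three distance measures are the ones named in the Figure 2 caption.

```python
import numpy as np


def pairwise_distance(a, b, metric="euclidean"):
    """Distance between every row of `a` and every row of `b`."""
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity (small epsilon avoids 0/0).
        a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
        b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
        return 1.0 - a_n @ b_n.T
    diff = a[:, None, :] - b[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "chebyshev":
        return np.abs(diff).max(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def intraclass_distance(emb, labels, metric="euclidean"):
    """ASSUMPTION: mean distance over distinct pairs that share a label."""
    d = pairwise_distance(emb, emb, metric)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # ignore each point's distance to itself
    return d[same].mean()


def interclass_distance(emb, labels, metric="euclidean"):
    """ASSUMPTION: mean distance over pairs with different labels."""
    d = pairwise_distance(emb, emb, metric)
    different = labels[:, None] != labels[None, :]
    return d[different].mean()


def class_separation(emb, labels, metric="euclidean"):
    """ASSUMPTION: ratio of intraclass to interclass distance (lower is better)."""
    intra = intraclass_distance(emb, labels, metric)
    inter = interclass_distance(emb, labels, metric)
    return intra / (inter + 1e-12)


def combined_loss(reconstruction, separation, c):
    """ASSUMPTION: convex combination weighted by c in [0, 1].

    c = 0 keeps only the reconstruction term; c = 1 keeps only separation.
    """
    return (1.0 - c) * reconstruction + c * separation


# Toy embeddings: two tight, well-separated classes.
emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
labels = np.array([0, 0, 1, 1])
separation = class_separation(emb, labels, metric="euclidean")
loss = combined_loss(reconstruction=2.0, separation=separation, c=0.5)
```

Under these assumptions, sweeping `c` from 0 to 1 mirrors the comparison in Figure 3: at `c = 0` the objective reduces to pure reconstruction, while larger values put more weight on pulling same-class embeddings together relative to different-class ones.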