Contrastive Learning Subspace for Text Clustering

Qian Yong, Chen Chen, Xiabing Zhou

TL;DR

This paper addresses text clustering by moving beyond instance-wise contrastive learning to model cluster-wise relationships. It introduces Subspace Contrastive Learning (SCL), which uses a self-expressive module to generate virtual positives and a cluster-wise contrastive loss to learn a discriminative subspace that reflects cluster structure without needing the number of categories. The approach achieves state-of-the-art or competitive results on seven short-text clustering datasets, demonstrating robustness across encoder types and improvements in cluster separation. The method reduces positive-sample construction costs and offers a pathway to more scalable, category-agnostic clustering in NLP applications.

Abstract

Contrastive learning has been frequently investigated to learn effective representations for text clustering tasks. However, existing contrastive learning-based text clustering methods focus only on modeling instance-wise semantic similarity relationships, and they ignore contextual information and the underlying relationships among all instances that need to be clustered. In this paper, we propose a novel text clustering approach called Subspace Contrastive Learning (SCL), which models cluster-wise relationships among instances. Specifically, the proposed SCL consists of two main modules: (1) a self-expressive module that constructs virtual positive samples and (2) a contrastive learning module that further learns a discriminative subspace to capture task-specific cluster-wise relationships among texts. Experimental results show that the proposed SCL method not only achieves superior results on multiple text clustering datasets but also reduces the complexity of positive sample construction.
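
The two modules named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the softmax-over-similarities coefficients stand in for whatever learned self-expressive layer the paper uses, and the function names `virtual_positives` and `contrastive_loss`, the temperature values, and the random embeddings are all assumptions made for illustration. The sketch only shows the general idea: each embedding is rebuilt from the other embeddings in the batch, and that reconstruction serves as its positive in an InfoNCE-style loss.

```python
import numpy as np


def virtual_positives(z, tau_se=0.5):
    """Build one virtual positive per sample as a weighted mix of the OTHER samples.

    Assumption: coefficients come from a softmax over cosine similarities;
    the paper's self-expressive module may be parameterized differently.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau_se
    np.fill_diagonal(sim, -np.inf)          # a sample may not reconstruct itself
    sim -= sim.max(axis=1, keepdims=True)   # numerical stability
    coef = np.exp(sim)
    coef /= coef.sum(axis=1, keepdims=True)  # each row sums to 1, diagonal is 0
    return coef @ z, coef


def contrastive_loss(z, v, tau=0.5):
    """InfoNCE-style loss: z[i] should match v[i] and differ from v[j], j != i."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = z @ v.T / tau
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy batch of 8 embeddings standing in for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
v, coef = virtual_positives(z)
loss = contrastive_loss(z, v)
```

In this sketch the coefficient matrix `coef` plays the role of a subspace affinity matrix: row `i` records how strongly each other sample contributes to the virtual positive of sample `i`. Because the positives are computed from embeddings already in the batch, no text augmentation or second encoder pass is needed to form them.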

Paper Structure

This paper contains 15 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: t-SNE visualization of the embedding space learned on StackOverflow using Sentence Transformer as the backbone. Each color indicates a ground-truth semantic category. The boundaries between clusters learned with cluster-wise contrastive learning are clearer than those learned with instance-wise methods.
  • Figure 2: Architecture of the proposed method, taking the green sample as an example. The inputs are first encoded as features, and virtual samples are then generated to form positive pairs (red sample) and negative pairs (purple and yellow samples). Contrastive learning is applied to pull positive pairs together and push negative pairs apart.
  • Figure 3: The influence of the parameter $\lambda_{reg}$, ranging from 1e-4 to 1. A small $\lambda_{reg}$ yields better performance than either no regularization or too much regularization.
  • Figure 4: Heatmap of the subspace affinity matrix on StackOverflow. Brighter points indicate higher similarity between samples.
  • Figure 5: Grid search of the temperature parameter and the adapted temperature on StackOverflow.