Contrastive Learning Subspace for Text Clustering
Qian Yong, Chen Chen, Xiabing Zhou
TL;DR
This paper addresses text clustering by moving beyond instance-wise contrastive learning to model cluster-wise relationships. It introduces Subspace Contrastive Learning (SCL), which uses a self-expressive module to generate virtual positives and a cluster-wise contrastive loss to learn a discriminative subspace that reflects cluster structure without needing the number of categories. The approach achieves state-of-the-art or competitive results on seven short-text clustering datasets, demonstrating robustness across encoder types and improvements in cluster separation. The method reduces positive-sample construction costs and offers a pathway to more scalable, category-agnostic clustering in NLP applications.
Abstract
Contrastive learning has been frequently investigated as a way to learn effective representations for text clustering. However, existing contrastive learning-based text clustering methods focus only on modeling instance-wise semantic similarity, and they ignore the contextual information and underlying relationships among all the instances that need to be clustered. In this paper, we propose a novel text clustering approach called Subspace Contrastive Learning (SCL), which models cluster-wise relationships among instances. Specifically, the proposed SCL consists of two main modules: (1) a self-expressive module that constructs virtual positive samples and (2) a contrastive learning module that further learns a discriminative subspace to capture task-specific cluster-wise relationships among texts. Experimental results show that the proposed SCL method not only achieves superior results on multiple text clustering datasets but also requires less complexity in positive sample construction.
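To make the two modules concrete, the following is a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that each text embedding is re-expressed as a weighted combination of the other embeddings in the batch (a self-expressive step with a zero self-coefficient), and that this reconstruction serves as the virtual positive in an InfoNCE-style loss. The softmax-based coefficients, the function names, and the temperature values are all illustrative assumptions; the abstract does not specify how the coefficients are obtained.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def self_expressive_positives(z, temperature=0.5):
    """Build one virtual positive per instance from the other instances.

    Assumption: coefficients come from a softmax over cosine similarities
    with the diagonal masked out, so no instance reconstructs itself.
    """
    z = l2_normalize(z)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)           # forbid self-reconstruction
    sim = sim - sim.max(axis=1, keepdims=True)
    coeff = np.exp(sim)
    coeff = coeff / coeff.sum(axis=1, keepdims=True)
    return coeff, coeff @ z                  # coefficients, virtual positives


def contrastive_loss(z, positives, temperature=0.5):
    """InfoNCE-style loss: instance i should match virtual positive i,
    while the other instances' virtual positives act as negatives."""
    z = l2_normalize(z)
    p = l2_normalize(positives)
    logits = z @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy batch of 6 hypothetical sentence embeddings with 4 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))
coeff, virtual_pos = self_expressive_positives(embeddings)
loss = contrastive_loss(embeddings, virtual_pos)
```

Because the virtual positives are computed from embeddings already in the batch, this sketch needs no data augmentation or second encoder pass to obtain positives, which is one plausible reading of the reduced cost of positive sample construction claimed above.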
