Discriminative Representation learning via Attention-Enhanced Contrastive Learning for Short Text Clustering

Zhihao Yao

TL;DR

AECL tackles short text clustering by mitigating false negative separation in contrastive learning through a sample-level attention mechanism that builds cross-sample consistent representations. It integrates a Pseudo-label Generation Module and a Contrastive Learning Module that share encoder/projection/attention/clustering networks, employing similarity-guided contrastive learning and cluster-level objectives with pseudo-label supervision. The training follows a three-stage protocol to stabilize pseudo-labels and refine clustering, achieving state-of-the-art results on eight benchmark datasets and demonstrating improved intra-cluster cohesion and inter-cluster separation. The approach advances practical short-text clustering by embedding cross-sample semantics into both representation learning and clustering objectives, with open-source code anticipated.

Abstract

Contrastive learning has gained significant attention in short text clustering, yet it has an inherent drawback: it mistakenly treats samples from the same category as negatives and then separates them in the feature space (false negative separation), which hinders the generation of superior representations. To generate more discriminative representations for efficient clustering, we propose a novel short text clustering method called Discriminative Representation learning via Attention-Enhanced Contrastive Learning for Short Text Clustering (AECL). AECL consists of two modules: the pseudo-label generation module and the contrastive learning module. Both modules build a sample-level attention mechanism to capture similarity relationships between samples and aggregate cross-sample features to generate consistent representations. The former module uses the more discriminative consistent representations to produce reliable supervision information that assists clustering, while the latter module explores similarity relationships and consistent representations to optimize the construction of positive samples and perform similarity-guided contrastive learning, effectively addressing the false negative separation issue. Experimental results demonstrate that the proposed AECL outperforms state-of-the-art methods. If the paper is accepted, we will open-source the code.
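
The sample-level attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attention_aggregate`, the use of cosine similarity, and the `temperature` parameter are assumptions made for illustration. It only shows how a batch similarity matrix can be turned into attention weights that aggregate cross-sample features into a "consistent" representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_aggregate(h, temperature=1.0):
    """Hypothetical sample-level attention over a batch.

    h: array of shape (n_samples, dim) holding per-sample features.
    Returns the aggregated features and the attention weights.
    """
    # Cosine similarity between every pair of samples in the batch
    # (assumption: the paper may use a learned similarity instead).
    z = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = z @ z.T
    # Each row becomes a distribution over the samples in the batch.
    weights = softmax(sim / temperature, axis=1)
    # Every sample's new feature is a weighted mix of all samples' features.
    consistent = weights @ h
    return consistent, weights

rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))          # toy batch: 6 samples, 4 features
consistent, weights = attention_aggregate(h, temperature=0.5)
print(consistent.shape, weights.shape)
```

Because semantically similar samples receive larger weights, their aggregated features move toward each other, which is the intuition behind using such representations for pseudo-labels and for choosing positive pairs.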

Paper Structure (24 sections, 19 equations, 9 figures, 5 tables, 1 algorithm)

Figures (9)

  • Figure 1: Conventional contrastive learning only considers augmented views from one sample as positive pairs, which leads to false negative separation (as shown in the green samples). Our method optimizes the construction of positive samples by leveraging semantic similarity, effectively addressing the false negative separation problem.
  • Figure 2: Overall structure of AECL. Our model contains two modules: (a) Pseudo-label Generation Module, and (b) Contrastive Learning Module.
  • Figure 3: Structure of the attention aggregation network. $\boldsymbol{S}^{(0)}$ denotes the similarity matrix among samples.
  • Figure 4: t-SNE visualization of the representations on Stackoverflow. Each color indicates a ground-truth category.
  • Figure 5: Performance comparison in dealing with false negative separation. The colored areas denote the variances.
  • ...and 4 more figures
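
The contrast drawn in the Figure 1 caption can be sketched numerically. The code below is an illustrative assumption, not the loss defined in the paper: `contrastive_loss`, the fixed similarity threshold of 0.8, and the toy data are all hypothetical. It compares a conventional positive mask (only the augmented view of the same sample) with a similarity-guided mask that also treats highly similar samples as positives.

```python
import numpy as np

def contrastive_loss(z1, z2, pos_mask, temperature=0.5):
    """InfoNCE-style loss with an arbitrary positive mask (illustrative).

    z1, z2: (n, dim) features of two augmented views.
    pos_mask: (n, n) boolean; pos_mask[i, j] marks z2[j] as a positive of z1[i].
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Row-wise log-softmax, computed stably.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    # Average the log-probability over each anchor's positives.
    per_anchor = -(pos_mask * log_prob).sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()

rng = np.random.default_rng(0)
# Toy batch: samples 0 and 1 share one category, samples 2 and 3 another.
base = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
z1 = base + 0.05 * rng.normal(size=base.shape)   # view 1
z2 = base + 0.05 * rng.normal(size=base.shape)   # view 2

# Conventional mask: only the other view of the same sample is positive,
# so sample 1 is pushed away from sample 0 (a false negative).
standard_mask = np.eye(4, dtype=bool)

# Similarity-guided mask (hypothetical threshold): similar samples are
# also treated as positives instead of negatives.
z1n = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
z2n = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
guided_mask = (z1n @ z2n.T > 0.8) | standard_mask

loss_standard = contrastive_loss(z1, z2, standard_mask)
loss_guided = contrastive_loss(z1, z2, guided_mask)
print(loss_standard, loss_guided)
```

With the guided mask, same-category pairs such as samples 0 and 1 contribute to the numerator of the objective rather than only to the denominator, so the loss no longer rewards separating them.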