Enhancing Topic Interpretability for Neural Topic Modeling through Topic-wise Contrastive Learning
Xin Gao, Yang Lin, Ruiqing Li, Yasha Wang, Xu Chu, Xinyu Ma, Hailong Yu
TL;DR
ContraTopic addresses the misalignment between likelihood-focused neural topic models (NTMs) and the goal of interpretable knowledge discovery by introducing a differentiable, topic-wise contrastive regularizer that promotes intra-topic coherence and inter-topic distinctiveness. The regularizer combines Gumbel-Softmax-based sampling of each topic's top-$v$ words with a precomputed $\mathrm{NPMI}$-based word-similarity measure. It is added to the standard objective to give the training loss $L_{tr} = L_{rec} + L_{kl} + \lambda L_{con}$, where $L_{rec}$ is the reconstruction loss, $L_{kl}$ the KL-divergence term, $L_{con}$ the contrastive regularizer, and $\lambda$ its weight. In experiments with the number of topics fixed at $K=100$, ContraTopic yields substantial gains in topic coherence and diversity on 20NG, Yahoo Answers, and NYTimes, improves consistently across different backbone models, and aligns well with human evaluations. These findings demonstrate a scalable, differentiable mechanism for enhancing topic interpretability in NTMs without external supervision, with possible extensions to online and multi-level contrastive learning.
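The TL;DR states that the NPMI-based similarity is precomputed before training. As a minimal sketch, assuming NPMI is estimated from document-level co-occurrence on the training corpus (the summary does not state the co-occurrence window or any smoothing the paper uses), the word-by-word matrix can be built once with NumPy. The function name `npmi_matrix` and the boundary conventions (-1 for pairs that never co-occur, 1 for pairs that co-occur in every document) are illustrative assumptions.

```python
import numpy as np


def npmi_matrix(doc_word: np.ndarray) -> np.ndarray:
    """Pairwise NPMI between vocabulary words from document-level co-occurrence.

    Hypothetical helper: doc_word is a (D, V) count or binary matrix; only word
    presence per document is used. Returns a symmetric (V, V) matrix with values
    in [-1, 1] and zeros on the diagonal.
    """
    X = (doc_word > 0).astype(np.float64)   # 1 if word j appears in document d
    n_docs = X.shape[0]
    p_w = X.sum(axis=0) / n_docs            # p(w_i): fraction of documents containing w_i
    p_ij = (X.T @ X) / n_docs               # p(w_i, w_j): fraction containing both words
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / np.outer(p_w, p_w))
        npmi = pmi / -np.log(p_ij)
    # Assumed conventions: never co-occurring -> -1; co-occurring in every document -> 1.
    npmi = np.where(p_ij <= 0.0, -1.0, np.where(p_ij >= 1.0, 1.0, npmi))
    np.fill_diagonal(npmi, 0.0)             # self-similarity is not used
    return npmi


# Toy corpus: words 0 and 1 always appear together, word 2 appears alone.
docs = np.array([[1, 1, 0],
                 [2, 1, 0],
                 [0, 0, 3]])
M = npmi_matrix(docs)
```

Because the matrix depends only on the corpus, computing it once and reusing it keeps the per-step cost of a regularizer that looks up word-pair similarities low.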
Abstract
Data mining and knowledge discovery are essential for extracting valuable insights from vast datasets, and neural topic models (NTMs) have emerged as a valuable unsupervised tool in this field. However, the predominant objective in NTMs, discovering topics that maximize data likelihood, is often misaligned with the central goal of data mining and knowledge discovery: revealing interpretable insights from large data repositories. Overemphasizing likelihood maximization without topic regularization can lead to an overly expansive latent space for topic modeling. In this paper, we address this misalignment by introducing contrastive learning measures that assess topic interpretability. We propose a novel NTM framework, ContraTopic, that integrates a differentiable regularizer capable of evaluating multiple facets of topic interpretability throughout training. The regularizer adopts a topic-wise contrastive methodology that fosters both internal coherence within topics and clear distinctions between them. Comprehensive experiments on three diverse datasets demonstrate that our approach consistently produces topics with superior interpretability compared to state-of-the-art NTMs.
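The exact form of the topic-wise contrastive regularizer is not given in this summary. The sketch below is one plausible instantiation under stated assumptions: (i) each topic's top-$v$ words are drawn with an iterative Gumbel-Softmax relaxation of top-k selection (a common choice; the paper's scheme may differ), (ii) topic quality is scored by the average NPMI among the softly selected words, and (iii) the contrastive term is InfoNCE-style, treating a topic's own coherence as the positive and its NPMI overlap with every other topic as negatives. The names `relaxed_top_v` and `topic_contrastive_loss`, the temperatures, and the normalization are hypothetical. NumPy shows only the forward computation; in training, the same steps would be written in an autodiff framework so that gradients reach the topic-word logits.

```python
import numpy as np


def relaxed_top_v(logits: np.ndarray, v: int, tau: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Soft k-hot approximation of sampling each topic's top-v words (assumed scheme).

    Uses repeated Gumbel-Softmax draws, cumulatively masking words already chosen.
    logits: (K, V) topic-word logits. Returns (K, V); each row sums to v.
    """
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    scores = logits - np.log(-np.log(u))      # add Gumbel(0, 1) noise once
    khot = np.zeros_like(logits)
    onehot = np.zeros_like(logits)
    for _ in range(v):
        # Cumulatively down-weight words by how much they were softly selected so far.
        scores = scores + np.log(np.clip(1.0 - onehot, 1e-20, 1.0))
        z = scores / tau
        z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
        onehot = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        khot = khot + onehot
    return khot


def topic_contrastive_loss(khot: np.ndarray, npmi: np.ndarray, v: int,
                           temp: float = 0.1) -> float:
    """Hypothetical InfoNCE-style topic-wise contrastive regularizer.

    Positive for topic k: mean NPMI among its own soft top-v words (coherence).
    Negatives: mean NPMI between topic k's words and topic j's words (overlap).
    Requires v >= 2 and an NPMI matrix with a zero diagonal.
    """
    sim = khot @ npmi @ khot.T                # (K, K) summed pairwise NPMI
    intra = np.diag(sim) / (v * (v - 1))      # within-topic average over word pairs
    logits = sim / (v * v) / temp             # cross-topic averages as negatives
    np.fill_diagonal(logits, intra / temp)    # own coherence as the positive
    m = logits.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # Mean of -log softmax(positive): small when topics are coherent and distinct.
    return float(np.mean(log_norm - np.diag(logits)))


# Toy usage with random logits and a random symmetric similarity matrix.
rng = np.random.default_rng(0)
K, V, v = 4, 30, 5
topic_word_logits = rng.normal(size=(K, V))
A = rng.uniform(-1.0, 1.0, size=(V, V))
npmi = (A + A.T) / 2.0
np.fill_diagonal(npmi, 0.0)
khot = relaxed_top_v(topic_word_logits, v, tau=0.5, rng=rng)
l_con = topic_contrastive_loss(khot, npmi, v)
```

Minimizing `l_con` raises each topic's own average NPMI relative to its NPMI overlap with other topics, which targets the two facets named above, coherence within topics and distinctions between them. Under these assumptions the value would then be weighted and added to the backbone objective as $L_{tr} = L_{rec} + L_{kl} + \lambda L_{con}$.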
