Enhancing Topic Interpretability for Neural Topic Modeling through Topic-wise Contrastive Learning

Xin Gao, Yang Lin, Ruiqing Li, Yasha Wang, Xu Chu, Xinyu Ma, Hailong Yu

TL;DR

ContraTopic tackles the misalignment between likelihood-focused neural topic models and the goal of interpretable knowledge discovery by introducing a differentiable topic-wise contrastive regularizer that promotes intra-topic coherence and inter-topic distinctiveness. The approach integrates Gumbel-Softmax-based sampling of the top-$v$ words of each topic and a precomputed $\mathrm{NPMI}$-based word similarity measure into the training objective, yielding the final training loss $L_{tr} = L_{rec} + L_{kl} + \lambda L_{con}$, with the number of topics fixed at $K=100$. Empirical results on 20NG, Yahoo Answers, and NYTimes show substantial gains in topic coherence and diversity, with consistent improvements across backbone models and strong alignment with human evaluations. These findings demonstrate a scalable, differentiable mechanism to enhance topic interpretability in NTMs without external supervision, with potential extensions to online and multi-level contrastive learning frameworks.
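To make the role of the NPMI-based similarity concrete, the following is a minimal sketch, not the authors' implementation. It assumes document-level co-occurrence counts and the standard definition $\mathrm{NPMI}(a,b) = \log\frac{p(a,b)}{p(a)p(b)} / (-\log p(a,b))$. The toy corpus, the function names `npmi_matrix` and `contrast_scores`, and the simple mean-based scores are all illustrative assumptions; the actual $L_{con}$ and the Gumbel-Softmax sampling step are not reproduced here.

```python
import math
from itertools import combinations


def npmi_matrix(docs):
    """Compute document-level NPMI for every pair of distinct vocabulary words.

    Assumed conventions (not from the source): pairs that never co-occur get -1.0,
    and the degenerate case where both words appear in every document gets 0.0.
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    df = {w: sum(w in d for d in doc_sets) for w in vocab}  # document frequency
    npmi = {}
    for a, b in combinations(vocab, 2):
        co = sum((a in d) and (b in d) for d in doc_sets)
        if co == 0:
            score = -1.0
        elif co == n_docs:
            score = 0.0
        else:
            p_ab = co / n_docs
            pmi = math.log(p_ab / ((df[a] / n_docs) * (df[b] / n_docs)))
            score = pmi / -math.log(p_ab)
        npmi[(a, b)] = npmi[(b, a)] = score
    return vocab, npmi


def contrast_scores(topics, npmi):
    """Mean similarity of positive pairs (same topic) and negative pairs (different topics)."""
    def sim(a, b):
        return 1.0 if a == b else npmi[(a, b)]

    pos = [sim(a, b) for t in topics for a, b in combinations(t, 2)]
    neg = [sim(a, b)
           for t1, t2 in combinations(topics, 2)
           for a in t1 for b in t2]
    return sum(pos) / len(pos), sum(neg) / len(neg)


# Hypothetical toy corpus: three sports documents and three hardware documents.
docs = [
    ["game", "team", "score"],
    ["team", "score", "season"],
    ["game", "team"],
    ["disk", "drive", "memory"],
    ["drive", "memory", "cpu"],
    ["disk", "memory"],
]
# Hypothetical top words of two topics (in the paper these would be sampled during training).
topics = [["game", "team", "score"], ["disk", "drive", "memory"]]

vocab, npmi = npmi_matrix(docs)
pos, neg = contrast_scores(topics, npmi)
print(f"mean positive-pair NPMI: {pos:.3f}")
print(f"mean negative-pair NPMI: {neg:.3f}")
```

In this toy setting, a regularizer in the spirit of the one described above would reward a high positive-pair score (coherent topics) and a low negative-pair score (distinct topics). Because the similarity values are precomputed from the corpus, they can be treated as constants during training.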

Abstract

Data mining and knowledge discovery are essential aspects of extracting valuable insights from vast datasets. Neural topic models (NTMs) have emerged as a valuable unsupervised tool in this field. However, the predominant objective in NTMs, which aims to discover topics maximizing data likelihood, is often misaligned with the central goal of data mining and knowledge discovery, which is to reveal interpretable insights from large data repositories. Overemphasizing likelihood maximization without incorporating topic regularization can lead to an overly expansive latent space for topic modeling. In this paper, we present an innovative approach to NTMs that addresses this misalignment by introducing contrastive learning measures to assess topic interpretability. We propose a novel NTM framework, named ContraTopic, that integrates a differentiable regularizer capable of evaluating multiple facets of topic interpretability throughout the training process. Our regularizer adopts a unique topic-wise contrastive methodology, fostering both internal coherence within topics and clear external distinctions among them. Comprehensive experiments conducted on three diverse datasets demonstrate that our approach consistently produces topics with superior interpretability compared to state-of-the-art NTMs.

Paper Structure

This paper contains 33 sections, 7 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: The fundamental insight of ContraTopic. During the training of NTMs, words are sampled from each topic for the evaluation of topic coherence and diversity. Words in the same color are sampled from the same topic. By encouraging similarity between positive word pairs and discouraging similarity between negative word pairs, the coherence and diversity of generated topics can be improved.
  • Figure 2: The results of topic interpretability evaluation. The first row shows the topic coherence on the test set of each dataset. The second row shows the corresponding topic diversity scores. In each subfigure, the horizontal axis indicates the proportion of selected topics according to their NPMIs.
  • Figure 3: The results of document representation evaluation. (a) The two subfigures show the km-Purity scores on 20NG (left) and Yahoo (right). (b) The two subfigures show the km-NMI scores on 20NG (left) and Yahoo (right).
  • Figure 4: The sensitivity analysis results of $\lambda$ and $v$ on 20NG and Yahoo.
  • Figure 5: The sensitivity analysis results of $\lambda$ and $v$ on NYTimes.
  • ...and 2 more figures