Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion

Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu

TL;DR

Dynamic topic models often suffer from repetitive topics within a time slice and unassociated topics across time. We introduce CFDTM, a chain-free neural dynamic topic model that combines Evolution-Tracking Contrastive Learning (ETC) and Unassociated Word Exclusion (UWE) to track topic evolution while excluding unassociated words from topics. Topics are represented as embeddings $\boldsymbol{\varphi}^{(t)}_k$ that define a distance-based topic-word distribution $\beta^{(t)}_{k,i}$. Sequential documents are generated in a VAE-like fashion and trained with an ELBO-based topic modeling loss $\mathcal{L}_{\text{TM}}$, which is combined with $\mathcal{L}_{\text{ETC}}$ and $\lambda_{\text{UWE}} \mathcal{L}_{\text{UWE}}$ to form the overall objective. Empirical results on benchmark datasets show higher topic coherence ($C_V$) and diversity across time, stronger downstream classification and clustering performance, and robustness to the evolution intensity hyperparameter $\lambda^{(t)}$. The approach yields clearer topic evolution and improved interpretability, with code available at the project repository.
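To make the "distance-based" $\beta^{(t)}_{k,i}$ concrete, here is a minimal sketch of one way such a distribution could be computed for a single time slice. The exact form is not given in this summary, so the squared Euclidean distance, the temperature `tau`, and the choice to normalize over topics are all assumptions for illustration, and the function name is hypothetical.

```python
import math

def topic_word_distribution(topic_embs, word_embs, tau=1.0):
    """Sketch of a distance-based beta[k][i] for one time slice.

    Assumption (not from the source): beta is a softmax over topics of the
    negative squared Euclidean distance between topic k and word i, scaled
    by a temperature tau.
    """
    beta = [[0.0] * len(word_embs) for _ in topic_embs]
    for i, w in enumerate(word_embs):
        # Squared distance from word i to every topic embedding.
        dists = [sum((p - q) ** 2 for p, q in zip(phi, w)) for phi in topic_embs]
        # Numerically stable softmax of -dist / tau over topics.
        logits = [-d / tau for d in dists]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for k, e in enumerate(exps):
            beta[k][i] = e / z
    return beta

# Two toy topics and two toy words in a 2-D embedding space.
beta = topic_word_distribution([[0.0, 0.0], [1.0, 1.0]],
                               [[0.0, 0.1], [0.9, 1.0]])
```

Under these assumptions, a word receives most of its weight from the topic whose embedding lies closest to it, which is why pulling or pushing topic embeddings directly changes which words a topic is interpreted by.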

Abstract

Dynamic topic models track the evolution of topics in sequential documents, which has led to various applications such as trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, failing to reveal the evolution and hindering further applications. To address these issues, we break the tradition of simply chaining topics in existing work and propose a novel neural Chain-Free Dynamic Topic Model (CFDTM). We introduce a new evolution-tracking contrastive learning method that builds the similarity relations among dynamic topics. This not only tracks topic evolution but also maintains topic diversity, mitigating the repetitive topic issue. To avoid unassociated topics, we further present an unassociated word exclusion method that consistently excludes unassociated words from discovered topics. Extensive experiments demonstrate that our model significantly outperforms state-of-the-art baselines, tracking topic evolution with high-quality topics, showing better performance on downstream tasks, and remaining robust to the hyperparameter for evolution intensities. Our code is available at https://github.com/bobxwu/CFDTM .

Paper Structure (38 sections, 10 equations, 8 figures, 5 tables)


Figures (8)

  • Figure 1: Illustration of dynamic topic modeling. Every time slice (year here) has certain latent topics, interpreted as related words. Each topic evolves across time slices.
  • Figure 2: t-SNE visualization with stars ($\bigstar$) as topic embeddings and circles ($\bullet$) as word embeddings. Their time slice annotations are omitted for brevity. While baseline DETM mingles all word embeddings together, our CFDTM properly groups and separates them by topic and time slice (see their top words in the topic examples figure).
  • Figure 3: Illustration of the generation of sequential documents (following VAE) and Evolution-Tracking Contrastive learning (ETC). For $\boldsymbol{\varphi}^{(t)}_{k}$ (the topic embedding of Topic#$k$ at slice $t$), ETC adaptively pulls it close to $\boldsymbol{\varphi}^{(t-1)}_{k}$ with an adjustable intensity hyperparameter $\lambda^{(t)}$ and pushes it away from $\boldsymbol{\varphi}^{(t)}_{k'}$ ($k' \neq k$), for instance $k' = k - 1$ and $k' = k + 1$ here.
  • Figure 4: Illustration of Unassociated Word Exclusion (UWE). The unassociated word set $\mathcal{V}_{\text{UW}}^{(t)}$ contains words that are in the top word set $\mathcal{V}_{\text{top}}^{(t)}$ but not in the vocabulary set $\mathcal{V}^{(t)}$. UWE pushes the topic embedding $\boldsymbol{\varphi}^{(t)}_{k}$ away from the embeddings of unassociated words, $\mathbf{w}_{\mathrm{id}(x)}$ and $\mathbf{w}_{\mathrm{id}(x')}$.
  • Figure 5: Case study. Top related words of discovered topics in 2007, 2012, and 2017 from the NeurIPS dataset.
  • ...and 3 more figures
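
The pull/push behaviour in the Figure 3 caption and the set definition in the Figure 4 caption can be sketched roughly as follows. This is an illustrative simplification, not the paper's loss. The cosine similarity, the temperature `tau`, the decoupled pull-plus-push form, and the function names are all assumptions. Only the set difference in `unassociated_words` follows directly from the caption.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def etc_loss(prev_topics, curr_topics, lam=1.0, tau=0.1):
    """Hypothetical contrastive sketch of ETC for one pair of adjacent slices.

    Pull: topic k at slice t toward topic k at slice t-1, weighted by lam.
    Push: topic k at slice t away from the other topics at slice t.
    """
    total = 0.0
    for k, phi in enumerate(curr_topics):
        pull = -lam * cosine(phi, prev_topics[k]) / tau
        negatives = [math.exp(cosine(phi, other) / tau)
                     for j, other in enumerate(curr_topics) if j != k]
        # With a single topic there is nothing to push away from.
        push = math.log(sum(negatives)) if negatives else 0.0
        total += pull + push
    return total / len(curr_topics)

def unassociated_words(top_words, slice_vocab):
    """Words among the top words that do not occur in the slice vocabulary."""
    return set(top_words) - set(slice_vocab)

prev = [[1.0, 0.0], [0.0, 1.0]]
aligned = etc_loss(prev, [[1.0, 0.0], [0.0, 1.0]])      # evolved, diverse topics
repetitive = etc_loss(prev, [[1.0, 0.0], [1.0, 0.0]])   # two identical topics
uw = unassociated_words(["neural", "covid", "kernel"], {"neural", "kernel", "bayes"})
```

In this toy setting, the aligned and diverse configuration gets a lower loss than the repetitive one, and `uw` holds the single top word absent from the slice vocabulary. Setting `lam` lower weakens only the pull toward the previous slice, which is one way to read "adjustable intensity".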