Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion
Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu
TL;DR
Dynamic topic models often suffer from repetitive topics within a time slice and unassociated topics across time. We introduce CFDTM, a chain-free neural dynamic topic model that combines Evolution-Tracking Contrastive Learning (ETC) and Unassociated Word Exclusion (UWE) to track topic evolution and prune irrelevant words at the same time. Each topic is represented as an embedding $\boldsymbol{\varphi}^{(t)}_k$, and its distance-based topic-word distribution has entries $\beta^{(t)}_{k,i}$. Sequential documents are generated under a VAE-like ELBO $\mathcal{L}_{\text{TM}}$, which is augmented by $\mathcal{L}_{\text{ETC}}$ and $\lambda_{\text{UWE}} \mathcal{L}_{\text{UWE}}$ to form the overall objective. Empirical results on benchmark datasets show higher topic coherence $C_V$ and diversity across time, stronger downstream classification and clustering performance, and robustness to the evolution intensity hyperparameter $\lambda^{(t)}$. The approach yields clearer topic evolution and improved interpretability, with code available at the project repository.
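To make the distance-based topic-word distribution and the combined objective concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the squared Euclidean distance, the temperature `tau`, the softmax normalisation across topics, and the function names `topic_word_distribution` and `overall_objective` are all hypothetical choices that this summary does not specify.

```python
import numpy as np


def topic_word_distribution(topic_emb, word_emb, tau=1.0):
    """Map topic embeddings (K, D) and word embeddings (V, D) to beta (K, V).

    Assumption: beta is a softmax over negative squared Euclidean distances,
    scaled by a temperature tau. The summary only says "distance-based".
    """
    # Pairwise squared distances between every topic and every word: (K, V).
    diff = topic_emb[:, None, :] - word_emb[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    logits = -dist / tau
    # Assumption: normalise across topics (axis 0), so that for each word
    # the weights over the K topics sum to 1.
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)


def overall_objective(loss_tm, loss_etc, loss_uwe, lambda_uwe):
    """Combine the three terms as described: L_TM + L_ETC + lambda_UWE * L_UWE."""
    return loss_tm + loss_etc + lambda_uwe * loss_uwe


# Toy usage for one time slice t: 3 topics, 10 vocabulary words, 8 dimensions.
rng = np.random.default_rng(0)
phi_t = rng.normal(size=(3, 8))   # stand-in for the topic embeddings at time t
words = rng.normal(size=(10, 8))  # stand-in for the word embeddings
beta_t = topic_word_distribution(phi_t, words, tau=0.5)
```

In this sketch a word that lies close to a topic embedding receives a large weight for that topic, so moving a topic embedding between time slices changes which words it favours.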
Abstract
Dynamic topic models track the evolution of topics in sequential documents and have enabled applications such as trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, which prevent them from revealing topic evolution and hinder further applications. To address these issues, we break the tradition of simply chaining topics in existing work and propose a novel neural Chain-Free Dynamic Topic Model (CFDTM). We introduce a new evolution-tracking contrastive learning method that builds similarity relations among dynamic topics. This not only tracks topic evolution but also maintains topic diversity, mitigating the repetitive topic issue. To avoid unassociated topics, we further present an unassociated word exclusion method that consistently excludes unassociated words from discovered topics. Extensive experiments demonstrate that our model significantly outperforms state-of-the-art baselines: it tracks topic evolution with high-quality topics, performs better on downstream tasks, and remains robust to the hyperparameter for evolution intensities. Our code is available at https://github.com/bobxwu/CFDTM.
