A Survey on Neural Topic Models: Methods, Applications, and Challenges
Xiaobao Wu, Thong Nguyen, Anh Tuan Luu
TL;DR
Topic modeling has evolved from traditional probabilistic methods to neural topic models (NTMs), which use deep neural networks for scalable topic discovery. The paper surveys NTMs categorized by network structure, along with variants adapted to diverse scenarios such as short texts, cross-lingual data, and dynamic settings, and it reviews applications in text analysis, generation, and recommendation. It also discusses challenges such as unreliable evaluation, topic quality, and hyperparameter sensitivity, and introduces a topic-semantic-aware diversity metric intended to align diversity with word meaning. The work provides a taxonomy, methodological guidance, and an open resource repository to accelerate NTM research.
Abstract
Topic models have been used for decades to discover latent topics and infer the topic proportions of documents in an unsupervised fashion. They have been widely applied to tasks such as text analysis and context recommendation. Recently, the rise of neural networks has given rise to a new research field: Neural Topic Models (NTMs). Unlike conventional topic models, NTMs optimize their parameters directly, without requiring model-specific derivations. This gives NTMs better scalability and flexibility, and it has attracted significant research attention and produced many new methods and applications. In this paper, we present a comprehensive survey of neural topic models covering methods, applications, and challenges. Specifically, we systematically organize current NTM methods according to their network structures and introduce NTMs designed for various scenarios, such as short texts and bilingual documents. We also discuss a wide range of popular applications built on NTMs. Finally, we highlight the challenges facing NTMs to inspire future research. We accompany this survey with a repository for easier access to the mentioned paper resources: https://github.com/bobxwu/Paper-Neural-Topic-Models.
