A Survey on Neural Topic Models: Methods, Applications, and Challenges

Xiaobao Wu; Thong Nguyen; Anh Tuan Luu

TL;DR

Topic modeling has evolved from traditional probabilistic methods to neural topic models (NTMs), which use deep neural networks for scalable topic discovery. The paper surveys NTMs, categorizing them by network structure and covering variants adapted to diverse scenarios such as short texts, cross-lingual data, and dynamic settings, and it reviews applications in text analysis, text generation, and recommendation. It also discusses challenges such as unreliable evaluation, topic quality, and hyperparameter sensitivity, and introduces a topic-semantic-aware diversity metric that aligns diversity with word meaning. The work provides a taxonomy, methodological guidance, and an open resource repository to accelerate NTM research.

Abstract

Topic models have been used for decades to discover latent topics and infer the topic proportions of documents in an unsupervised fashion. They have been widely used in various applications like text analysis and context recommendation. Recently, the rise of neural networks has facilitated the emergence of a new research field: Neural Topic Models (NTMs). Unlike conventional topic models, NTMs directly optimize parameters without requiring model-specific derivations. This gives NTMs better scalability and flexibility, and has led to significant research attention and many new methods and applications. In this paper, we present a comprehensive survey of neural topic models covering methods, applications, and challenges. Specifically, we systematically organize current NTM methods according to their network structures and introduce NTMs for various scenarios like short texts and bilingual documents. We also discuss a wide range of popular applications built on NTMs. Finally, we highlight the challenges confronting NTMs to inspire future research. We accompany this survey with a repository for easier access to the mentioned paper resources: https://github.com/bobxwu/Paper-Neural-Topic-Models.

Paper Structure (47 sections, 12 equations, 6 figures, 3 tables)


Introduction
Preliminary of Topic Models
Problem Setting and Notations
Evaluation of Topic Models
Perplexity
Topic Coherence
Topic Diversity
Downstream Task Performance
Visualization
Basic NTM based on VAE
NTMs with Different Structures
NTMs with Various Priors
NTMs with Embeddings
NTMs with Metadata
NTMs with Graph Neural Networks
...and 32 more sections

Figures (6)

Figure 1: The overview of this survey: NTMs with different structures, NTMs for various scenarios, applications of NTMs, and challenges of NTMs.
Figure 2: Illustration of topic modeling. Given a document collection, a topic model aims to discover $K$ latent topics, interpreted as distributions over words (topic-word distributions). It also infers topic proportions of each document (what topics a document contains), defined as distributions over all latent topics (doc-topic distributions). Here the topic-word distribution of Topic#$k$, $\boldsymbol{\beta}_{k}$, has related words like "movie", "film", and "oscar"; the doc-topic distribution $\boldsymbol{\theta}$ concentrates on Topic#1 and Topic#$k$.
Figure 3: Illustration of a VAE-based NTM. It mainly contains an encoder (inference network) and a decoder (generation network). The encoder outputs the doc-topic distribution $\boldsymbol{\theta}$ from the input document $\mathbf{x}$ through MLPs, using the reparameterization trick where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. The decoder reconstructs the input document from $\boldsymbol{\theta}$ with $\boldsymbol{\beta}$ as the topic-word distribution matrix. The objective includes reconstruction error and KL divergence.
Figure 4: Illustration of hierarchical topic modeling. Topics at each level cover a different semantic granularity: child topics are more specific than their parent topics. Topic#2-1 denotes the first topic of the second layer in the topic structure.
Figure 5: Illustration of cross-lingual topic modeling (on English and Chinese documents). Corresponding cross-lingual topics are required to be aligned, like English and Chinese Topic#3, and English and Chinese Topic#5. Words in the brackets are the English translations.
...and 1 more figures
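
The VAE-based NTM described in the Figure 3 caption can be sketched as a single forward pass. This is a minimal illustration, not the survey's reference implementation: the one-hidden-layer encoder, the standard normal prior, the softmax used to map the latent vector to $\boldsymbol{\theta}$, and the row-normalized $\boldsymbol{\beta}$ are all assumptions, as are the names `ntm_forward`, `init_params`, and the parameter keys. Only the forward computation of the objective is shown; gradient-based training is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(V, H, K, rng):
    """Hypothetical parameter layout: vocabulary V, hidden size H, K topics."""
    s = 0.1
    return {
        "W_h": s * rng.standard_normal((V, H)), "b_h": np.zeros(H),
        "W_mu": s * rng.standard_normal((H, K)), "b_mu": np.zeros(K),
        "W_lv": s * rng.standard_normal((H, K)), "b_lv": np.zeros(K),
        "beta_logits": s * rng.standard_normal((K, V)),
    }

def ntm_forward(x, params, rng):
    """Forward pass on bag-of-words counts x of shape (N, V)."""
    # Encoder (inference network): MLP producing a mean and log-variance.
    h = np.tanh(x @ params["W_h"] + params["b_h"])
    mu = h @ params["W_mu"] + params["b_mu"]
    logvar = h @ params["W_lv"] + params["b_lv"]

    # Reparameterization trick: epsilon ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    theta = softmax(z, axis=1)          # doc-topic distribution, (N, K)

    # Decoder (generation network): beta is the topic-word matrix, (K, V).
    beta = softmax(params["beta_logits"], axis=1)
    word_probs = theta @ beta           # per-document word distribution

    # Objective: reconstruction error plus KL(q(z|x) || N(0, I)).
    recon = -(x * np.log(word_probs + 1e-10)).sum(axis=1)
    kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(axis=1)
    return theta, beta, float((recon + kl).mean())

rng = np.random.default_rng(0)
V, H, K = 20, 16, 5
params = init_params(V, H, K, rng)
x = rng.integers(0, 3, size=(4, V)).astype(float)  # toy word counts
theta, beta, loss = ntm_forward(x, params, rng)
```

Each row of `theta` and each row of `beta` sums to one, matching the doc-topic and topic-word distributions in Figure 2. In practice the loss would be minimized with an automatic differentiation framework rather than evaluated once as here.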
