Determination of the Number of Topics Intrinsically: Is It Possible?

Victor Bulatov; Vasiliy Alekseev; Konstantin Vorontsov

TL;DR

The paper addresses the challenge of selecting the number of topics $T$ in topic models, arguing that intrinsic metrics do not reliably reflect corpus-intrinsic properties. It systematically evaluates a wide range of intrinsic quality metrics—perplexity, stability, diversity, clustering, information-theoretic criteria, entropy, lift, and top-tokens coherence—across multiple topic models and corpora, using held-out data and subsampling to assess robustness. The findings show that most intrinsic criteria are inconsistent and highly model-dependent, with only relatively simple measures like AIC, BIC, MDL, and Renyi offering somewhat more stable guidance, yet still failing to yield a single universal optimal $T$. The authors conclude that $T$ should be treated as a hyperparameter and urge development of robust modeling approaches (e.g., model architectures resilient to $T$, hierarchical or semi-supervised methods, or alternative strategies) to move beyond the current fixation on an intrinsic, corpus-specific topic count.
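The usual way these criteria are applied is to fit the model for each candidate $T$, score every fit, and take the minimizer of each criterion. Below is a minimal sketch of that sweep, assuming PLSA fitted by EM on a synthetic word-by-document count matrix. AIC is $2k - 2\log L$ and BIC is $k \ln n - 2\log L$, where $n$ is the number of tokens. The MDL term uses the textbook asymptotic two-part code length $\tfrac{k}{2}\ln n - \log L$. Several pieces are illustrative assumptions rather than the paper's exact definitions: the parameter counts, the "sparse" count that keeps only entries above a hypothetical tolerance `tol`, and the document-completion split used for held-out perplexity. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12


def fit_plsa(counts, T, iters=100, seed=0):
    """PLSA via EM. counts: (W, D) word-by-document matrix.
    Returns phi (W, T), columns p(w|t), and theta (T, D), columns p(t|d)."""
    rng = np.random.default_rng(seed)
    W, D = counts.shape
    phi = rng.random((W, T))
    phi /= phi.sum(axis=0, keepdims=True)
    theta = np.full((T, D), 1.0 / T)
    for _ in range(iters):
        ratio = counts / np.maximum(phi @ theta, EPS)  # n_dw / p(w|d)
        n_wt = phi * (ratio @ theta.T)                 # expected word-topic counts
        n_td = theta * (phi.T @ ratio)                 # expected topic-document counts
        phi = n_wt / np.maximum(n_wt.sum(axis=0, keepdims=True), EPS)
        theta = n_td / np.maximum(n_td.sum(axis=0, keepdims=True), EPS)
    return phi, theta


def fold_in(counts, phi, iters=100):
    """Estimate theta for unseen documents while keeping phi fixed."""
    T = phi.shape[1]
    theta = np.full((T, counts.shape[1]), 1.0 / T)
    for _ in range(iters):
        ratio = counts / np.maximum(phi @ theta, EPS)
        theta = theta * (phi.T @ ratio)
        theta /= np.maximum(theta.sum(axis=0, keepdims=True), EPS)
    return theta


def log_likelihood(counts, phi, theta):
    p_wd = phi @ theta
    mask = counts > 0
    return float(np.sum(counts[mask] * np.log(np.maximum(p_wd[mask], EPS))))


def perplexity(counts, phi, theta):
    return float(np.exp(-log_likelihood(counts, phi, theta) / counts.sum()))


def num_params(phi, theta, sparse=False, tol=1e-6):
    W, T = phi.shape
    D = theta.shape[1]
    if sparse:  # hypothetical sparse count: only non-negligible entries
        return int((phi > tol).sum() + (theta > tol).sum())
    return (W - 1) * T + (T - 1) * D  # every column sums to one


def criteria(counts, phi, theta):
    ll = log_likelihood(counts, phi, theta)
    n = counts.sum()
    k = num_params(phi, theta)
    k_sparse = num_params(phi, theta, sparse=True)
    return {
        "aic": 2 * k - 2 * ll,
        "bic": k * np.log(n) - 2 * ll,
        "mdl_sparse": 0.5 * k_sparse * np.log(n) - ll,  # asymptotic two-part MDL
    }


# Synthetic corpus: 300 words, 5 generating topics, 120 documents of 200 tokens.
rng = np.random.default_rng(42)
true_phi = rng.dirichlet(np.full(300, 0.05), size=5).T   # (300, 5)
true_theta = rng.dirichlet(np.full(5, 0.3), size=120).T  # (5, 120)
counts = np.stack([rng.multinomial(200, true_phi @ true_theta[:, d])
                   for d in range(120)], axis=1).astype(float)
train, holdout = counts[:, :100], counts[:, 100:]
# Document completion: fold in on half of each held-out document, score the other half.
first = rng.binomial(holdout.astype(int), 0.5).astype(float)
second = holdout - first

results = {}
for T in range(2, 11):
    phi, theta = fit_plsa(train, T)
    scores = criteria(train, phi, theta)
    scores["train_ppl"] = perplexity(train, phi, theta)
    scores["holdout_ppl"] = perplexity(second, phi, fold_in(first, phi))
    results[T] = scores

best_T = {name: min(results, key=lambda T: results[T][name])
          for name in ("aic", "bic", "mdl_sparse", "holdout_ppl")}
print(best_T)
```

Each criterion can pick a different $T$, and the pick can move with the model, the seed, and the corpus sample. This is the kind of disagreement the paper reports.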

Abstract

The number of topics might be the most important parameter of a topic model. The topic modelling community has developed a variety of procedures for estimating the number of topics in a dataset, but no sufficiently complete comparison of existing practices has yet been made. This study attempts to partially fill this gap by investigating the performance of various methods applied to several topic models on a number of publicly available corpora. The analysis demonstrates that intrinsic methods are far from being reliable and accurate tools. The number of topics is shown to be a method- and model-dependent quantity rather than an absolute property of a particular corpus. We conclude that other methods for dealing with this problem should be developed and suggest some promising directions for further research.
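Stability is one of the criteria listed in the TL;DR. It asks whether independent fits, for example on two subsamples of the corpus or with two random seeds, recover the same topics. Below is a minimal sketch, assuming topics are compared by the cosine similarity of their word distributions and matched one-to-one with the Hungarian algorithm through SciPy's `linear_sum_assignment`. This is one common formulation and not necessarily the paper's. `topic_stability` is a hypothetical name, and the random matrices only stand in for fitted $\Phi$ matrices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def topic_stability(phi_a, phi_b):
    """Mean cosine similarity between optimally matched topics of two fits.

    phi_a, phi_b: (W, T) matrices whose columns are p(w|t).
    Returns a value in [0, 1]; 1 means every topic was recovered exactly.
    """
    a = phi_a / np.linalg.norm(phi_a, axis=0, keepdims=True)
    b = phi_b / np.linalg.norm(phi_b, axis=0, keepdims=True)
    sim = a.T @ b  # (T, T) cosine similarities between topics of the two fits
    rows, cols = linear_sum_assignment(sim, maximize=True)  # Hungarian matching
    return float(sim[rows, cols].mean())


rng = np.random.default_rng(0)
phi = rng.dirichlet(np.full(500, 0.05), size=8).T  # stand-in for a fitted (W, T) matrix
perm = rng.permutation(8)
# Hypothetical "refit": the same topics perturbed by independent noise.
noisy = phi + 0.5 * rng.dirichlet(np.full(500, 0.05), size=8).T
noisy /= noisy.sum(axis=0, keepdims=True)

print(topic_stability(phi, phi[:, perm]))  # relabelled copy of the same topics
print(topic_stability(phi, noisy))         # perturbed topics score lower
```

To turn the score into a selection rule, one option is to repeat the comparison for each candidate $T$ and prefer values whose topics stay stable across subsamples.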

Paper Structure (18 sections, 2 equations, 5 figures, 3 tables)

Introduction
Related Work
Intrinsic Quality Metrics
Perplexity
Stability
Diversity and Sufficiency
Clustering
Information-Theoretic Criteria
Entropy
Lift
Top-Tokens Analysis
Methodology
Topic Models Studied
Corpora Used
Results and Discussion

Figures (5)

Figure 1: The average coherence of topics, $1 < T < 21$. The models depicted are LDA with a symmetric prior, LDA with a heuristic prior, and sparse model 0.
Figure 2: Sparse MDL criterion for sparse models with different sparsity hyperparameter values.
Figure 3: A set of quality metrics exploring various $T$ for PLSA, $1 < T < 21$. The metrics depicted are AIC, MDL that accounts for model sparsity, and cosine-based diversity (taken with a negative sign, so the minimum corresponds to the "best" value). All metrics agree on 7 as a reasonable value for $T$.
Figure 4: Comparison of holdout perplexity and train perplexity for the LDA model. Similar behaviour was observed for all considered datasets.
Figure 5: Multiple maxima of the Lift metric for different types of topic models.
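The diversity and lift curves in Figures 3 and 5 are both computed from the topic-word matrix. Below is a minimal sketch of the two metrics. It assumes diversity is the mean pairwise cosine distance between topics, which can be negated as in Figure 3 when a minimum is wanted. It also assumes a hypothetical lift score: the mean of $\log\bigl(p(w \mid t)/p(w)\bigr)$ over each topic's `top_k` most probable tokens, averaged over topics. The paper's exact formulas may differ. The random matrices stand in for a fitted model, and $p(w)$ is approximated by assuming equal topic weights.

```python
import numpy as np


def cosine_diversity(phi):
    """Mean pairwise cosine distance between topics (columns of phi, shape (W, T)).
    Higher means less overlap; negate it to get a quantity to minimise."""
    unit = phi / np.linalg.norm(phi, axis=0, keepdims=True)
    sim = unit.T @ unit
    off_diag = ~np.eye(phi.shape[1], dtype=bool)
    return float(np.mean(1.0 - sim[off_diag]))


def mean_log_lift(phi, p_w, top_k=10):
    """Hypothetical lift score: mean log lift of each topic's top_k tokens,
    averaged over topics, with lift(w, t) = p(w|t) / p(w)."""
    scores = []
    for t in range(phi.shape[1]):
        top = np.argsort(phi[:, t])[::-1][:top_k]
        scores.append(np.mean(np.log(phi[top, t] / p_w[top])))
    return float(np.mean(scores))


rng = np.random.default_rng(1)
phi = rng.dirichlet(np.full(400, 0.05), size=6).T  # stand-in for a fitted (W, T) matrix
p_w = phi.mean(axis=1)  # assumes equal topic weights; corpus frequencies are an alternative
print("negated diversity:", -cosine_diversity(phi))
print("mean log lift:", mean_log_lift(phi, p_w))

# Near-duplicate topics: both metrics should drop towards zero.
base = rng.dirichlet(np.full(400, 0.05))
dup = np.stack([base + 0.01 * rng.dirichlet(np.ones(400)) for _ in range(6)], axis=1)
dup /= dup.sum(axis=0, keepdims=True)
print("diversity (near-duplicates):", cosine_diversity(dup))
print("mean log lift (near-duplicates):", mean_log_lift(dup, dup.mean(axis=1)))
```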
