Reliability of Topic Modeling
Kayla Schroeder, Zach Wood-Doughty
TL;DR
This paper rethinks topic-model reliability by treating topic modeling as a measurement problem and demonstrating that the common practice of judging stability with a fixed cosine-similarity threshold is inadequate. It surveys unidimensional reliability constructs and introduces three multidimensional measures (stratified alpha, multivariate omega, and maximal reliability), showing that multivariate omega most effectively captures replication reliability across both synthetic and real datasets. Experiments on trivial and nontrivial synthetic data and on a CFPB consumer-complaint corpus show that reliability generally declines as the number of topics increases and that the standard practice can misrepresent stability, whereas multivariate omega tracks sensitivity to word removals and topic-count changes. The work recommends adopting multivariate omega as a standard reliability tool in topic-model-based analyses to improve the interpretability and robustness of downstream conclusions.
Abstract
Topic models allow researchers to extract latent factors from text data and use those variables in downstream statistical analyses. However, these methodologies can vary significantly due to initialization differences, randomness in sampling procedures, or noisy data. Reliability of these methods is of particular concern as many researchers treat learned topic models as ground truth for subsequent analyses. In this work, we show that the standard practice for quantifying topic model reliability fails to capture essential aspects of the variation in two widely used topic models. Drawing from an extensive literature on measurement theory, we provide empirical and theoretical analyses of three other metrics for evaluating the reliability of topic models. On synthetic and real-world data, we show that McDonald's $\omega$ provides the best encapsulation of reliability. This metric provides an essential tool for validation of topic model methodologies that should be a standard component of any topic model-based research.
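
To make the standard practice concrete, the following minimal sketch shows one common form of threshold-based stability checking: topics from two runs are compared by cosine similarity, and a topic counts as replicated if its best match meets a fixed threshold. The function name `matched_topic_fraction`, the toy topic-word matrices, and the threshold values are illustrative assumptions, not the exact procedure evaluated in this paper.

```python
import numpy as np


def matched_topic_fraction(topics_a, topics_b, threshold=0.7):
    """Fraction of run-A topics whose best cosine match in run B meets the threshold.

    topics_a, topics_b: arrays of shape (n_topics, vocab_size) holding
    topic-word weights from two separate runs (hypothetical inputs).
    """
    a = np.asarray(topics_a, dtype=float)
    b = np.asarray(topics_b, dtype=float)
    # Normalize each topic vector to unit length so dot products are cosines.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = a_unit @ b_unit.T          # pairwise cosine similarities
    best = sims.max(axis=1)           # best match in run B for each run-A topic
    return float(np.mean(best >= threshold))


# Toy topic-word matrices over a four-word vocabulary (made-up values).
run_a = np.array([[0.7, 0.2, 0.1, 0.0],
                  [0.0, 0.1, 0.2, 0.7]])
run_b = np.array([[0.1, 0.1, 0.1, 0.7],       # resembles run A's second topic
                  [0.25, 0.25, 0.25, 0.25]])  # diffuse topic

print(matched_topic_fraction(run_a, run_b, threshold=0.7))  # 0.5
print(matched_topic_fraction(run_a, run_b, threshold=0.6))  # 1.0
```

In this toy case the reported stability moves from 0.5 to 1.0 when the threshold changes from 0.7 to 0.6, which illustrates how strongly a fixed cutoff can shape the conclusion.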
