Non-negative matrix factorization algorithms generally improve topic model fits
Peter Carbonetto, Abhishek Sarkar, Zihao Wang, Matthew Stephens
TL;DR
The paper addresses efficient maximum-likelihood estimation for topic models of count data by exploiting the equivalence between Poisson non-negative matrix factorization (Poisson NMF) and the multinomial topic model. It formalizes the mapping from a Poisson NMF fit to a multinomial topic model fit and shows that fast Poisson NMF optimization methods, particularly coordinate descent with extrapolation, can be faster than traditional EM-based fitting while often reaching better fits. The authors implement these methods in the fastTopics R package and demonstrate substantial speedups and improved fits on both text and single-cell datasets. They conclude that fitting a Poisson NMF and then recovering the topic model from it is a practical, scalable approach for large-scale topic analyses of count data.
Abstract
In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not been exploited previously to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often resulting in much better fits and in less time than existing algorithms for topic models. We also formally make the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and using this result we show that the expectation maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF (the only difference being that the operations are performed in a different order). Our methods are implemented in the R package fastTopics.
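To make the NMF connection concrete, the following is a minimal sketch in pure Python. It is not the fastTopics implementation, and it uses the classic multiplicative updates rather than the faster methods the paper favors. It fits a Poisson NMF, X ≈ L Fᵀ, and then rescales the factors into topic proportions and word frequencies. The function names, the toy count matrix, and the symbols `L`, `F`, and `u` are illustrative assumptions, not names taken from the paper or the package.

```python
import math
import random

def rates(L, F):
    """Return the n x m matrix with entries sum_k L[i][k] * F[j][k] (that is, L F^T)."""
    return [[sum(a * b for a, b in zip(Li, Fj)) for Fj in F] for Li in L]

def poisson_loglik(X, L, F):
    """Poisson log-likelihood of X under rates L F^T, dropping the constant log(x!) terms."""
    lam = rates(L, F)
    return sum((x * math.log(r) if x > 0 else 0.0) - r
               for Xi, lami in zip(X, lam) for x, r in zip(Xi, lami))

def multiplicative_update(X, L, F):
    """One round of the classic multiplicative updates for the KL (Poisson) objective."""
    n, m, K = len(X), len(F), len(F[0])
    lam = rates(L, F)
    # Update F with L held fixed.
    F = [[F[j][k] * sum(L[i][k] * X[i][j] / lam[i][j] for i in range(n))
          / sum(L[i][k] for i in range(n)) for k in range(K)] for j in range(m)]
    lam = rates(L, F)
    # Update L with the new F held fixed.
    L = [[L[i][k] * sum(F[j][k] * X[i][j] / lam[i][j] for j in range(m))
          / sum(F[j][k] for j in range(m)) for k in range(K)] for i in range(n)]
    return L, F

def poisson_nmf_to_topic_model(L, F):
    """Rescale Poisson NMF factors into topic proportions and word frequencies."""
    K = len(F[0])
    u = [sum(Fj[k] for Fj in F) for k in range(K)]      # column sums of F
    F_star = [[Fj[k] / u[k] for k in range(K)] for Fj in F]  # columns sum to 1
    L_scaled = [[Li[k] * u[k] for k in range(K)] for Li in L]
    L_star = [[v / sum(row) for v in row] for row in L_scaled]  # rows sum to 1
    return L_star, F_star

# Toy 5 x 4 count matrix (documents x words); made up for illustration only.
X = [[5, 3, 0, 1], [4, 2, 1, 0], [0, 1, 6, 4], [1, 0, 5, 3], [2, 2, 2, 2]]
K = 2
random.seed(1)
L = [[random.uniform(0.5, 1.5) for _ in range(K)] for _ in X]
F = [[random.uniform(0.5, 1.5) for _ in range(K)] for _ in X[0]]

logliks = [poisson_loglik(X, L, F)]
for _ in range(50):
    L, F = multiplicative_update(X, L, F)
    logliks.append(poisson_loglik(X, L, F))

L_star, F_star = poisson_nmf_to_topic_model(L, F)
```

After the rescaling, each row of `L_star` sums to one (topic proportions for a document) and each column of `F_star` sums to one (word frequencies for a topic). The product `L_star F_starᵀ` equals the Poisson rates `L Fᵀ` with each row normalized to sum to one, which gives the multinomial probabilities of the topic model.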
