On the Connection Between Non-negative Matrix Factorization and Latent Dirichlet Allocation

Benedikt Geiger, Peter J. Park

TL;DR

This work investigates how non-negative matrix factorization (NMF) with the generalized KL divergence relates to the topic models PLSA and LDA. It shows that adding $\ell_1$ normalization constraints on the columns of both factor matrices makes the joint multiplicative updates (MU) algorithm coincide with PLSA, and that additionally placing a Dirichlet prior on the columns of one matrix makes NMF equivalent to LDA. Accounting explicitly for the scaling ambiguity of NMF through these constraints is what allows both factor matrices to be updated jointly rather than in alternation. The same derivation shows that a Lasso ($\ell_1$) penalty on one matrix, combined with an $\ell_1$ normalization constraint on the other, is insufficient to induce any sparsity. Together, these results connect the optimization view of NMF with the probabilistic view of topic models.

Abstract

Non-negative matrix factorization with the generalized Kullback-Leibler divergence (NMF) and latent Dirichlet allocation (LDA) are two popular approaches for dimensionality reduction of non-negative data. Here, we show that NMF with $\ell_1$ normalization constraints on the columns of both matrices of the decomposition and a Dirichlet prior on the columns of one matrix is equivalent to LDA. To show this, we demonstrate that explicitly accounting for the scaling ambiguity of NMF by adding $\ell_1$ normalization constraints to the optimization problem allows a joint update of both matrices in the widely used multiplicative updates (MU) algorithm. When both of the matrices are normalized, the joint MU algorithm leads to probabilistic latent semantic analysis (PLSA), which is LDA without a Dirichlet prior. Our approach of deriving joint updates for NMF also reveals that a Lasso penalty on one matrix together with an $\ell_1$ normalization constraint on the other matrix is insufficient to induce any sparsity.
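The ingredients named in the abstract (generalized KL loss, multiplicative updates, and $\ell_1$ normalization of the columns of one factor) can be illustrated with a minimal NumPy sketch. This is not the paper's joint update: it uses the classic alternating Lee-Seung multiplicative updates and then removes the scaling ambiguity by renormalizing after every iteration. The function names `kl_divergence` and `kl_nmf_normalized`, the small constant `eps`, and the random initialization are assumptions made for illustration only.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) = sum(V * log(V / WH) - V + WH)."""
    mask = V > 0  # terms with V == 0 contribute only WH
    log_term = np.sum(V[mask] * np.log(V[mask] / (WH[mask] + eps)))
    return float(log_term - V.sum() + WH.sum())


def kl_nmf_normalized(V, K, n_iter=200, seed=0, eps=1e-12):
    """Alternating KL multiplicative updates with l1-normalized columns of W.

    V is a non-negative (words x documents) matrix, K the number of components.
    Returns W (columns sum to 1), H, and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = V.shape
    W = rng.random((n_words, K)) + eps
    H = rng.random((K, n_docs)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative update for H with W fixed.
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        # Multiplicative update for W with H fixed.
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # Remove the scaling ambiguity: normalize each column of W to sum
        # to one and move the scale into the matching row of H, so the
        # product W @ H is unchanged.
        lam = W.sum(axis=0)
        W /= lam
        H *= lam[:, None]
        losses.append(kl_divergence(V, W @ H))
    return W, H, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.poisson(3.0, size=(20, 15)).astype(float)  # toy count matrix
    W, H, losses = kl_nmf_normalized(V, K=4, n_iter=100)
    print("column sums of W:", W.sum(axis=0))
    print("first and last loss:", losses[0], losses[-1])
```

Because each column of `W` sums to one, it can be read as a distribution over words for one component, which is the reading that makes the comparison with PLSA and LDA natural.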

Paper Structure

This paper contains 17 sections, 18 theorems, 88 equations, and 7 algorithms.

Key Result

Lemma 4.1

Let $(W, H)$ be a solution of the standard NMF optimization problem and let $\lambda_k = \sum_v w_{vk}$. Then $(\widetilde{W}, \widetilde{H})$ with $\widetilde{w}_{vk} = w_{vk} / \lambda_k$ and $\widetilde{h}_{kd} = \lambda_k h_{kd}$ is a solution of NMF with a normalization ...
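The rescaling in the lemma can be checked numerically. Since $\sum_k \widetilde{w}_{vk} \widetilde{h}_{kd} = \sum_k (w_{vk} / \lambda_k)(\lambda_k h_{kd}) = \sum_k w_{vk} h_{kd}$, the product $WH$, and hence any loss that depends only on $WH$, is unchanged, while each column of $\widetilde{W}$ sums to one by construction. The sketch below verifies both properties on random non-negative factors; the helper name `normalize_factors` is hypothetical and not taken from the paper.

```python
import numpy as np


def normalize_factors(W, H):
    """Rescale (W, H) so each column of W sums to one without changing W @ H."""
    lam = W.sum(axis=0)            # lambda_k = sum_v w_vk
    W_tilde = W / lam              # w~_vk = w_vk / lambda_k
    H_tilde = lam[:, None] * H     # h~_kd = lambda_k * h_kd
    return W_tilde, H_tilde


rng = np.random.default_rng(0)
W = rng.random((6, 3))  # strictly positive with probability one, so lam > 0
H = rng.random((3, 5))
W_tilde, H_tilde = normalize_factors(W, H)

# The reconstruction is identical and the columns of W_tilde are normalized.
print(np.allclose(W_tilde @ H_tilde, W @ H))
print(np.allclose(W_tilde.sum(axis=0), 1.0))
```

The check assumes every $\lambda_k$ is strictly positive; a component whose column of $W$ is entirely zero would have to be handled separately.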

Theorems & Definitions (35)

  • Lemma 4.1
  • Lemma 4.2
  • Lemma 6.1
  • Lemma A.1 (Lee and Seung, 2000)
  • proof
  • Lemma A.2
  • proof
  • Corollary A.3 (Lee and Seung, 2000)
  • proof
  • Lemma 4.2
  • ...and 25 more