Majorization-minimization for Sparse Nonnegative Matrix Factorization with the $β$-divergence

Arthur Marmin; José Henrique de Morais Goulart; Cédric Févotte

TL;DR

The approach leverages a reparametrization of the original problem into the optimization of an equivalent scale-invariant objective function and derives block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization.

Abstract

This article introduces new multiplicative updates for nonnegative matrix factorization with the $β$-divergence and sparse regularization of one of the two factors (say, the activation matrix). It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation. Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem. Our approach leverages a reparametrization of the original problem into the optimization of an equivalent scale-invariant objective function. From there, we derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization. In contrast with other state-of-the-art methods, our algorithms are universal in the sense that they can be applied to any $β$-divergence (i.e., any value of $β$) and that they come with convergence guarantees. We report numerical comparisons with existing heuristic and Lagrangian methods using various datasets: face images, an audio spectrogram, hyperspectral data, and song play counts. We show that our methods obtain solutions of similar quality at convergence (similar objective values) but with significantly reduced CPU times.
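The ill-posedness mentioned in the abstract can be checked numerically. The sketch below is not the authors' algorithm: it only evaluates the standard $β$-divergence and an unconstrained $\ell_{1}$-penalized objective, then shows that rescaling the dictionary up and the activations down leaves the fit unchanged while shrinking the penalty. The function names, matrix sizes, scaling factor `c`, and penalty weight `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def beta_divergence(V, V_hat, beta):
    """Sum of element-wise beta-divergences d_beta(V | V_hat) for positive arrays."""
    V = np.asarray(V, dtype=float)
    V_hat = np.asarray(V_hat, dtype=float)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    # General case, beta not in {0, 1}; beta = 2 gives half the squared Euclidean distance.
    num = V**beta + (beta - 1.0) * V_hat**beta - beta * V * V_hat**(beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))

def penalized_objective(V, W, H, beta, lam):
    """Unconstrained objective D_beta(V | WH) + lam * ||H||_1 (illustrative only)."""
    return beta_divergence(V, W @ H, beta) + lam * float(np.sum(np.abs(H)))

# Hypothetical small positive matrices, chosen only for the demonstration.
rng = np.random.default_rng(0)
F, N, K = 6, 5, 3
V = rng.random((F, N)) + 0.1
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

c = 10.0  # arbitrary scaling factor
fit_base = beta_divergence(V, W @ H, beta=1)
fit_rescaled = beta_divergence(V, (c * W) @ (H / c), beta=1)
obj_base = penalized_objective(V, W, H, beta=1, lam=1.0)
obj_rescaled = penalized_objective(V, c * W, H / c, beta=1, lam=1.0)

print("fit unchanged:", np.isclose(fit_base, fit_rescaled))
print("penalized objective decreased:", obj_rescaled < obj_base)
```

Because the product $\mathbf{W}\mathbf{H}$ is invariant to this rescaling, the penalty can be driven toward zero by letting `c` grow, which is why the norm of the dictionary has to be controlled in some way.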

Paper Structure (38 sections, 3 theorems, 36 equations, 7 figures, 4 tables, 2 algorithms)

Introduction
State of the art
Contributions
Outline
Notation
NMF with $\beta$-divergence and $\ell_{1}$ regularization
Objective
Necessity of the constrained formulation
State of the art
Lagrangian method
Heuristic multiplicative updates
A unified block-descent MM algorithm for $\beta$-NMF with $\ell_{1}$ regularization
Equivalent scale-invariant objective function
Reformulation without norm constraints
Symmetry of the roles of $\mathbf{W}$ and $\mathbf{H}$
...and 23 more sections

Key Result

Lemma 1

Let $(\mathbf{W}^{*}, \mathbf{H}^{*}) \ge 0$ be a solution of Problem eq:pb3. Let us define their renormalized equivalents by $\bar{\mathbf{W}}^{*} = \mathbf{W}^{*} \bm{\Lambda}^{*-1}$ and $\bar{\mathbf{H}}^{*} = \bm{\Lambda}^{*} \mathbf{H}^{*}$ where $\bm{\Lambda}^{*}=\mathop{\mathrm{Diag}}\nolimit

Figures (7)

Figure 1: Values of the normalized objective function over the first hundred iterations. Results obtained on a synthetic data matrix $\mathbf{V}$ with parameters $(F,N)=(50,40)$, $K=3$, $\beta=-0.5$.
Figure 2: Comparative performance on the Olivetti and TasteProfile datasets using the $\ell_{1}$-regularization ($\beta=1$).
Figure 3: Comparative performance on a spectrogram using the $\ell_{1}$-regularization.
Figure 4: Comparative performance on the Moffett dataset using the $\ell_{1}$-regularization.
Figure 5: Comparative performance on the Olivetti and TasteProfile datasets using the log-regularization ($\beta=1$).
...and 2 more figures

Theorems & Definitions (6)

Lemma 1
proof
Lemma 2
proof
Theorem 1
proof
