Orthogonal Nonnegative Matrix Factorization with Sparsity Constraints

Salar Basiri, Alisina Bayati, Srinivasa Salapaka

TL;DR

This work tackles Orthogonal Nonnegative Matrix Factorization with sparsity constraints (SCONMF) by reformulating it as a capacity-constrained facility location problem (CCFLP). It combines a maximum-entropy principle–based deterministic annealing (MEP-DA) approach with a control barrier function (CBF) framework to enforce nonnegativity, orthogonality, and per-row sparsity, and to enable automatic determination of the true feature count. The proposed method, MEP-ONMF, demonstrates superior reconstruction accuracy, strict constraint satisfaction, and robust rank estimation on synthetic and standard bioinformatics data, outperforming existing ONMF methods. This framework provides a scalable, constraint-satisfying, and interpretable factorization method suitable for large-scale nonnegative data analysis and feature discovery (metagenes).

Abstract

This article presents a novel approach to solving the sparsity-constrained Orthogonal Nonnegative Matrix Factorization (SCONMF) problem, which requires decomposing a non-negative data matrix into the product of two lower-rank non-negative matrices, X=WH, where the mixing matrix H has orthogonal rows (HH^T=I) and also satisfies an upper bound on the number of nonzero elements in each row. By reformulating SCONMF as a capacity-constrained facility-location problem (CCFLP), the proposed method naturally integrates non-negativity, orthogonality, and sparsity constraints. Specifically, our approach integrates a control-barrier-function (CBF) based framework, used for dynamic optimal control design problems, with a maximum-entropy-principle-based framework, used for facility location problems, to enforce these constraints while ensuring robust factorization. Additionally, this work introduces a quantitative approach for determining the "true" rank of W or H, equivalent to the number of "true" features, a critical aspect in ONMF applications where the number of features is unknown. Simulations on various datasets demonstrate significantly improved factorizations with low reconstruction errors (lower by a factor of up to 150) while strictly satisfying all constraints, outperforming existing methods that struggle to balance accuracy and constraint adherence.
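
The constraints named in the abstract can be made concrete with a small numerical check. The sketch below is illustrative only and is not the MEP-ONMF algorithm: it builds a toy H from a hard assignment of columns to features, then tests non-negativity, row orthogonality (HH^T=I), and the per-row nonzero bound. The helper names `build_orthogonal_h` and `check_sconmf`, the toy labels, and the bound `max_nnz` are assumptions introduced here.

```python
import numpy as np

def build_orthogonal_h(labels, k):
    """Build a nonnegative k x n matrix H with orthonormal rows from hard labels.

    Assumes every feature index 0..k-1 appears at least once in `labels`.
    """
    n = labels.shape[0]
    H = np.zeros((k, n))
    H[labels, np.arange(n)] = 1.0            # one nonzero per column
    counts = H.sum(axis=1, keepdims=True)    # number of columns per feature
    return H / np.sqrt(counts)               # scale each row to unit norm

def check_sconmf(X, W, H, max_nnz, tol=1e-9):
    """Report constraint satisfaction and the relative reconstruction error."""
    k = H.shape[0]
    return {
        "nonnegative": bool((W >= -tol).all() and (H >= -tol).all()),
        "orthogonal": bool(np.allclose(H @ H.T, np.eye(k), atol=1e-8)),
        "sparse": bool((np.count_nonzero(H, axis=1) <= max_nnz).all()),
        "rel_error": float(np.linalg.norm(X - W @ H) / np.linalg.norm(X)),
    }

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 1, 2])        # hypothetical assignment of 6 columns to 3 features
H = build_orthogonal_h(labels, k=3)
W_true = rng.random((4, 3))
X = W_true @ H                                # toy data that factorizes exactly
W = X @ H.T                                   # least-squares W when H H^T = I
report = check_sconmf(X, W, H, max_nnz=3)
```

Because a non-negative H with orthogonal rows can have at most one nonzero entry per column, the toy construction starts from a hard assignment; the per-row bound then limits how many columns any single feature may claim.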

Paper Structure (11 sections, 2 theorems, 13 equations, 2 figures, 2 tables, 1 algorithm)

This paper contains 11 sections, 2 theorems, 13 equations, 2 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

The relaxed ONMF problem can be interpreted as a CCFLP (in its FLP formulation) if the distance $D(\cdot,\cdot)$ is taken to be the squared Frobenius norm $\|\cdot\|_F^2$.
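
A quick numerical check can illustrate the kind of identity behind this statement. The sketch below assumes a binary assignment matrix H with exactly one nonzero per column, which is an illustrative choice and not necessarily the relaxation used in the theorem. Under that assumption, the squared Frobenius reconstruction error equals the facility-location cost, i.e. the sum of squared distances from each data point to its assigned facility. All sizes and labels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 5, 8, 3
X = rng.random((d, n))                         # columns are data points
W = rng.random((d, k))                         # columns are facility locations
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1])    # hypothetical hard assignment

# Binary assignment matrix: H[j, i] = 1 if point i is served by facility j.
H = np.zeros((k, n))
H[labels, np.arange(n)] = 1.0

# Matrix-factorization view: squared Frobenius norm of the residual.
frob_cost = np.linalg.norm(X - W @ H, "fro") ** 2

# Facility-location view: sum of squared point-to-facility distances.
flp_cost = sum(np.sum((X[:, i] - W[:, labels[i]]) ** 2) for i in range(n))
```

The two quantities agree up to floating-point error because column i of WH is exactly the facility assigned to point i.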

Figures (2)

  • Figure 1: The logarithm of the ratio of successive critical $\beta$ values over all time periods. The $k^{th}$ value on the x-axis represents the number of features, while the y-axis shows $\log(\frac{\beta_{k+1}}{\beta_k})$, where $\beta_k$ is the critical $\beta$ value at which the $k^{th}$ feature split happens.
  • Figure 2: Orthogonal features extracted using MEP-ONMF across seven time periods of the standard bioinformatics dataset. The top row represents the unconstrained case, while the bottom row corresponds to the constrained setting ($\bar{c}_j/n \leq 0.35$ for all $j$). The y-axis denotes genes, and the x-axis represents metagenes.
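
The quantity plotted in Figure 1 can be computed from a list of critical $\beta$ values as sketched below. The selection rule used here, picking the $k$ with the largest log-ratio, is an assumption made for illustration; the page only states what the figure shows, not how the feature count is chosen from it. The $\beta$ values and the function names `log_beta_ratios` and `pick_feature_count` are hypothetical.

```python
import numpy as np

def log_beta_ratios(critical_betas):
    """Return log(beta_{k+1} / beta_k) for k = 1, ..., len(critical_betas) - 1."""
    b = np.asarray(critical_betas, dtype=float)
    return np.log(b[1:] / b[:-1])

def pick_feature_count(critical_betas):
    """Assumed rule: choose the k whose log-ratio is largest."""
    ratios = log_beta_ratios(critical_betas)
    return int(np.argmax(ratios)) + 1          # ratios[0] corresponds to k = 1

# Hypothetical critical values beta_1, ..., beta_6 (not taken from the paper).
betas = [1.0, 1.5, 2.0, 2.6, 40.0, 55.0]
ratios = log_beta_ratios(betas)
k_hat = pick_feature_count(betas)              # 4 for these values
```

A large gap between $\beta_k$ and $\beta_{k+1}$ means the configuration with $k$ features persists over a wide range of $\beta$, which is the intuition behind the assumed rule.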

Theorems & Definitions (5)

  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof