Nonnegative Matrix Factorization through Cone Collapse
Manh Nguyen, Daniel Pimentel-Alarcón
TL;DR
This work reframes nonnegative matrix factorization through the geometry of convex cones, introducing Cone Collapse to recover the minimal generating cone of the data. The recovered extreme-ray basis is then refined with uni-orthogonal NMF to yield CC-NMF, a cone-aware ONMF approach whose H rows act as orthogonal cluster indicators. The authors prove finite termination and exact cone recovery under mild assumptions and demonstrate robust clustering performance across 16 diverse datasets, often outperforming strong NMF baselines. The approach highlights how explicit conic geometry can improve interpretability and accuracy in NMF-based clustering, with potential for broader applicability.
Abstract
Nonnegative matrix factorization (NMF) is a widely used tool for learning parts-based, low-dimensional representations of nonnegative data, with applications in vision, text, and bioinformatics. In clustering applications, orthogonal NMF (ONMF) variants further impose (approximate) orthogonality on the representation matrix so that its rows behave like soft cluster indicators. Existing algorithms, however, are typically derived from optimization viewpoints and do not explicitly exploit the conic geometry induced by NMF: data points lie in a convex cone whose extreme rays encode fundamental directions or "topics". In this work we revisit NMF from this geometric perspective and propose Cone Collapse, an algorithm that starts from the full nonnegative orthant and iteratively shrinks it toward the minimal cone generated by the data. We prove that, under mild assumptions on the data, Cone Collapse terminates in finitely many steps and recovers the minimal generating cone of $\mathbf{X}^\top$. Building on this basis, we then derive a cone-aware orthogonal NMF model (CC-NMF) by applying uni-orthogonal NMF to the recovered extreme rays. Across 16 benchmark gene-expression, text, and image datasets, CC-NMF consistently matches or outperforms strong NMF baselines (including multiplicative updates, ANLS, projective NMF, ONMF, and sparse NMF) in terms of clustering purity. These results demonstrate that explicitly recovering the data cone can yield both theoretically grounded and empirically strong NMF-based clustering methods.
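To make the "rows behave like cluster indicators" statement and the purity metric concrete, the following is a minimal illustrative sketch, not the authors' implementation. It relies on a standard fact: if the rows of a nonnegative matrix are mutually orthogonal, their supports are disjoint, so each data point (column) carries weight in at most one row. The matrix `H`, the labels, and the helper names `assign_clusters` and `purity` are hypothetical, chosen only for illustration; purity is computed in its usual form, as the fraction of points that belong to the majority true class of their assigned cluster.

```python
from collections import Counter


def assign_clusters(H):
    """Assign each data point (column of H) to the row with the largest weight."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda r: H[r][j]) for j in range(n)]


def purity(labels_pred, labels_true):
    """Fraction of points that fall in the majority true class of their cluster."""
    clusters = {}
    for p, t in zip(labels_pred, labels_true):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


# Hypothetical 2 x 5 representation matrix: nonnegative rows with disjoint
# supports, hence orthogonal (their dot product is zero).
H = [
    [0.9, 0.8, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.6, 0.5, 0.0],
]
labels_true = ["a", "a", "b", "b", "b"]  # hypothetical ground-truth classes

labels_pred = assign_clusters(H)
score = purity(labels_pred, labels_true)
print(labels_pred, score)
```

In this toy example the last point is placed in the first cluster although its true class is "b", so four of the five points agree with the majority class of their cluster.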
