Randomized Algorithms for Symmetric Nonnegative Matrix Factorization
Koby Hayashi, Sinan G. Aksoy, Grey Ballard, Haesun Park
TL;DR
This work introduces two randomized approaches for Symmetric Nonnegative Matrix Factorization (SymNMF): LAI-SymNMF, which accelerates computation by first forming a low-rank, symmetric approximation of the input before solving the factorization, and LvS-SymNMF, which uses leverage-score-based sampling to accelerate the sequence of nonnegative least squares subproblems. It proves that leverage-score sampling solves convex least squares problems, including nonnegative least squares, to a chosen accuracy, and analyzes a hybrid sampling strategy that deterministically includes high-leverage rows to boost practical performance. Empirically, the methods deliver substantial speedups on large dense and sparse graphs (often 5x–7.5x) while preserving clustering quality, demonstrated on Web of Science and Microsoft Open Academic Graph datasets. The framework combines randomized linear algebra tools with both alternating-update (ANLS/HALS) and all-at-once (PGNCG) SymNMF solvers, offering a versatile path toward scalable SymNMF for big-data clustering and information fusion tasks. Overall, the paper advances scalable SymNMF by (i) introducing robust randomized input sketching and sampling schemes, (ii) providing rigorous error and complexity analyses, and (iii) validating practical gains on real-world graphs.
Abstract
Symmetric Nonnegative Matrix Factorization (SymNMF) is a technique in data analysis and machine learning that approximates a symmetric matrix with a product of a nonnegative, low-rank matrix and its transpose. To design faster and more scalable algorithms for SymNMF, we develop two randomized algorithms for its computation. The first algorithm uses randomized matrix sketching to compute an initial low-rank approximation to the input matrix and then rapidly computes a SymNMF of the approximation. The second algorithm uses randomized leverage score sampling to approximately solve constrained least squares problems; many successful methods for SymNMF rely on (approximately) solving sequences of such problems. We prove theoretically that leverage score sampling can approximately solve nonnegative least squares problems to a chosen accuracy with high probability. Additionally, we prove sampling complexity results for previously proposed hybrid sampling techniques, which deterministically include high leverage score rows. This hybrid scheme is crucial for obtaining speedups in practice. Finally, we demonstrate that both methods work well in practice by applying them to graph clustering tasks on large real-world data sets. These experiments show that our methods approximately maintain solution quality and achieve significant speedups for both large dense and large sparse problems.
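To make the second idea concrete, the following is a minimal sketch of leverage score sampling applied to a single nonnegative least squares problem. It is not the paper's implementation. The function names (`leverage_scores`, `sample_rows`, `nnls_pgd`, `demo`), the problem sizes, and the projected gradient solver are all assumptions chosen for illustration, and the hybrid scheme that deterministically keeps high leverage score rows is not shown.

```python
import numpy as np

def leverage_scores(A):
    # Row leverage scores: squared row norms of an orthonormal basis for range(A).
    Q, _ = np.linalg.qr(A)  # reduced QR, Q has the same shape as A
    return np.sum(Q**2, axis=1)

def sample_rows(A, b, s, rng):
    # Draw s rows with probability proportional to leverage score and
    # rescale each by 1/sqrt(s * p_i) so the sampled problem is unbiased.
    lev = leverage_scores(A)
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=s, replace=True, p=p)
    scale = 1.0 / np.sqrt(s * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale

def nnls_pgd(A, b, iters=500):
    # Stand-in solver (an assumption, not the paper's): projected gradient
    # descent for min_{x >= 0} 0.5 * ||Ax - b||^2 with step size 1/||A||_2^2.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = np.maximum(0.0, x - (A.T @ (A @ x - b)) / L)
    return x

def demo(m=5000, k=8, s=500, seed=0):
    # Synthetic tall problem with a nonnegative ground truth and small noise.
    rng = np.random.default_rng(seed)
    A = rng.random((m, k))
    x_true = np.maximum(0.0, rng.standard_normal(k))
    b = A @ x_true + 0.01 * rng.standard_normal(m)

    x_full = nnls_pgd(A, b)               # solve using all m rows
    As, bs = sample_rows(A, b, s, rng)
    x_samp = nnls_pgd(As, bs)             # solve using only s sampled rows

    # Both residuals are measured on the full problem.
    res_full = np.linalg.norm(A @ x_full - b)
    res_samp = np.linalg.norm(A @ x_samp - b)
    return res_full, res_samp, x_samp

if __name__ == "__main__":
    res_full, res_samp, _ = demo()
    print(f"full residual: {res_full:.4f}, sampled residual: {res_samp:.4f}")
```

In this sketch the sampled problem has `s` rows instead of `m`, so each solver iteration is cheaper, while the residual of the sampled solution on the full problem is expected to stay close to the optimal residual. In an alternating-update SymNMF solver, a step of this kind would be repeated for each least squares subproblem.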
