Tensor Decompositions for Count Data that Leverage Stochastic and Deterministic Optimization
Jeremy M. Myers, Daniel M. Dunlavy
TL;DR
The paper tackles Poisson canonical polyadic decomposition (CPD) of count data with two complementary strategies that blend stochastic and deterministic optimization to increase the probability of converging to the maximum likelihood estimator (MLE). Hybrid GCP-CPAPR (HybridGC) runs a stochastic GCP-Adam stage to move quickly toward a promising basin of attraction, then refines the result with deterministic CPAPR to reach high accuracy. Restarted CPAPR with SVDrop inspects the singular values of the model's mode unfoldings to detect iterates heading toward rank-deficient solutions, and restarts within the feasible domain to curb wasted computation. Empirical results on synthetic tensors show a higher probability of converging to the empirical MLE and closer agreement with its algebraic structure, measured by the factor match score (FMS), at a moderate increase in computational cost. Overall, the work provides practical techniques based on singular-value diagnostics to improve the reliability and efficiency of Poisson CPD in large-scale, sparse count-data applications, and offers guidance on parameter choices and diagnostic metrics.
Abstract
There is growing interest in extending low-rank matrix decompositions to multi-way arrays, or tensors. One fundamental low-rank tensor decomposition is the canonical polyadic decomposition (CPD). The challenge of fitting a low-rank, nonnegative CPD model to Poisson-distributed count data is of particular interest. Several popular algorithms use local search methods to approximate the maximum likelihood estimator (MLE) of the Poisson CPD model. This work presents two new algorithms that extend state-of-the-art local methods for Poisson CPD. Hybrid GCP-CPAPR combines Generalized Canonical Polyadic (GCP) decomposition, solved with stochastic optimization, and CP Alternating Poisson Regression (CPAPR), a deterministic algorithm, to increase the probability of converging to the MLE relative to either method used alone. Restarted CPAPR with SVDrop uses a heuristic based on the singular values of the CPD model unfoldings to identify convergence toward optimizers that are not the MLE and restarts within the feasible domain of the optimization problem, thus reducing overall computational cost when using a multi-start strategy. We provide empirical evidence that indicates our approaches outperform existing methods with respect to converging to the Poisson CPD MLE.
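To make the two ingredients named in the abstract concrete, the following minimal NumPy sketch evaluates the Poisson negative log-likelihood of a nonnegative CP model and computes the singular values of a mode unfolding of that model. All function names here are hypothetical and chosen for illustration. In particular, `largest_consecutive_ratio` is only a stand-in for the kind of singular-value diagnostic the abstract describes; it is an assumption and not the authors' SVDrop criterion, whose exact form and threshold are not given in this text.

```python
import numpy as np


def cp_full(weights, factors):
    """Dense tensor from CP weights (R,) and factor matrices [(I_n, R), ...]."""
    shape = tuple(f.shape[0] for f in factors)
    model = np.zeros(shape)
    for r in range(weights.shape[0]):
        # Build the r-th rank-one component as a chain of outer products.
        component = weights[r]
        for f in factors:
            component = np.multiply.outer(component, f[:, r])
        model += component
    return model


def poisson_nll(data, model, eps=1e-10):
    """Poisson negative log-likelihood, omitting the constant log(x!) term.

    eps guards against log(0); its value is an illustrative assumption.
    """
    return float(np.sum(model - data * np.log(model + eps)))


def unfolding_singular_values(model, mode):
    """Singular values of the mode-`mode` unfolding of a dense tensor."""
    unfolded = np.moveaxis(model, mode, 0).reshape(model.shape[mode], -1)
    return np.linalg.svd(unfolded, compute_uv=False)


def largest_consecutive_ratio(singular_values, rank, floor=1e-12):
    """Largest ratio between consecutive leading singular values.

    Hypothetical diagnostic: a very large value suggests the unfolding
    has numerical rank below `rank`. Not the paper's SVDrop rule.
    """
    s = singular_values[:rank]
    return float(np.max(s[:-1] / np.maximum(s[1:], floor)))
```

A large ratio in any mode would indicate that the model is close to rank deficient in that mode, which is the situation the restart strategy is meant to catch early. A dense reconstruction is used here only for readability; for large, sparse count tensors one would work with the factor matrices directly.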
