Negative Binomial Matrix Completion

Yu Lu, Kevin Bui, Roummel F. Marcia

TL;DR

This work addresses matrix completion for count data with overdispersion by modeling observation noise with a negative binomial distribution and employing a nuclear-norm regularized MAP objective. The proposed NB matrix completion is solved via proximal gradient descent with singular-value thresholding, deriving the NB-specific gradient and leveraging an efficient proximal operator. Empirical results across bike-sharing, vehicle traffic, and microscopy datasets show NB matrix completion outperforming Poisson matrix completion under NB noise, while remaining competitive when data are Poisson-like (large dispersion parameter). Overall, the approach provides a robust tool for recovering low-rank count matrices in realistic noisy and missing-data settings, with practical implications for imaging, traffic analysis, and related domains.
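The pipeline summarized above (NB likelihood, nuclear-norm penalty, proximal gradient descent with singular-value thresholding) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 1. The mean-dispersion NB parametrization, the constant initialization, the fixed step size, and the positivity floor applied after each step are our assumptions, and all function names are hypothetical. Under that parametrization, the loss for an observed count $y$ with mean $x$ is $(y+r)\log(r+x) - y\log x$ up to constants, and its derivative is $r(x-y)/(x(r+x))$.

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def nb_nll(X, Y, mask, r):
    """NB negative log-likelihood of observed counts Y given means X.

    Assumes the mean-dispersion parametrization; terms free of X are dropped.
    """
    x, y = X[mask], Y[mask]
    return np.sum((y + r) * np.log(r + x) - y * np.log(x))

def nb_grad(X, Y, mask, r):
    """Entrywise gradient of nb_nll: r (x - y) / (x (r + x)); zero where unobserved."""
    G = np.zeros_like(X)
    x, y = X[mask], Y[mask]
    G[mask] = r * (x - y) / (x * (r + x))
    return G

def nb_complete(Y, mask, r, lam, step=0.5, iters=200, floor=1e-3):
    """Proximal gradient: gradient step on the NB loss, then SVT with threshold step * lam."""
    X = np.full(Y.shape, Y[mask].mean())  # assumed constant initialization
    for _ in range(iters):
        X = svt(X - step * nb_grad(X, Y, mask, r), step * lam)
        X = np.maximum(X, floor)  # heuristic positivity floor (our assumption)
    return X

# Synthetic example: low-rank positive means, NB counts, about half the entries observed
rng = np.random.default_rng(0)
M = rng.uniform(1, 2, (30, 2)) @ rng.uniform(1, 2, (2, 20))
r, lam = 10.0, 1.0
Y = rng.negative_binomial(r, r / (r + M)).astype(float)
mask = rng.random(M.shape) < 0.5
X_hat = nb_complete(Y, mask, r, lam)
```

The fixed step is chosen only for simplicity on small synthetic counts. A backtracking line search would be a natural substitute when the loss curvature is less predictable.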

Abstract

Matrix completion focuses on recovering missing or incomplete information in matrices. This problem arises in various applications, including image processing and network analysis. Previous research proposed Poisson matrix completion for count data with noise that follows a Poisson distribution, which assumes that the mean and variance are equal. Since overdispersed count data, whose variance is greater than the mean, are more likely to occur in realistic settings, we assume that the noise follows the negative binomial (NB) distribution, which is more flexible than the Poisson distribution and approaches it as the dispersion parameter grows. In this paper, we introduce NB matrix completion by proposing a nuclear-norm regularized model that can be solved by proximal gradient descent. In our experiments on real data, the NB model outperforms Poisson matrix completion under NB noise across various missing-data settings and performs comparably under Poisson noise.
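The mean-variance contrast described above can be checked numerically. The sketch assumes the common mean-dispersion parametrization, in which an NB variable with mean $\mu$ and dispersion $r$ has variance $\mu + \mu^2/r$; whether the paper uses exactly this parametrization is our assumption. NumPy's `negative_binomial(n, p)` counts failures before `n` successes, so setting `n = r` and `p = r / (r + mu)` gives mean `mu`.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, r, n = 20.0, 5.0, 200_000

# Poisson: theoretical mean and variance are both mu
poisson = rng.poisson(mu, n)

# NB with mean mu and dispersion r: theoretical variance mu + mu**2 / r = 100
nb = rng.negative_binomial(r, r / (r + mu), n)

print(f"Poisson: mean {poisson.mean():.2f}, variance {poisson.var():.2f}")
print(f"NB:      mean {nb.mean():.2f}, variance {nb.var():.2f}")
```

Both samples have nearly the same mean, but the NB variance is about five times larger, which is the overdispersion a Poisson model cannot represent. Increasing `r` shrinks the extra $\mu^2/r$ term toward zero.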

Paper Structure (7 sections, 1 theorem, 18 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Lemma 1

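Consider the singular value decomposition of $Z\in \mathbb{R}^{m \times n}$ of rank $l$:

$$Z = U \Sigma V^{T}, \qquad \Sigma = \operatorname{diag}(\{\sigma_i\}_{1 \le i \le l}),$$

where $U \in \mathbb{R}^{m \times l}$ and $V \in \mathbb{R}^{n \times l}$ are matrices with orthonormal columns and $\sigma_1, \dots, \sigma_l > 0$ are the singular values of $Z$. For $\lambda > 0$, let

$$\mathcal{D}_\lambda(Z) = U \mathcal{D}_\lambda(\Sigma) V^{T}, \qquad \mathcal{D}_\lambda(\Sigma) = \operatorname{diag}(\{(\sigma_i - \lambda)_+\}_{1 \le i \le l}),$$

where $(\sigma -\lambda)_+ = \max(0,\sigma-\lambda)$. Then

$$\mathcal{D}_\lambda(Z) = \arg\min_{X \in \mathbb{R}^{m \times n}} \left\{ \frac{1}{2}\|X - Z\|_F^2 + \lambda \|X\|_* \right\}.$$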
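Lemma 1 says the proximal operator of $\lambda\|\cdot\|_*$ is obtained by soft-thresholding singular values. The sketch below implements $\mathcal{D}_\lambda$ with NumPy (the name `svt` is ours) and checks the lemma numerically on a random matrix: the thresholded singular values match $(\sigma_i - \lambda)_+$, and random perturbations of $\mathcal{D}_\lambda(Z)$ never lower the objective. Because the objective is strongly convex, its minimizer is unique, so every perturbation must do strictly worse.

```python
import numpy as np

def svt(Z, lam):
    """D_lam(Z): soft-threshold the singular values of Z by lam."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def prox_objective(X, Z, lam):
    """0.5 * ||X - Z||_F^2 + lam * ||X||_* (nuclear norm = sum of singular values)."""
    return 0.5 * np.linalg.norm(X - Z, "fro") ** 2 + lam * np.linalg.norm(X, "nuc")

rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 5))
lam = 1.0
X_star = svt(Z, lam)

# Singular values of D_lam(Z) should equal (sigma_i - lam)_+
sv_Z = np.linalg.svd(Z, compute_uv=False)
sv_X = np.linalg.svd(X_star, compute_uv=False)
print(np.allclose(sv_X, np.maximum(sv_Z - lam, 0.0)))  # prints True

# Random perturbations of the minimizer always increase the objective
f_star = prox_objective(X_star, Z, lam)
worse = [prox_objective(X_star + 0.1 * rng.standard_normal(Z.shape), Z, lam)
         for _ in range(100)]
print(f_star < min(worse))  # prints True
```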

Figures (4)

  • Figure 1: Hourly inbound traffic data at Robert F. Kennedy Bridge Queens/Bronx Plaza from January 2021 to March 2021.
  • Figure 2: Experiment 2 (Poisson) results. (a) Traffic data (Fig. 1) corrupted with Poisson noise and only 25% of its entries are known. (b) Poisson reconstruction with PSNR $= 22.99$ and RMSE $= 7.09\%$. (c) NB reconstruction with PSNR $= 22.97$ and RMSE $= 7.11\%$.
  • Figure 3: Experiment 2 (NB) results. (a) Traffic data (Fig. 1) corrupted with NB ($r=10$) noise and only 25% of its entries are known. (b) Poisson reconstruction with PSNR $= 21.44$ and RMSE $= 8.47\%$. (c) NB reconstruction with PSNR $= 23.97$ and RMSE $= 6.37\%$.
  • Figure 4: Results on the microscopy image of mouse brain tissues from zhang2019Poisson. (a) Low-rank image of the original. (b) Incomplete, noisy image with NB noise ($r=10$) and $q=75\%$ of entries known. (c) Poisson reconstruction with PSNR $= 22.36$ and NRMSE $= 7.62\%$. (d) NB model reconstruction with PSNR $= 29.08$ and RMSE $= 3.51\%$.
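The corruption protocol in these captions (Poisson noise or NB noise with $r=10$, then keeping only a fraction of the entries) can be mimicked as follows. This is a hypothetical sketch: the synthetic low-rank matrix stands in for the traffic or microscopy data, choosing the known entries uniformly at random without replacement is our assumption, the NB mean-dispersion parametrization is assumed, and `corrupt` is an illustrative name rather than anything from the paper.

```python
import numpy as np

def corrupt(M, noise, known_frac, rng, r=10.0):
    """Hypothetical helper: draw counts with mean M, then hide entries.

    noise: "poisson" or "nb"; r is the NB dispersion (ignored for Poisson).
    Returns the noisy count matrix Y and a boolean mask of known entries.
    """
    if noise == "poisson":
        Y = rng.poisson(M)
    elif noise == "nb":
        Y = rng.negative_binomial(r, r / (r + M))  # mean M, variance M + M**2 / r
    else:
        raise ValueError(f"unknown noise model: {noise}")
    # Keep exactly round(known_frac * M.size) entries, chosen uniformly at random
    n_known = int(round(known_frac * M.size))
    flat = np.zeros(M.size, dtype=bool)
    flat[rng.choice(M.size, size=n_known, replace=False)] = True
    return Y.astype(float), flat.reshape(M.shape)

rng = np.random.default_rng(1)
# Synthetic rank-3 stand-in for a count matrix (not the paper's data)
M = 5.0 * rng.uniform(1, 3, (40, 3)) @ rng.uniform(0.5, 1.5, (3, 60))
Y_poi, mask = corrupt(M, "poisson", 0.25, rng)
Y_nb, _ = corrupt(M, "nb", 0.25, rng, r=10.0)
Y_obs = np.where(mask, Y_nb, np.nan)  # unknown entries marked as NaN
```

A reconstruction method would then see only `Y_obs` (or the pair `Y_nb`, `mask`), and its output would be compared against the clean matrix `M`.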

Theorems & Definitions (1)

  • Lemma 1: Theorem 2.1 of cai2010singular