
Fast exact recovery of noisy matrix from few entries: the infinity norm approach

BaoLinh Tran, Van Vu

TL;DR

This work addresses exact recovery of a low-rank matrix from a small, randomly sampled set of entries in the presence of noise, under only the three basic assumptions: low rank, incoherence, and sufficient sampling density. It introduces a simple, fast algorithm based on truncated SVD with rounding, and pairs it with a novel contour-integration based analysis to obtain infinity-norm guarantees, removing prior spectral-gap and condition-number requirements. A new infinity-norm perturbation theorem (Davis–Kahan–Wedin type) and semi-isotropic bounds are developed, enabling deterministic and then probabilistic recovery results. The paper delivers a near-linear-time recovery procedure with explicit sampling-density conditions that guarantee exact recovery with high probability, thus offering a practical and theoretically sharp alternative to existing spectral methods in noisy matrix completion.
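The TL;DR describes the estimator as a truncated SVD followed by rounding. The sketch below is a minimal, generic version of that idea under stated assumptions, not the paper's algorithm: it assumes the sampling density `p`, the target rank and the precision `eps0` are known, uses a dense SVD, and omits any further steps the paper's algorithm may contain. The helper name `truncated_svd_round` is hypothetical.

```python
import numpy as np


def truncated_svd_round(observed, mask, p, rank, eps0):
    """Hypothetical sketch: rank-`rank` SVD estimate of A from a sampled, noisy
    copy, followed by rounding to the grid of integer multiples of eps0.

    observed : array holding the (noisy) revealed entries; other values are ignored
    mask     : boolean array, True where an entry was revealed
    p        : sampling density (probability that each entry is revealed)
    rank     : target rank r (assumed known in this sketch)
    eps0     : precision; entries of A are assumed to be integer multiples of eps0
    """
    # Zero-fill unrevealed entries and rescale by 1/p, so that the expectation
    # over the random sample equals the full (noisy) matrix.
    scaled = np.where(mask, observed, 0.0) / p
    # Best rank-r approximation via a dense SVD (large instances would use a
    # sparse or randomized partial SVD instead).
    U, s, Vt = np.linalg.svd(scaled, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Rounding step: returns A exactly whenever every entry of low_rank is
    # strictly within eps0/2 of the corresponding entry of A.
    return eps0 * np.round(low_rank / eps0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, r, p = 1000, 800, 2, 0.3
    # Integer-valued rank-r ground truth, so eps0 = 1 is a valid precision.
    A = (rng.integers(-1, 2, size=(n, r)) @ rng.integers(-1, 2, size=(r, m))).astype(float)
    noisy = A + rng.uniform(-0.1, 0.1, size=A.shape)  # bounded additive noise
    mask = rng.random(A.shape) < p                     # each entry revealed w.p. p
    A_hat = truncated_svd_round(noisy, mask, p, r, eps0=1.0)
    print("fraction of entries recovered exactly:", np.mean(A_hat == A))
```

With full observation (`p = 1`) and small enough entrywise noise, rounding returns the ground truth exactly; with sparse samples, whether every entry is recovered depends on the sampling density, which is what the paper's explicit density conditions are meant to control.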

Abstract

The matrix recovery (completion) problem, a central problem in data science and theoretical computer science, is to recover a matrix $A$ from a relatively small sample of its entries. While such a task is impossible in general, it has been shown that one can recover $A$ exactly in polynomial time, with high probability, from a random subset of entries, under three (basic and necessary) assumptions: (1) the rank of $A$ is very small compared to its dimensions (low rank), (2) $A$ has delocalized singular vectors (incoherence), and (3) the sample size is sufficiently large. There are many different algorithms for the task, including convex optimization by Candes, Tao and Recht (2009), alternating projection by Hardt and Wootters (2014), and low rank approximation with gradient descent by Keshavan, Montanari and Oh (2009, 2010). In applications, it is more realistic to assume that the data are noisy. In this case, these approaches provide an approximate recovery with small root mean square error. However, it is hard to transform such an approximate recovery into an exact one. Recently, results by Abbe et al. (2017) and Bhardwaj et al. (2023) concerning approximation in the infinity norm showed that one can achieve exact recovery even in the noisy case, provided that the ground-truth matrix has bounded precision. Beyond the three basic assumptions above, they required either that the condition number of $A$ be small (Abbe et al.) or that the gap between consecutive singular values be large (Bhardwaj et al.). In this paper, we remove these extra spectral assumptions. As a result, we obtain a simple algorithm for exact recovery in the noisy case under only the three basic assumptions. This is the first such algorithm. To analyse this algorithm, we introduce a contour integration argument which is totally different from all previous methods and may be of independent interest.
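The abstract's point that a small root mean square error does not translate into exact recovery can be seen in a toy example. This sketch assumes the entries of the ground truth are integer multiples of `eps0 = 1`; the helper names `rmse`, `max_entry_error` and `round_to_grid` are hypothetical.

```python
import numpy as np


def rmse(est, truth):
    """Root mean square error over all entries."""
    return float(np.sqrt(np.mean((est - truth) ** 2)))


def max_entry_error(est, truth):
    """Infinity-norm error: the largest absolute entrywise deviation."""
    return float(np.max(np.abs(est - truth)))


def round_to_grid(est, eps0):
    """Round every entry to the nearest integer multiple of eps0."""
    return eps0 * np.round(est / eps0)


if __name__ == "__main__":
    truth = np.ones((1000, 1000))  # entries are integer multiples of eps0 = 1

    spiky = truth.copy()
    spiky[0, 0] += 5.0             # one badly estimated entry
    print(rmse(spiky, truth))      # about 0.005
    print(max_entry_error(spiky, truth))
    print(np.count_nonzero(round_to_grid(spiky, 1.0) != truth))  # 1 wrong entry

    flat = truth + 0.4             # every entry off by 0.4 < eps0 / 2
    print(rmse(flat, truth))       # about 0.4, much larger
    print(np.array_equal(round_to_grid(flat, 1.0), truth))       # True
```

A single large entrywise error keeps the RMSE tiny but breaks rounding, while a uniform error of 0.4 has a far larger RMSE yet rounds back exactly; this is why an infinity-norm bound, rather than an RMSE bound, is the natural route to exact recovery.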

Paper Structure

This paper contains 32 sections, 15 theorems, 243 equations, 1 figure, 2 algorithms.

Key Result

Theorem 1.5

There is a universal constant $C > 0$ such that the following holds. Suppose $r_{\max} \le \log^2 N$. Under the matrix completion model, assume the following: … Then, with probability $1 - O(N^{-1})$, the first three steps of the matrix completion algorithm recover every entry of $A$ within absolute error ${\varepsilon}_0/3$. Consequently, if all entries of $A$ are integer multiples of ${\varepsilon}_0$, then …
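The last, truncated sentence of Theorem 1.5 concerns rounding. Assuming, as the TL;DR indicates, that the estimate is rounded entrywise to the nearest multiple of ${\varepsilon}_0$, the short derivation below shows why an entrywise error of at most ${\varepsilon}_0/3$ yields exact recovery. Here $\hat A$ denotes the output of the first three steps and $k_{ij}$ the integer with $A_{ij} = k_{ij}{\varepsilon}_0$; both symbols are introduced only for this derivation.

```latex
\begin{aligned}
&A_{ij} = k_{ij}\,{\varepsilon}_0,\quad k_{ij}\in\mathbb{Z},
\qquad |\hat A_{ij} - A_{ij}| \le {\varepsilon}_0/3 \\
&\Longrightarrow\quad \frac{\hat A_{ij}}{{\varepsilon}_0} + \frac12 \in \Big[k_{ij} + \tfrac16,\; k_{ij} + \tfrac56\Big] \\
&\Longrightarrow\quad {\varepsilon}_0 \Big\lfloor \frac{\hat A_{ij}}{{\varepsilon}_0} + \frac12 \Big\rfloor = k_{ij}\,{\varepsilon}_0 = A_{ij}.
\end{aligned}
```

Any entrywise error bound strictly below ${\varepsilon}_0/2$ would suffice for this rounding step.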

Figures (1)

  • Figure 1: The Yale face database (Wainwright, 2019) has $165$ greyscale face images of dimension $243 \times 320$, which can be turned into vectors in $\mathbb{R}^{77760}$ and arranged into a matrix in $\mathbb{R}^{77760 \times 165}$. Center this matrix by subtracting the average of its columns from every column. Shown are the first 30 singular values and the first 30 consecutive singular value gaps of this centered matrix. The left panel shows that the matrix has a small number of large singular values (about 30), while the rest look insignificant. On the other hand, the right panel shows that several gaps between the significant singular values are quite small.
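The quantities plotted in Figure 1 can be computed along the following lines. This is a sketch: the Yale face data are not bundled here, so the demo uses a synthetic stand-in matrix, and `centered_spectrum` is a hypothetical helper. Centering subtracts the average column (the mean over all columns) from every column, following the caption.

```python
import numpy as np


def centered_spectrum(X, k):
    """Hypothetical helper: top-k singular values of the column-centered X and
    the k gaps s_i - s_{i+1} between consecutive singular values.
    Assumes min(X.shape) > k."""
    # Subtract the average column (mean over all columns) from every column.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Singular values are returned in decreasing order; k gaps need k + 1 values.
    s = np.linalg.svd(Xc, compute_uv=False)[: k + 1]
    return s[:k], s[:k] - s[1 : k + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for the 77760 x 165 face matrix: rank-30 signal plus small noise.
    X = rng.standard_normal((5000, 30)) @ rng.standard_normal((30, 165))
    X += 0.01 * rng.standard_normal(X.shape)
    s, gaps = centered_spectrum(X, 30)
    print("top singular values:", np.round(s, 1))
    print("consecutive gaps:   ", np.round(gaps, 1))
```

For the real figure, `X` would be the $77760 \times 165$ matrix of vectorized face images; with only 165 columns, a dense SVD with `compute_uv=False` remains inexpensive.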

Theorems & Definitions (42)

  • Remark 1.1
  • Theorem 1.5
  • Remark 1.6: Density bound
  • Remark 1.7: Quadratic growth in $1/{\varepsilon}_0$
  • Remark 1.8: Bound on $r_{\max}$
  • Theorem 2.1
  • Remark 2.2
  • Proof
  • Theorem 3.2
  • Theorem 3.3
  • ...and 32 more