Randomized Approach to Matrix Completion: Applications in Collaborative Filtering and Image Inpainting

Antonina Krajewska, Ewa Niewiadomska-Szynkiewicz

TL;DR

The paper tackles matrix completion for tall, incomplete matrices by introducing Columns Selected Matrix Completion (CSMC), a two-stage framework that first completes a reduced column-submatrix and then recovers the full matrix via a least-squares step. Theoretical guarantees show exact recovery with high probability under standard incoherence when the number of sampled columns satisfies $d = \mathcal{O}(r \log r)$ and the observed entries satisfy $|\Omega| = \mathcal{O}(n_2 r \log(n_2 r))$. Two scalable algorithms, CSNN and CSPGD, are proposed to accommodate different problem sizes, with implementations and open-source code provided. Empirical results on synthetic data, movie-rating datasets, and image inpainting demonstrate that CSMC achieves reconstruction quality comparable to state-of-the-art convex MC methods while significantly reducing computational runtime, highlighting its practical value for large-scale, imbalanced matrix problems.

Abstract

We present a novel method for matrix completion, specifically designed for matrices where one dimension significantly exceeds the other. Our Columns Selected Matrix Completion (CSMC) method combines Column Subset Selection and Low-Rank Matrix Completion to efficiently reconstruct incomplete datasets. In each step, CSMC solves a convex optimization problem. We introduce two algorithms to implement CSMC, each tailored to problems of different sizes. A formal analysis is provided, outlining the necessary assumptions and the probability of obtaining a correct solution. To assess the impact of matrix size, rank, and the ratio of missing entries on solution quality and computation time, we conducted experiments on synthetic data. The method was also applied to two real-world problems: recommendation systems and image inpainting. Our results show that CSMC provides solutions of the same quality as state-of-the-art matrix completion algorithms based on convex optimization, while achieving significant reductions in computational runtime.
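
The two-stage idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' CSNN or CSPGD implementation: the convex completion step is swapped for a simple iterative rank-$r$ SVD imputation, the rank `r` is assumed known, and the function names `complete_rank_r` and `csmc_sketch` are hypothetical. The second stage fits each column of the full matrix, on its observed rows only, to the column space of the completed submatrix by least squares.

```python
import numpy as np

def complete_rank_r(X, mask, r, n_iter=500):
    """Stand-in for the convex completion step (assumption, not the paper's solver):
    alternate a rank-r SVD projection with re-imposing the observed entries."""
    Z = np.where(mask, X, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]   # best rank-r approximation of Z
        Z = np.where(mask, X, L)          # keep known entries, fill the rest from L
    return L

def csmc_sketch(M_obs, mask, r, d, rng):
    """Two-stage sketch: complete d sampled columns, then recover all columns."""
    n1, n2 = M_obs.shape
    # Stage 1: sample d columns uniformly without replacement and complete them.
    cols = rng.choice(n2, size=d, replace=False)
    C_hat = complete_rank_r(M_obs[:, cols], mask[:, cols], r)
    # Orthonormal basis of the estimated column space.
    U = np.linalg.svd(C_hat, full_matrices=False)[0][:, :r]
    # Stage 2: least-squares fit of every column using its observed rows only.
    M_hat = np.empty((n1, n2))
    for j in range(n2):
        rows = mask[:, j]
        coef, *_ = np.linalg.lstsq(U[rows], M_obs[rows, j], rcond=None)
        M_hat[:, j] = U @ coef
    return M_hat

# Synthetic rank-2 example with about 60% of the entries observed.
rng = np.random.default_rng(0)
n1, n2, r = 60, 200, 2
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
mask = rng.random((n1, n2)) < 0.6
M_obs = np.where(mask, M, 0.0)

M_hat = csmc_sketch(M_obs, mask, r=r, d=40, rng=rng)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(f"relative Frobenius error: {rel_err:.2e}")
```

Only the 60 x 40 submatrix goes through the iterative completion step; the remaining columns cost one small least-squares solve each, which is where a runtime saving for wide matrices would come from.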

Paper Structure (26 sections, 6 theorems, 60 equations, 9 figures, 4 tables, 1 algorithm)

Key Result

Theorem 3.1

Suppose that $\mathbf{M} \in \mathbb{R}^{n_1 \times n_2}$ has rank $r$ and coherence bounded by $\mu_0(\mathbf{M})$. Suppose that $I \subseteq \{1, \ldots, n_2\}$ is chosen by sampling uniformly without replacement to yield $\mathbf{C} = \mathbf{M}_{:I}$. Let $d$ be the number of sampled columns s with a probability at least $1 - \frac{1}{n_2}$, where $\kappa(\mathbf{M})$ denotes the spectral co

Figures (9)

  • Figure 1: The CSMC method overview.
  • Figure 2: Coherence of the one-dimensional subspace $U \subset \mathbb{R}^2$. One panel depicts $U$ spanned by $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$, which attains the smallest coherence $\mu(U)=1$; another illustrates the case $1 < \mu(U) < 2$; a third presents $U$ spanned by the standard basis vector $\mathbf{e}_2$, which achieves the maximal coherence $\mu(U)=2$.
  • Figure 3: Results S I: ECDF and runtimes depending on the matrix rank and rate of the known entries for NN and CSNN-$\alpha$, $\alpha \in \{0.1, \ldots, 0.9\}$, $\mathbf{M} \in \mathbb{R}^{300 \times 1000}$. Sampling with CSNN-$0.2$ resulted in a tenfold (10x) improvement in time efficiency over the NN algorithm while preserving solution quality.
  • Figure 4: Results S II: ECDF curves depending on the matrix rank and rate of the known entries for NN and CSNN-$\alpha$, $\alpha \in \{0.1, \ldots, 0.9\}$, $\mathbf{M} \in \mathbb{R}^{300 \times 1000}$. The relative error magnitude for CSNN-$0.3$ and CSNN-$0.5$ was notably lower in rank-5 and rank-10 scenarios, respectively, compared to MF.
  • Figure 5: Results S II: ECDF curves depending on the matrix rank and rate of the known entries for NN and CSNN-$\alpha$, $\alpha \in \{0.1, \ldots, 0.9\}$, $\mathbf{M} \in \mathbb{R}^{300 \times 1000}$.
  • ...and 4 more figures
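
The coherence values quoted in the Figure 2 caption can be checked numerically. The sketch below assumes the standard definition $\mu(U) = \frac{n}{r} \max_i \|P_U \mathbf{e}_i\|_2^2$ for an $r$-dimensional subspace $U \subset \mathbb{R}^n$; the paper's Definition 3.1 is not reproduced in this summary, so the exact form used there is an assumption, and the helper name `coherence` is hypothetical.

```python
import numpy as np

def coherence(B):
    """Coherence of the column span of B (n x r, full column rank),
    using mu(U) = (n / r) * max_i ||P_U e_i||^2 (assumed definition)."""
    B = np.asarray(B, dtype=float)
    n, r = B.shape
    Q, _ = np.linalg.qr(B)               # orthonormal basis of span(B)
    leverage = np.sum(Q**2, axis=1)      # ||P_U e_i||^2 equals the squared norm of row i of Q
    return n / r * leverage.max()

mu_diag = coherence([[1.0], [1.0]])      # span of (1/sqrt(2), 1/sqrt(2)): least coherent
mu_mid = coherence([[1.0], [2.0]])       # an intermediate direction
mu_axis = coherence([[0.0], [1.0]])      # span of e_2: most coherent

print(mu_diag, mu_mid, mu_axis)
```

Under this definition the three values come out close to 1, strictly between 1 and 2, and close to 2, matching the three cases in the caption.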

Theorems & Definitions (14)

  • Definition 3.1
  • Theorem 3.1: Corollary 3.6 in Cai et al. [Cai21r]
  • Theorem 3.2
  • Theorem A.1: Theorem 9 in Xu et al. [Xu15]
  • Remark A.1
  • Definition A.1: Boyd and Vandenberghe [boyd2004convex]
  • Remark A.2: Boyd and Vandenberghe [boyd2004convex]
  • Remark A.3: Boyd and Vandenberghe [boyd2004convex]
  • Theorem A.2
  • proof
  • ...and 4 more