
Matrix Completion with Cross-Concentrated Sampling: Bridging Uniform Sampling and CUR Sampling

HanQin Cai, Longxiu Huang, Pengyu Li, Deanna Needell

TL;DR

This work introduces Cross-Concentrated Sampling (CCS), a flexible matrix-completion framework that blends uniform sampling and CUR sampling by concentrating observations on a cross formed by selected rows and columns. It establishes a sufficient condition for exact recovery under CCS requiring $\mathcal{O}(r^2 n \log^2 n)$ samples and develops a scalable non-convex solver, Iterative CUR Completion (ICURC), whose per-iteration cost is $\mathcal{O}(nr(|I|+|J|))$, where $I$ and $J$ index the selected rows and columns. Theoretical results are complemented by extensive experiments on synthetic data, image inpainting, collaborative filtering, and link prediction, which demonstrate the practical benefits of CCS and the efficiency of ICURC. The findings suggest that CCS can reduce sampling costs and adapt to real-world constraints while retaining recovery guarantees; future work includes extensions to tensors and a more rigorous convergence analysis.
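The cross-shaped sampling pattern can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `ccs_mask`, the parameterization by a row/column fraction `delta` and a within-cross rate `p`, and the choice to observe each entry of the union of the selected rows and columns independently with probability `p` are all assumptions made for illustration.

```python
import numpy as np


def ccs_mask(n_rows, n_cols, delta, p, rng=None):
    """Hypothetical sketch of a Cross-Concentrated Sampling (CCS) mask.

    Assumed model: pick round(delta * n_rows) rows I and round(delta * n_cols)
    columns J uniformly at random without replacement, then observe each entry
    of the cross X[I, :] ∪ X[:, J] independently with probability p.
    Returns the boolean observation mask and the sorted index sets I, J.
    """
    rng = np.random.default_rng(rng)
    n_i = max(1, int(round(delta * n_rows)))
    n_j = max(1, int(round(delta * n_cols)))
    I = np.sort(rng.choice(n_rows, size=n_i, replace=False))
    J = np.sort(rng.choice(n_cols, size=n_j, replace=False))

    # Indicator of the cross: every entry in a selected row or a selected column.
    cross = np.zeros((n_rows, n_cols), dtype=bool)
    cross[I, :] = True
    cross[:, J] = True

    # Bernoulli(p) sampling restricted to the cross; entries outside stay unobserved.
    mask = cross & (rng.random((n_rows, n_cols)) < p)
    return mask, I, J


if __name__ == "__main__":
    mask, I, J = ccs_mask(500, 500, delta=0.1, p=0.3, rng=0)
    # Expected fraction under this model: p * (2*delta - delta**2) = 0.057
    print(f"{len(I)} rows, {len(J)} columns, observed fraction {mask.mean():.4f}")
```

Under this assumed parameterization the two ends of the spectrum are easy to see: `delta = 1` gives uniform Bernoulli sampling over the whole matrix, while `p = 1` observes the selected rows and columns in full, as in CUR sampling.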

Abstract

While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack the flexibility required in many real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC over uniform sampling and its baseline algorithms on both synthetic and real-world datasets.

Paper Structure (21 sections, 4 theorems, 28 equations, 11 figures, 4 tables, 3 algorithms)

Key Result

Theorem 1

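For given row and column submatrices $\bm{R} = \bm{X}(I,:)$ and $\bm{C} = \bm{X}(:,J)$ with intersection $\bm{U} = \bm{X}(I,J)$, the CUR decomposition $\bm{X} = \bm{C}\bm{U}^{\dagger}\bm{R}$ holds if and only if $\mathrm{rank}(\bm{U}) = \mathrm{rank}(\bm{X}) = r$.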
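Theorem 1 can be checked numerically. The sketch below assumes the standard CUR notation $\bm{C} = \bm{X}(:,J)$, $\bm{R} = \bm{X}(I,:)$, $\bm{U} = \bm{X}(I,J)$ and uses NumPy's Moore-Penrose pseudoinverse; the helper `cur_reconstruct`, the matrix size, and the index sets are hypothetical choices for illustration.

```python
import numpy as np


def cur_reconstruct(X, I, J):
    """Return C @ pinv(U) @ R, where C = X[:, J], R = X[I, :], U = X[I, J]."""
    C = X[:, J]
    R = X[I, :]
    U = X[np.ix_(I, J)]
    return C @ np.linalg.pinv(U) @ R


rng = np.random.default_rng(0)
n, r = 60, 3
# Random rank-r matrix X = A @ B with Gaussian factors.
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# Four rows and three columns: generically rank(U) = rank(X) = r,
# so Theorem 1 predicts exact reconstruction (up to round-off).
I, J = [0, 5, 9, 20], [1, 30, 44]
print("rank(U) =", np.linalg.matrix_rank(X[np.ix_(I, J)]))
print("exact:", np.allclose(cur_reconstruct(X, I, J), X))

# Only two rows: rank(U) <= 2 < r, so C @ pinv(U) @ R has rank <= 2
# and cannot equal the rank-3 matrix X.
print("exact with |I| = 2:", np.allclose(cur_reconstruct(X, [0, 5], J), X))
```

Choosing a rectangular $\bm{U}$ with full column rank keeps the pseudoinverse well conditioned in this toy example; for a square but rank-deficient $\bm{U}$, the outcome also depends on the singular-value cutoff that `pinv` applies.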

Figures (11)

  • Figure 1: Visual illustrations of different sampling schemes. From left to right, sampling methods change from the uniform sampling style to the CUR sampling style with the same total observations rate. Colored pixels indicate observed entries, while black pixels mean missing entries.
  • Figure 2: Visual results for image inpainting from CCS-based samples via the ScaledPGD algorithm [tong2021accelerating]. See the image inpainting experiments for the detailed setting.
  • Figure 3: Empirical phase transition in the overall sampling rate $\alpha$, the percentage of selected rows and columns $\delta$, and the uniform sampling rate $p$ on the selected submatrices. Row 1: 3D view of the empirical phase transition of ICURC. Row 2: 2D view of the empirical phase transitions of ICURC (red box), ScaledPGD (blue box), and SVP (green box). Left: $r=5$. Middle: $r=10$. Right: $r=15$. As the rank increases, the required overall sampling rate increases correspondingly. Additionally, the CCS model offers flexibility in how a sufficient amount of data is collected for successful completion, and the performance of ICURC on CCS-based samples is comparable to that of the state-of-the-art algorithms SVP and ScaledPGD on uniformly sampled data.
  • Figure 4: Empirical phase transitions of ICURC in the overall observation size $s$ and problem size $n$. The number of columns (resp. rows) of the concentrated column (resp. row) submatrix equals $cr\log^2(n)$. Row 1: $r = 5$. Row 2: $r = 10$. Row 3: $r = 15$. Left: $c = 0.25$. Middle: $c = 0.5$. Right: $c = 1$. The number of samples required for successful matrix completion is independent of the size of the concentrated submatrices.
  • Figure 5: Visual results for image inpainting with rank $r = 20$ and the percentage of selected rows and columns $\delta = 10\%$. ScaledPGD and SVP use the uniform sampling model with the same number of observed entries as CCS. All algorithms achieve visually reliable results.
  • ...and 6 more figures
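The three quantities in Figure 3 are linked by a simple counting argument. As an illustrative assumption (the paper's exact convention may differ, for instance if the row and column submatrices are sampled separately), take a square $n \times n$ matrix with $|I| = |J| = \delta n$ and observe each entry of the cross independently with probability $p$. Counting the cross entries by inclusion-exclusion gives the expected overall rate $\alpha$:

```latex
\begin{aligned}
|\text{cross}| &= |I|\,n + n\,|J| - |I|\,|J|
  = \delta n^2 + \delta n^2 - \delta^2 n^2 = (2\delta - \delta^2)\,n^2, \\
\alpha &= \frac{p\,|\text{cross}|}{n^2} = p\,(2\delta - \delta^2).
\end{aligned}
```

Setting $\delta = 1$ recovers uniform sampling ($\alpha = p$), while $p = 1$ observes the selected rows and columns in full ($\alpha = 2\delta - \delta^2$), matching the two ends of the spectrum in Figure 1. For instance, with $\delta = 10\%$ as in Figure 5, the cross covers $19\%$ of the matrix under this model, so $\alpha = 0.19\,p$.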

Theorems & Definitions (10)

  • Theorem 1: HammHuang
  • Theorem 2: chiu2013sublinear, cai2020rapid, cai2021robust
  • Remark 1
  • Example 1
  • Lemma 3
  • Theorem 4
  • Remark 2
  • Remark 3
  • Proof of the uniform incoherence lemma
  • Proof of the sufficient-condition theorem