Exact Error in Matrix Completion: Approximately Low-Rank Structures and Missing Blocks

Agostino Capponi, Mihailo Stojnic

TL;DR

This work addresses the exact performance of nuclear-norm-based matrix completion under MNAR block missingness in approximately low-rank settings. It develops a no-distribution, Lagrangian-duality framework grounded in free probability and random matrix theory to obtain the exact worst-case RMSE and a precise phase-transition curve, $\beta_{wc}(\eta)=\tfrac{1}{2}-\sqrt{\eta-\eta^2}$, that separates perfect from failed recovery. The main contributions are explicit RMSE bounds depending on $(\beta,\eta,\sigma_\epsilon)$, a demonstration that data heterogeneity can scale linearly with matrix size, and comprehensive numerical validation showing strong agreement even at small dimensions. These results provide rigorous, practical guidance for block MNAR matrix completion in applications such as block causal inference and causal effect imputation, where missingness patterns reflect irreversible treatments or time-block structures.
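To get a feel for the phase-transition curve quoted above, the formula can be evaluated directly. The following is a minimal sketch, not code from the paper: the function name `beta_wc` and the sample grid of $\eta$ values are illustrative assumptions, and the snippet only tabulates $\beta_{wc}(\eta)=\tfrac{1}{2}-\sqrt{\eta-\eta^2}$ without asserting which side of the curve corresponds to successful recovery.

```python
import math


def beta_wc(eta: float) -> float:
    """Evaluate the curve beta_wc(eta) = 1/2 - sqrt(eta - eta^2).

    The name `beta_wc` mirrors the symbol in the text; it is an
    illustrative helper, not an API from the paper.
    """
    if not 0.0 <= eta <= 1.0:
        # Outside [0, 1] the quantity eta - eta^2 is negative and the
        # square root is not real.
        raise ValueError("eta must lie in [0, 1]")
    # max() guards against a tiny negative value caused by rounding.
    return 0.5 - math.sqrt(max(eta - eta * eta, 0.0))


if __name__ == "__main__":
    # Hypothetical sample grid, chosen only for illustration.
    for eta in (0.0, 0.1, 0.25, 0.5, 0.8, 1.0):
        print(f"eta = {eta:.2f}  beta_wc = {beta_wc(eta):.4f}")
```

Two properties follow directly from the formula: the curve is symmetric about $\eta=\tfrac{1}{2}$, where it reaches zero, and it equals $\tfrac{1}{2}$ at $\eta=0$ and $\eta=1$. The pair $\beta=0.1$, $\eta=0.8$ used in the caption of Figure 5 lies on the curve, since $\tfrac{1}{2}-\sqrt{0.16}=0.1$.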

Abstract

We study the completion of approximately low-rank matrices with entries missing not at random (MNAR). In the context of typical large-dimensional statistical settings, we establish a framework for the performance analysis of the nuclear norm minimization ($\ell_1^*$) algorithm. Our framework produces exact estimates of the worst-case residual root mean squared error (RMSE) and the associated phase transitions (PT), both of which admit remarkably simple characterizations. Our results make it possible to precisely quantify the impact of key system parameters, including data heterogeneity, size of the missing block, and deviation from ideal low rankness, on the accuracy of $\ell_1^*$-based matrix completion. To validate our theoretical worst-case RMSE estimates, we conduct numerical simulations, which show close agreement with their numerical counterparts.

Paper Structure (18 sections, 13 theorems, 159 equations, 7 figures, 3 tables)

This paper contains 18 sections, 13 theorems, 159 equations, 7 figures, 3 tables.

Key Result

Theorem 3.1

(Algebraic characterization of $W$) Consider a $\bar{U}\in{\mathbb R}^{n\times k}$ such that $\bar{U}^T\bar{U}=I_{k\times k}$, a $\bar{V}\in{\mathbb R}^{n\times k}$ such that $\bar{V}^T\bar{V}=I_{k\times k}$, and an approximately rank $k$ matrix $X_{sol}=X\in{\mathbb R}^{n\times n}$, such that … With $M\in{\mathbb R}^{n\times n}$ as in (eq:cinfanl2a), assume that $Y= M\circ X_{sol}$ and let …

Figures (7)

  • Figure 1: Matrix $M\triangleq M^{(l_1,l_2)}$ -- block causal inference setup
  • Figure 2: Typical worst-case $\ell_1^*$ phase transition (ideal low-rank context; block causal inference, C-inf)
  • Figure 3: Both $G_{\tilde{D}}^+(z)$ and $G_{\tilde{D}}^-(z)$ need to be taken into account
  • Figure 4: Both $G_{\tilde{D}}^+(z)$ and $G_{\tilde{D}}^-(z)$ need to be taken into account
  • Figure 5: $f_{\tilde{D}}(x)$ -- spectral function of $\tilde{D}$; $\beta=0.1$ and $\eta=0.8$
  • ...and 2 more figures

Theorems & Definitions (26)

  • Theorem 3.1
  • proof
  • Proposition 1
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • proof
  • ...and 16 more