Exact Error in Matrix Completion: Approximately Low-Rank Structures and Missing Blocks
Agostino Capponi, Mihailo Stojnic
TL;DR
This work addresses the exact performance of nuclear-norm-based matrix completion under missing-not-at-random (MNAR) block missingness in approximately low-rank settings. It develops a distribution-free, Lagrangian-duality framework grounded in free probability and random matrix theory to obtain the exact worst-case RMSE and a precise phase-transition curve, $\beta_{wc}(\eta)=\tfrac{1}{2}-\sqrt{\eta-\eta^2}$, that separates perfect from failed recovery. The main contributions are explicit RMSE bounds in terms of $(\beta,\eta,\sigma_\epsilon)$, a demonstration that data heterogeneity can scale linearly with matrix size, and comprehensive numerical validation showing strong agreement even at small dimensions. These results provide rigorous, practical guidance for block-MNAR matrix completion in applications such as block causal inference and causal effect imputation, where missingness patterns reflect irreversible treatments or time-block structures.
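As a quick numerical reading of the phase-transition curve quoted above, the short Python sketch below evaluates $\beta_{wc}(\eta)=\tfrac{1}{2}-\sqrt{\eta-\eta^2}$ on a grid. It treats $\eta$ and $\beta$ purely as the arguments of this formula (the curve is real-valued only for $\eta\in[0,1]$); their precise definitions, and which side of the curve corresponds to perfect recovery, are given in the body of the paper and are not assumed here. The function name `beta_wc` is hypothetical.

```python
import math


def beta_wc(eta: float) -> float:
    """Hypothetical helper: evaluate beta_wc(eta) = 1/2 - sqrt(eta - eta^2).

    The square root is real only for eta in [0, 1], so other inputs are rejected.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    # max(...) only guards against a negative argument from floating-point rounding.
    return 0.5 - math.sqrt(max(0.0, eta - eta * eta))


if __name__ == "__main__":
    for eta in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        print(f"eta = {eta:.2f}  beta_wc = {beta_wc(eta):.4f}")
```

Directly from the formula, the curve equals $\tfrac{1}{2}$ at $\eta\in\{0,1\}$, drops to $0$ at $\eta=\tfrac{1}{2}$, and is symmetric under $\eta\mapsto 1-\eta$.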
Abstract
We study the completion of approximately low-rank matrices with entries missing not at random (MNAR). In typical large-dimensional statistical settings, we establish a framework for the performance analysis of the nuclear norm minimization ($\ell_1^*$) algorithm. Our framework produces \emph{exact} estimates of the worst-case residual root mean squared error (RMSE) and the associated phase transitions (PT), both of which admit remarkably simple characterizations. Our results make it possible to {\it precisely} quantify the impact of key system parameters, including data heterogeneity, the size of the missing block, and the deviation from ideal low-rankness, on the accuracy of $\ell_1^*$-based matrix completion. To validate our theoretical worst-case RMSE estimates, we conduct numerical simulations and observe close agreement between the theoretical predictions and their simulated counterparts.
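To make the block-MNAR setting concrete, here is a minimal numerical sketch under several assumptions that are ours rather than the paper's. The approximately low-rank matrix is taken to be an exact rank-`rank` Gaussian factor product plus dense noise of level `sigma_eps`. The missing entries form one contiguous bottom-right block. Instead of solving the equality-constrained $\ell_1^*$ problem analysed in the paper, the sketch runs Soft-Impute, which iterates singular value soft-thresholding to solve a nuclear-norm-penalized relaxation. All function names and default parameters (`svt`, `soft_impute`, `block_mnar_demo`, `lam`, `block_frac`) are hypothetical. The reported RMSE is measured against the noiseless low-rank part on the missing block, which may differ from the paper's residual RMSE definition.

```python
import numpy as np


def svt(A: np.ndarray, lam: float) -> np.ndarray:
    """Singular value soft-thresholding: the proximal operator of lam * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def soft_impute(M, observed, lam, n_iter=500, tol=1e-7):
    """Soft-Impute for min 0.5 * ||P_obs(X - M)||_F^2 + lam * ||X||_* (a penalized
    relaxation, not the equality-constrained nuclear norm problem)."""
    Z = np.zeros_like(M)
    for _ in range(n_iter):
        # Keep observed entries of M, fill missing ones with the current estimate,
        # then shrink the singular values of the filled matrix.
        Z_new = svt(np.where(observed, M, Z), lam)
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z


def block_mnar_demo(n=60, rank=2, block_frac=0.3, sigma_eps=0.01, lam=0.5, seed=0):
    """Hypothetical experiment: hide a bottom-right block of an approximately
    low-rank matrix, complete it, and report RMSE on the hidden block."""
    rng = np.random.default_rng(seed)
    # Exactly low-rank signal with unit-variance entries (assumed model) ...
    L = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n)) / np.sqrt(rank)
    # ... plus small dense noise, so M is only approximately low rank.
    M = L + sigma_eps * rng.standard_normal((n, n))
    k = int(round(block_frac * n))
    observed = np.ones((n, n), dtype=bool)
    observed[n - k:, n - k:] = False  # a contiguous block, not a random mask
    X = soft_impute(M, observed, lam)
    missing = ~observed
    rmse = float(np.sqrt(np.mean((X[missing] - L[missing]) ** 2)))
    zero_fill_rmse = float(np.sqrt(np.mean(L[missing] ** 2)))  # naive baseline
    return rmse, zero_fill_rmse


if __name__ == "__main__":
    rmse, baseline = block_mnar_demo()
    print(f"block RMSE (Soft-Impute): {rmse:.4f}")
    print(f"block RMSE (zero fill):   {baseline:.4f}")
```

Varying `block_frac`, `rank` and `sigma_eps` gives an informal feel for how the size of the missing block and the departure from exact low rank affect the error. It does not reproduce the paper's worst-case curve, which concerns the least favourable matrices rather than random ones.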
