Exploiting Observation Bias to Improve Matrix Completion

Yassir Jedra, Sean Mann, Charlotte Park, Devavrat Shah

TL;DR

This work addresses matrix completion when entries are missing not at random (MNAR), assuming that the observation pattern and the outcomes are driven by shared latent factors. It introduces Mask Nearest Neighbor (MNN), a two-stage method that first estimates latent-factor distances from the observation mask and then uses the recovered latent features for non-parametric supervised learning to predict the full outcome matrix. The authors prove entry-wise finite-sample error guarantees for MNN, with rates of the form $\tilde{O}(n^{-(2-\beta)/(2d)})$ under mild conditions, illustrating that observation bias can be exploited to achieve performance competitive with supervised learning. Empirically, MNN achieves up to a 28x reduction in MSE on real-world MNAR data and compares favorably with SNN in synthetic experiments, highlighting its practical relevance for recommender systems and policy evaluation tasks.

Abstract

We consider a variant of matrix completion where entries are revealed in a biased manner. We wish to understand the extent to which such bias can be exploited in improving predictions. Towards that, we propose a natural model where the observation pattern and outcome of interest are driven by the same set of underlying latent (or unobserved) factors. We devise Mask Nearest Neighbor (MNN), a novel two-stage matrix completion algorithm: first, it recovers (distances between) the latent factors by utilizing matrix estimation for the fully observed noisy binary matrix, corresponding to the observation pattern; second, it utilizes the recovered latent factors as features and sparsely observed noisy outcomes as labels to perform non-parametric supervised learning. Our analysis reveals that MNN enjoys entry-wise finite-sample error rates that are competitive with corresponding supervised learning parametric rates. Despite not having access to the latent factors and dealing with biased observations, MNN exhibits such competitive performance via only exploiting the shared information between the bias and outcomes. Finally, through empirical evaluation using a real-world dataset, we find that with MNN, the estimates have 28x smaller mean squared error compared to traditional matrix completion methods, suggesting the utility of the model and method proposed in this work.
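
The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a truncated SVD as the matrix-estimation step on the mask, and it replaces the paper's second stage with a plain average of observed outcomes over the `k` nearest users and `k` nearest items. The function names (`mask_row_distances`, `mnn_sketch`, `demo`), the synthetic data model, and the parameters `rank` and `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def mask_row_distances(A, rank):
    """Pairwise distances between rows of a rank-truncated SVD embedding of the mask A."""
    U, s, _ = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    emb = U[:, :rank] * s[:rank]  # low-rank row embeddings
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def mnn_sketch(Y, A, rank=3, k=8):
    """Stage 1: neighbors from the mask only. Stage 2: average observed outcomes nearby.

    Y holds outcomes (values where A == 0 are ignored); A is the binary mask.
    Entries whose neighborhood contains no observation are left as NaN.
    """
    n, m = A.shape
    # Stage 1: nearest users and nearest items, using distances estimated from A.
    row_nn = np.argsort(mask_row_distances(A, rank), axis=1)[:, :k]
    col_nn = np.argsort(mask_row_distances(A.T, rank), axis=1)[:, :k]
    # Stage 2: neighborhood averaging of the sparsely observed outcomes.
    Y_obs = np.where(A == 1, Y, 0.0)
    Y_hat = np.full((n, m), np.nan)
    for i in range(n):
        for j in range(m):
            block = np.ix_(row_nn[i], col_nn[j])
            count = A[block].sum()
            if count > 0:
                Y_hat[i, j] = Y_obs[block].sum() / count
    return Y_hat


def demo(n=60, m=60, d=2, seed=0):
    """Synthetic example (assumed model): mask and outcomes share the same latent factors."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    V = rng.normal(size=(m, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    S = U @ V.T                      # noiseless outcomes, values in [-1, 1]
    P = 0.5 + 0.4 * S                # observation probabilities in [0.1, 0.9]
    A = (rng.random((n, m)) < P).astype(int)
    Y = S + 0.1 * rng.normal(size=(n, m))
    Y_hat = mnn_sketch(Y, A, rank=d + 1, k=8)
    return S, A, Y, Y_hat
```

Calling `demo()` returns the noiseless outcomes, the mask, the noisy outcomes, and the estimates, so the filled entries of `Y_hat` can be compared against `S`. The point of the sketch is that the neighborhoods are computed from the mask `A` alone, which is only informative because the observation probabilities and the outcomes depend on the same latent factors.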

Paper Structure

This paper contains 70 sections, 25 theorems, 145 equations, 6 figures, 3 tables, and 1 algorithm.

Key Result

Proposition 1

Let $g: \mathcal{S}^{d-1}\times \mathcal{S}^{d-1} \mapsto \mathbb{R}$ be bounded and $L$-Lipschitz with respect to the $1$-product metric in $\mathcal{S}^{d-1} \times \mathcal{S}^{d-1}$. Then, there exists a non-increasing sequence $(\sigma_k)_{k \ge 1}$ with $\sum_{k=1}^\infty \sigma_k^2 < \infty$,

Figures (6)

  • Figure 1: Visual depiction of the first stage of MNN. The matrix of outcomes $\mathbf{Y}$ is illustrated in (a). From the outcomes matrix $\mathbf{Y}$, we extract the matrix $\mathbf{A}$, which is fully observed and illustrated in (b). The matrix $\mathbf{A}$ is then used to learn distances between the shared latent user factors (resp. latent item factors), and these distances are used to cluster users (resp. items). Using these clusters, we may rearrange the rows and columns of the outcome matrix as illustrated in (c), where each block corresponds to the Cartesian product of a cluster of users and a cluster of items.
  • Figure 2: Visual depiction of the second stage of MNN. The blue cells correspond to observed or imputed entries, and the red cells correspond to unobserved entries. The matrix illustrated in (a) corresponds to the outcomes matrix after rearranging its rows and columns. The matrix in (b) corresponds to the resulting matrix after the denoising step using \ref{eq:def-bar-H}. The matrix illustrated in (c) corresponds to the filled entries after applying the imputation procedure of \ref{eq:impute}.
  • Figure 3: Distributions of test set estimates from $\texttt{MNN}$ and modified $\texttt{USVT}$ for real-world data. The dashed line represents the mean of the true distribution. As can be seen, $\texttt{MNN}$ is reasonably accurate while $\texttt{USVT}$ has large bias.
  • Figure 4: Predictions vs. true outcomes for real-world data using $\texttt{MNN}$ (left) and modified $\texttt{USVT}$ (right). The dashed line represents perfect prediction. As can be seen, $\texttt{MNN}$ is reasonably accurate while $\texttt{USVT}$ is far from the ground truth.
  • Figure 5: Distributions for synthetic data generated with $n = 100, 562$, and $3162$ users and items (top to bottom). The dashed line represents the mean of the true distribution.
  • ...and 1 more figure

Theorems & Definitions (26)

  • Proposition 1
  • Proposition 2
  • Definition 1: Non-degeneracy
  • Proposition 3
  • Proposition 4
  • Theorem 1
  • Lemma 1
  • Lemma 2: Distance estimation
  • Lemma 3: Coverage
  • Lemma 4: Observations
  • ...and 16 more