Causal Matrix Completion under Multiple Treatments via Mixed Synthetic Nearest Neighbors

Minrui Luo, Zhiheng Zhang

Abstract

Synthetic Nearest Neighbors (SNN) provides a principled solution to causal matrix completion under missing-not-at-random (MNAR) by exploiting local low-rank structure through fully observed anchor submatrices. However, its effectiveness critically relies on sufficient data availability within each treatment level, a condition that often fails in settings with multiple or complex treatments. In this work, we propose Mixed Synthetic Nearest Neighbors (MSNN), a new entry-wise causal identification estimator that integrates information across treatment levels. We show that MSNN retains the finite-sample error bounds and asymptotic normality guarantees of SNN, while enlarging the effective sample size available for estimation. Empirical results on synthetic and real-world datasets illustrate the efficacy of the proposed approach, especially under data-scarce treatment levels.

Paper Structure (23 sections, 10 theorems, 27 equations, 2 figures, 1 table, 3 algorithms)

This paper contains 23 sections, 10 theorems, 27 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Lemma 2.6

Under the assumptions of linear span inclusion on latent row factors and of same latent row factors, the index set $\mathcal{I}^{(d)}(i)$ and coefficient $\beta^{(d)}\left(\mathcal{I}^{(d)}(i)\right)$ do not depend on the treatment $d$, i.e., $\forall d^\prime \in \mathcal{L}$, $u_i^{(d^\prime)}$ …
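
The intuition behind this lemma can be checked numerically. The sketch below is not taken from the paper: it assumes a noiseless factor model $Y^{(d)} = U V^{(d)\top}$ in which the row factors `U` are shared by all treatments, and the names `U`, `V`, `donors`, and `beta` are hypothetical. If the target row factor lies in the span of the donor row factors, the same coefficients should reproduce the target row under every treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, rank, n_treatments = 8, 6, 2, 3

# Assumed model (not from the source): row factors U are shared across
# treatments, while column factors V[d] differ by treatment d.
U = rng.normal(size=(n_rows, rank))
V = rng.normal(size=(n_treatments, n_cols, rank))

target = 0
donors = [1, 2, 3, 4]  # hypothetical donor index set

# Coefficients that express the target row factor through donor row factors:
# solve U[donors].T @ beta = U[target] in the least-squares sense.
beta, *_ = np.linalg.lstsq(U[donors].T, U[target], rcond=None)

# Reuse the same beta under every treatment and record the worst mismatch.
max_gap = 0.0
for d in range(n_treatments):
    Y_d = U @ V[d].T                    # noiseless outcomes under treatment d
    reconstructed = beta @ Y_d[donors]  # donor rows combined with beta
    gap = float(np.max(np.abs(reconstructed - Y_d[target])))
    max_gap = max(max_gap, gap)

print(f"largest reconstruction gap across treatments: {max_gap:.2e}")
```

In this toy setting the gap is at the level of floating-point error for every treatment, which is the sense in which the coefficients carry over between treatment levels.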

Figures (2)

  • Figure 1: Comparison between SNN and MSNN. The leftmost subfigure illustrates the SNN algorithm with $K_{\rm SNN} = 2$: it requires that $\boldsymbol{S}^{(k)}$, $q^{(k)}$, and $x^{(k)}$ all be fully observed at the same treatment level as the one being estimated, which is rare for data-scarce levels (e.g., the "red" level in the second subfigure). The remaining four subfigures explain the MSNN procedure for a specific subgroup $k$: given entry $(i,j)$ and the treatment level to be estimated (here "red"), one needs to find a fully observed $x^{(k)}$ under the same "red" level, whereas $\boldsymbol{S}^{(k)}$ and $q^{(k)}$ can be assembled from other treatments (here the "blue" and "green" levels). The only requirement is that each column of $\boldsymbol{S}^{(k)}$ (namely, $s_i^{(k)}$) be observed under the same treatment as the corresponding $q_i^{(k)}$; see the third and fourth subfigures.
  • Figure 2: Selected prediction results of MSNN on the Proposition 99 study of Abadie et al. (2010). The three states Kansas, Arizona, and New Jersey belong to the control, program, and taxes treatment groups, respectively. Dashed lines are estimates, while solid lines are real-world observations. Before 1989 all states are in the control group, so the solid lines are black; afterwards their colors vary. The vertical dotted gray line marks the time of Proposition 99 assignment (1989). Dashed and solid lines of the same color over the same time periods lie close to each other, indicating successful validation and supporting the application of our model to this real-world dataset.
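
The mixed-treatment anchor construction described in the Figure 1 caption can be illustrated with a toy computation. This is a minimal sketch and not the authors' algorithm: it assumes noiseless outcomes from a factor model with row factors shared across treatments, handles a single subgroup $k$, and skips the anchor search entirely. The function `mixed_anchor_estimate` and the variables `anchor_rows`, `anchor_cols`, and `col_levels` are hypothetical names.

```python
import numpy as np


def mixed_anchor_estimate(S, q, x, rank):
    """Estimate one entry from an anchor block.

    S: (n_anchor_rows, n_anchor_cols) fully observed block
    q: (n_anchor_cols,) target row observed on the anchor columns
    x: (n_anchor_rows,) anchor rows observed at the target column
    """
    # Rank-truncated pseudo-inverse of S^T via SVD, i.e. a principal
    # component regression of q on the columns of S^T.
    left, sing, right_t = np.linalg.svd(S.T, full_matrices=False)
    left, sing, right_t = left[:, :rank], sing[:rank], right_t[:rank]
    beta = right_t.T @ ((left.T @ q) / sing)
    return float(x @ beta)


rng = np.random.default_rng(1)
n_rows, n_cols, rank = 10, 9, 2
levels = ["red", "blue", "green"]

# Assumed toy model: shared row factors U, treatment-specific column factors.
U = rng.normal(size=(n_rows, rank))
Y = {d: U @ rng.normal(size=(n_cols, rank)).T for d in levels}

i, j, target_level = 0, 0, "red"
anchor_rows = [1, 2, 3, 4]
anchor_cols = [1, 2, 3, 4, 5]
# Each anchor column may come from a different treatment level, but the
# column of S and the matching entry of q must share that level.
col_levels = ["blue", "green", "blue", "green", "blue"]

S = np.column_stack(
    [Y[d][anchor_rows, c] for c, d in zip(anchor_cols, col_levels)]
)
q = np.array([Y[d][i, c] for c, d in zip(anchor_cols, col_levels)])
# x must be observed under the treatment level being estimated.
x = Y[target_level][anchor_rows, j]

estimate = mixed_anchor_estimate(S, q, x, rank)
truth = Y[target_level][i, j]
print(f"estimate={estimate:.6f}, truth={truth:.6f}")
```

No "red" observation enters `S` or `q` here; only `x` uses the target level, which mirrors why the caption describes MSNN as less demanding than SNN for data-scarce levels.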

Theorems & Definitions (24)

  • Remark 2.3
  • Lemma 2.6
  • Theorem 2.7
  • Remark 3.1
  • Theorem 4.5
  • Theorem 4.6
  • Remark 4.7
  • Theorem 4.8
  • Remark 4.9
  • Corollary 4.10
  • ...and 14 more