Causal Matrix Completion under Multiple Treatments via Mixed Synthetic Nearest Neighbors

Minrui Luo; Zhiheng Zhang

Abstract

Synthetic Nearest Neighbors (SNN) provides a principled solution to causal matrix completion under missing-not-at-random (MNAR) mechanisms by exploiting local low-rank structure through fully observed anchor submatrices. However, its effectiveness critically relies on sufficient data availability within each treatment level, a condition that often fails in settings with multiple or complex treatments. In this work, we propose Mixed Synthetic Nearest Neighbors (MSNN), a new entry-wise causal identification estimator that integrates information across treatment levels. We show that MSNN retains the finite-sample error bounds and asymptotic normality guarantees of SNN, while enlarging the effective sample size available for estimation. Empirical results on synthetic and real-world datasets illustrate the efficacy of the proposed approach, especially under data-scarce treatment levels.

Paper Structure

This paper contains 23 sections, 10 theorems, 27 equations, 2 figures, 1 table, 3 algorithms.

Introduction
Preliminaries
Identification
From SNN to MSNN: Data Integration across Treatment levels
SNN under multiple treatment
Mixed Synthetic Nearest Neighbors: data combination
Finding Mixed Anchor Rows and Columns
Theoretical results
Additional assumptions
Preservation of Finite-sample bound and Asymptotic Normality
Sample efficiency for large matrix
Case study: missing completely at random
Experiments
Simulation Study
Case study: California Proposition 99
...and 8 more sections

Key Result

Lemma 2.6

Under the assumptions of linear span inclusion on latent row factors and same latent row factors, the index set $\mathcal{I}^{(d)}(i)$ and coefficient $\beta^{(d)}\left(\mathcal{I}^{(d)}(i)\right)$ do not depend on the treatment $d$, i.e. $\forall d^\prime \in \mathcal{L}$, $u_i^{(d^\pr

Figures (2)

Figure 1: Comparison between SNN and MSNN. The leftmost subfigure illustrates the SNN algorithm with $K_{\rm SNN} = 2$: it requires that $\boldsymbol{S}^{(k)}$, $q^{(k)}$ and $x^{(k)}$ all be fully observed at the same treatment level as the one being estimated, which is rare for data-scarce levels (e.g. the "red" level in the second subfigure). The remaining four subfigures explain the MSNN procedure for a specific subgroup $k$: given an entry $(i,j)$ and the treatment level to be estimated (here "red"), one needs to find a fully observed $x^{(k)}$ under the same "red" level, but $\boldsymbol{S}^{(k)}$ and $q^{(k)}$ can be integrated from other treatments (here the "blue" and "green" levels). The only requirement is that each column of $\boldsymbol{S}^{(k)}$ (namely $s_i^{(k)}$) has the same treatment as the corresponding $q_i^{(k)}$; see the third and fourth subfigures.
Figure 2: Selected prediction results of MSNN on the Proposition 99 study in abadie2010synthetic. The three states Kansas, Arizona, and New Jersey belong to the control, program, and taxes treatment groups, respectively. The dashed lines are estimates, while the solid lines indicate real-world observations. Before 1989 (marked by the vertical dotted gray lines) all states are in the control group, so the solid lines are black; afterwards their color varies. The dotted line indicates the time of the Proposition 99 assignment. Dashed and solid lines of the same color over the same time periods are close to each other, indicating successful validation and supporting the applicability of our model to this real-world dataset.
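
The mixing idea described in the Figure 1 caption can be illustrated with a small noiseless sketch. This is not the authors' implementation: the function name `snn_style_estimate`, the rank-truncated least-squares step, and the toy data-generating process are all assumptions made for illustration. The sketch assumes outcomes of the form $u_i^\top v_j^{(d)}$ with row factors shared across treatments, fits the coefficient from columns drawn from different treatments (each column of `S` sharing its treatment with the matching entry of `q`), and applies it to `x` observed under the target treatment.

```python
import numpy as np


def snn_style_estimate(S, q, x, rank):
    """Hypothetical helper: rank-truncated regression of q on the columns of S.

    S : (n_anchor_rows, n_anchor_cols) fully observed anchor submatrix
    q : (n_anchor_cols,) target row observed at the anchor columns
    x : (n_anchor_rows,) anchor rows observed at the target column and treatment
    """
    # Solve S.T @ beta ~= q using only the top `rank` singular directions.
    U, s, Vt = np.linalg.svd(S.T, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    beta = Vt.T @ ((U.T @ q) / s)
    return float(x @ beta)


# Toy setup (assumed for illustration): rank-2 factors, 3 treatment levels.
rng = np.random.default_rng(0)
r, n_rows, n_cols, n_treat = 2, 5, 6, 3

U_anchor = rng.normal(size=(n_rows, r))          # latent factors of anchor rows
u_target = U_anchor.T @ rng.normal(size=n_rows)  # target row factor in their span
# Column factors per treatment; the last column index plays the role of column j.
V = rng.normal(size=(n_treat, n_cols + 1, r))

# Each anchor column may come from a different treatment level.
col_treat = rng.integers(0, n_treat, size=n_cols)
V_mix = V[col_treat, np.arange(n_cols)]          # (n_cols, r)

S = U_anchor @ V_mix.T   # column c of S uses treatment col_treat[c]
q = V_mix @ u_target     # entry c of q uses the same treatment col_treat[c]

d = 0                            # treatment level to be estimated
x = U_anchor @ V[d, n_cols]      # anchor rows observed under treatment d
truth = float(u_target @ V[d, n_cols])

estimate = snn_style_estimate(S, q, x, rank=r)
print(abs(estimate - truth))
```

In this noiseless toy case the estimate recovers the true entry up to numerical precision, because the coefficient depends only on the shared row factors. With noisy observations the choice of `rank` and the anchor search would matter, and neither is modeled here.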

Theorems & Definitions (24)

Remark 2.3
Lemma 2.6
Theorem 2.7
Remark 3.1
Theorem 4.5
Theorem 4.6
Remark 4.7
Theorem 4.8
Remark 4.9
Corollary 4.10
...and 14 more
