Doubly Robust Inference in Causal Latent Factor Models

Alberto Abadie; Anish Agarwal; Raaz Dwivedi; Abhin Shah

Abstract

This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the relevance of the formal properties of the estimators analyzed in this article.
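The doubly robust construction described in the abstract can be sketched in the familiar augmented inverse probability weighting (AIPW) form. For column $j$, the sketch averages the imputed effect $\hat\theta^{(1)}_{ij} - \hat\theta^{(0)}_{ij}$ and adds inverse-probability-weighted residual corrections. This is a minimal sketch under assumptions, not the authors' implementation. It assumes the imputations $\hat\theta^{(a)}$ and propensities $\hat p$ are already available; the paper obtains them via cross-fitted matrix completion, which is not reproduced here. The function name `dr_ate_column` and its arguments are hypothetical.

```python
import numpy as np

def dr_ate_column(Y, A, theta0_hat, theta1_hat, p_hat, j):
    """Hypothetical AIPW-style doubly robust ATE estimate for column j.

    All inputs are N x M arrays:
      Y          observed outcomes
      A          binary treatment indicators (1 = treated)
      theta0_hat imputed mean outcomes under a = 0 (assumed given)
      theta1_hat imputed mean outcomes under a = 1 (assumed given)
      p_hat      estimated propensities, strictly inside (0, 1) (assumed given)
    """
    y, a = Y[:, j], A[:, j]
    t0, t1, p = theta0_hat[:, j], theta1_hat[:, j], p_hat[:, j]
    # Outcome-imputation (OI) part: imputed treatment effect for each unit.
    oi_part = t1 - t0
    # IPW corrections: reweighted residuals of the imputations on observed entries.
    corr_treated = a * (y - t1) / p
    corr_control = (1 - a) * (y - t0) / (1 - p)
    return float(np.mean(oi_part + corr_treated - corr_control))
```

The correction terms are the source of the double robustness. If the imputations are exact, the residuals vanish and any propensities in $(0, 1)$ give the same answer. If the imputations are poor but the propensities are accurate, the weighted residuals correct the bias of the imputed effects on average.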

Paper Structure

This paper contains 55 sections, 23 theorems, 179 equations, and 7 figures.

Introduction
Setup
Sources of stochastic variation
Target causal estimand
Estimation
Matrix completion: A primer
Key building blocks
Doubly-robust (DR) estimator
Main Results
Assumptions
Non-asymptotic guarantees
Asymptotic guarantees
Application to panel data with lagged treatment effects
Matrix Completion with Cross-Fitting
Cross-Fitted-MC: A meta-cross-fitting algorithm for matrix completion

Key Result

Theorem 1

Suppose Assumptions assumption_pos, assumption_noise, assumption_estimates, and assumption_pos_estimated hold. Fix $\delta \in (0,1)$ and $j \in [M]$. Then, with probability at least $1-\delta$, we have …, where $m(c)$ and $\ell_c$ are as defined in the Introduction.

Figures (7)

Figure 1: Schematic of the treatment assignment matrix $A$, the observed outcomes matrix $Y$ (where green and blue fills indicate observations under $a = 1$ and $a = 0$, respectively), and the observed components of the potential outcomes matrices, $Y^{(0),\mathrm{obs}}$ and $Y^{(1),\mathrm{obs}}$ (where $?$ indicates a missing value). All matrices are $N \times M$, where $N$ is the number of customers and $M$ is the number of products.
Figure 2: Simulation evidence that the error of the doubly robust (DR) estimator converges to a mean-zero Gaussian distribution. The histogram shows $\widehat{\mathrm{ATE}}^{\mathrm{DR}}_{\cdot,j} - \mathrm{ATE}_{\cdot,j}$, normalized to have unit area. The green curve is the best-fitting Gaussian distribution, and the black curve is the Gaussian approximation from Theorem 2 in the Main Results section. Unlike DR, the outcome imputation (OI) and inverse probability weighting (IPW) estimators have non-trivial biases: the dashed green, blue, and red lines mark the mean errors of the DR, OI, and IPW estimators, respectively. The simulations section reports complete results.
Figure 3: Panel (a): A matrix $S$ partitioned into four blocks when $\mathcal{R}_0 = [N/2]$ and $\mathcal{C}_0 = [M/2]$ in Assumption assumption_block_noise, i.e., $\mathcal{P} = \{\text{Top Left}, \text{Top Right}, \text{Bottom Left}, \text{Bottom Right}\}$. Panel (b): The matrix $S \otimes \boldsymbol{1}^{-\text{Bottom Right}}$, obtained from $S$ by masking the entries of the Bottom Right block with $?$.
Figure 4: Panels (a), (b), and (c) illustrate the matrices $A \otimes \boldsymbol{1}^{-\mathcal{I}}$, $Y^{(0),\mathrm{obs}} \otimes \boldsymbol{1}^{-\mathcal{I}}$, and $Y^{(1),\mathrm{obs}} \otimes \boldsymbol{1}^{-\mathcal{I}}$, obtained from $A$, $Y^{(0),\mathrm{obs}}$, and $Y^{(1),\mathrm{obs}}$, respectively, for the block partition $\mathcal{P}$ in Figure 3(a) and the block $\mathcal{I} = \text{Bottom Right}$. Unlike Panels (b) and (c), Panel (a) contains rows and columns whose entries are all observed. To enable the application of $\texttt{TW}$ to Panels (b) and (c), we replace the missing entries in the Top Left, Top Right, and Bottom Left blocks with zeros.
Figure 5: Empirical illustration of the asymptotic behavior of DR described in Theorem 2. The histogram shows the errors of DR estimates across 2500 independent instances. The green curve is the best-fitting Gaussian distribution, and the black curve is the Gaussian approximation from Theorem 2. The dashed green, blue, and red lines mark the biases of the DR, OI, and IPW estimators, respectively.
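The matrix manipulations in Figures 1, 3, and 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code. All function names are hypothetical, and missing entries (the $?$ in the figures) are represented as NaN. $\mathcal{R}_0$ and $\mathcal{C}_0$ are taken to be the first `N // 2` rows and `M // 2` columns. Because the $\texttt{TW}$ algorithm is not specified here, the completion step swaps in a simple rank-$r$ formula, $\hat S_{\text{BR}} = S_{\text{BL}} \, (S_{\text{TL}})_r^{+} \, S_{\text{TR}}$. This formula recovers the masked block exactly when $S$ is a noiseless rank-$r$ matrix whose Top Left block also has rank $r$.

```python
import numpy as np

def observed_potential_outcomes(Y, A):
    """Figure 1 (sketch): keep Y where A == a; mark other entries missing (NaN)."""
    Y = np.asarray(Y, dtype=float)
    A = np.asarray(A)
    Y0_obs = np.where(A == 0, Y, np.nan)
    Y1_obs = np.where(A == 1, Y, np.nan)
    return Y0_obs, Y1_obs

def bottom_right_block(N, M):
    """Figure 3(a) (sketch): 0-based row/column indices of the Bottom Right block,
    assuming R0 = first N // 2 rows and C0 = first M // 2 columns."""
    return np.arange(N // 2, N), np.arange(M // 2, M)

def mask_block(S, rows, cols):
    """Figure 3(b) (sketch): copy of S with the block (rows, cols) set to NaN."""
    out = np.array(S, dtype=float)  # np.array copies by default
    out[np.ix_(rows, cols)] = np.nan
    return out

def zero_fill(X):
    """Figure 4 (sketch): replace remaining missing (NaN) entries with zeros."""
    return np.where(np.isnan(X), 0.0, X)

def complete_bottom_right(S_tl, S_tr, S_bl, rank):
    """Swapped-in low-rank formula, NOT the paper's TW algorithm:
    S_bl @ pinv_r(S_tl) @ S_tr, where pinv_r is the pseudo-inverse of the
    best rank-`rank` approximation of S_tl (via truncated SVD)."""
    U, s, Vt = np.linalg.svd(S_tl, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    pinv_r = Vt.T @ np.diag(1.0 / s) @ U.T
    return S_bl @ pinv_r @ S_tr
```

For a noiseless rank-2 matrix `S`, applying `complete_bottom_right` to the three unmasked blocks of `mask_block(S, *bottom_right_block(N, M))` reproduces `S[N // 2:, M // 2:]` up to floating-point error. With noisy or zero-filled inputs, the formula is only an approximation.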

Theorems & Definitions (31)

Theorem 1: Finite Sample Guarantees for DR
Proposition 1: Finite Sample Guarantees for OI and IPW
Corollary 1: Gains of DR over OI and IPW
Corollary 2: Consistency for DR
Theorem 2: Asymptotic Normality for DR
Proposition 2: Consistent variance estimation
Proposition 3: Guarantees for $\texttt{Cross-Fitted-MC}$
Proposition 4: Guarantees for $\texttt{Cross-Fitted-SVD}$
Lemma 1: Sub-Gaussian concentration (Theorem 2.6.3 of Vershynin, 2018)
Corollary 3: Sub-Gaussian concentration