Doubly Robust Inference in Causal Latent Factor Models

Alberto Abadie, Anish Agarwal, Raaz Dwivedi, Abhin Shah

Abstract

This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the relevance of the formal properties of the estimators analyzed in this article.
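As a rough illustration of how outcome imputation and inverse probability weighting can be combined, the sketch below computes a doubly robust estimate for a single outcome column in the standard augmented-IPW form. It assumes that imputed mean outcomes and estimated assignment probabilities are already available, for example from a matrix completion step that is not reproduced here. The function name `dr_ate_column` and its argument names are hypothetical, and the article's exact estimator may differ in its details.

```python
import numpy as np


def dr_ate_column(y, a, theta0_hat, theta1_hat, p_hat):
    """Augmented-IPW style doubly robust ATE estimate for one outcome column.

    All arguments are length-N arrays for a fixed column j (hypothetical names):
      y          observed outcomes
      a          binary treatment assignments (0 or 1)
      theta0_hat imputed mean outcomes under a = 0
      theta1_hat imputed mean outcomes under a = 1
      p_hat      estimated probabilities of a = 1, strictly inside (0, 1)
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    theta0_hat = np.asarray(theta0_hat, dtype=float)
    theta1_hat = np.asarray(theta1_hat, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)

    # Outcome imputation (OI) part: difference of imputed mean outcomes.
    oi_term = theta1_hat - theta0_hat
    # Inverse-probability-weighted residual corrections for each arm.
    corr_treated = a * (y - theta1_hat) / p_hat
    corr_control = (1.0 - a) * (y - theta0_hat) / (1.0 - p_hat)

    return float(np.mean(oi_term + corr_treated - corr_control))
```

In this form, the estimate reduces to the plain OI difference when the imputed outcomes match the observed ones exactly, and to an IPW estimate when the imputed outcomes are all zero.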

Paper Structure (55 sections, 23 theorems, 179 equations, 7 figures)


Key Result

Theorem 1

Suppose assumption_pos, assumption_noise, assumption_estimates, and assumption_pos_estimated hold. Fix $\delta \in (0,1)$ and $j \in [M]$. Then, with probability at least $1-\delta$, we have where for $m(c)$ and $\ell_c$ as defined in section_introduction.

Figures (7)

  • Figure 1: Schematic of the treatment assignment matrix ${A}$, the observed outcomes matrix ${Y}$ (where green and blue fills indicate observations under $a = 1$ and $a = 0$, respectively), and the observed component of the potential outcomes matrices, i.e., ${{Y}}^{(0),\mathrm{obs}}_{}$ and ${{Y}}^{(1),\mathrm{obs}}_{}$ (where $\,?$ indicates a missing value). All matrices are $N \times M$ where $N$ is the number of customers and $M$ is the number of products.
  • Figure 2: Simulation evidence of the convergence of the error of the doubly-robust (DR) estimator to a mean-zero Gaussian distribution. The histogram represents $\widehat{\mathrm{ATE}}{}_{\cdot,j}^{\,\mathrm{DR}}- \mathrm{ATE}_{\cdot,j}$, the green curve represents the (best) fitted Gaussian distribution, and the black curve represents the Gaussian approximation from \ref{['thm_normality']} in \ref{['sec_main_results']}. Histogram counts are normalized so that the area under the histogram is one. Unlike DR, the outcome imputation (OI) and inverse probability weighting (IPW) estimators have non-trivial biases, as evidenced by the means of the DR, OI, and IPW error distributions, marked in dashed green, blue, and red, respectively. \ref{['sec_simulations']} reports complete simulation results.
  • Figure 3: Panel $(a)$: A matrix $S$ partitioned into four blocks when $\mathcal{R}_0 = [N/2]$ and $\mathcal{C}_0 = [M/2]$ in \ref{['assumption_block_noise']}, i.e., $\mathcal{P} = \{ \text{Top Left, Top Right, Bottom Left, Bottom Right} \}$. Panel $(b)$: The matrix $S \otimes \boldsymbol{1}^{-\text{Bottom Right}}$ obtained from the matrix $S$ by masking the entries corresponding to the $\text{Bottom Right}$ block with $\,?$.
  • Figure 4: Panels $(a)$, $(b)$, and $(c)$ illustrate the matrices ${A} \otimes \boldsymbol{1}^{-\mathcal{I}}$, ${{Y}}^{(0),\mathrm{obs}}_{}\otimes \boldsymbol{1}^{-\mathcal{I}}$, and ${{Y}}^{(1),\mathrm{obs}}_{}\otimes \boldsymbol{1}^{-\mathcal{I}}$ obtained from ${A}$, ${{Y}}^{(0),\mathrm{obs}}_{}$, and ${{Y}}^{(1),\mathrm{obs}}_{}$, respectively, for the block partition $\mathcal{P}$ in \ref{['figure_sample_split']}$(a)$ and the block $\mathcal{I} = \text{Bottom Right}$. Unlike Panels $(b)$ and $(c)$, Panel $(a)$ contains rows and columns with all entries observed. To enable the application of $\texttt{TW}$ to Panels $(b)$ and $(c)$, we replace missing entries in blocks $\text{Top Left}$, $\text{Top Right}$, and $\text{Bottom Left}$ with zeros.
  • Figure 5: Empirical illustration of the asymptotic performance of DR as in \ref{['thm_normality']}. The histogram corresponds to the errors of 2500 independent instances of DR estimates, the green curve represents the (best) fitted Gaussian distribution, and the black curve represents the Gaussian approximation from \ref{['thm_normality']}. The dashed green, blue, and red lines represent the biases of the DR, OI, and IPW estimators, respectively.
  • ...and 2 more figures
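
The block masking described in the captions of Figures 3 and 4 can be sketched in a few lines. The example below is only an illustration under stated assumptions: missing values (the $\,?$ entries) are represented by NaN, and the helper names `mask_block` and `zero_fill_outside_block` are hypothetical. The $\texttt{TW}$ step mentioned in the Figure 4 caption is not implemented here.

```python
import numpy as np


def mask_block(S, rows, cols):
    """Return a float copy of S with the block rows x cols set to NaN.

    NaN plays the role of the '?' entries in the figure captions.
    """
    out = np.array(S, dtype=float)
    out[np.ix_(rows, cols)] = np.nan
    return out


def zero_fill_outside_block(S, rows, cols):
    """Replace NaN entries outside the block rows x cols with zeros.

    Entries inside the held-out block are left untouched.
    """
    out = np.array(S, dtype=float)
    held_out = np.zeros(out.shape, dtype=bool)
    held_out[np.ix_(rows, cols)] = True
    out[np.isnan(out) & ~held_out] = 0.0
    return out


# Example with a 4 x 4 matrix split into four 2 x 2 blocks.
S = np.arange(16).reshape(4, 4)
bottom_rows, right_cols = [2, 3], [2, 3]
S_masked = mask_block(S, bottom_rows, right_cols)  # Bottom Right block hidden
```

Keeping the held-out block as NaN while zero-filling the other blocks mirrors the distinction the Figure 4 caption draws between the masked block and missing entries elsewhere.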

Theorems & Definitions (31)

  • Theorem 1: Finite Sample Guarantees for DR
  • Proposition 1: Finite Sample Guarantees for OI and IPW
  • Corollary 1: Gains of DR over OI and IPW
  • Corollary 2: Consistency for DR
  • Theorem 2: Asymptotic Normality for DR
  • Proposition 2: Consistent variance estimation
  • Proposition 3: Guarantees for $\texttt{Cross}\texttt{-}\texttt{Fitted}\texttt{-}\texttt{MC}$
  • Proposition 4: Guarantees for $\texttt{Cross-}\texttt{Fitted-}\texttt{SVD}$
  • Lemma 1: $\textrm{subGaussian}$ concentration (Theorem 2.6.3 of Vershynin, 2018)
  • Corollary 3: $\textrm{subGaussian}$ concentration
  • ...and 21 more