Estimating Treatment Effects with Independent Component Analysis

Patrik Reizinger, Lester Mackey, Wieland Brendel, Rahul Krishnan

Abstract

Independent Component Analysis (ICA) uses a measure of non-Gaussianity to identify latent sources from data and estimate their mixing coefficients (Shimizu et al., 2006). Meanwhile, higher-order Orthogonal Machine Learning (OML) exploits non-Gaussian treatment noise to provide more accurate estimates of treatment effects in the presence of confounding nuisance effects (Mackey et al., 2018). Remarkably, we find that the two approaches rely on the same moment conditions for consistent estimation. We then seize upon this connection to show how ICA can be effectively used for treatment effect estimation. Specifically, we prove that linear ICA can consistently estimate multiple treatment effects, even in the presence of Gaussian confounders, and identify regimes in which ICA is provably more sample-efficient than OML for treatment effect estimation. Our synthetic demand estimation experiments confirm this theory and demonstrate that linear ICA can accurately estimate treatment effects even in the presence of nonlinear nuisance.

Paper Structure

This paper contains 69 sections, 9 theorems, 38 equations, 23 figures, 7 tables, 1 algorithm.

Key Result

Proposition 3.1

When the linear PLR assumption (assum:lin_plr) holds and $\mathrm{Var}(\varepsilon)=1$, linear ICA identifies the causal effect $\theta$ at the global optimum of the loss in the infinite sample limit.
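The proposition can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes the scalar linear PLR takes the common form $T = aX + \eta$, $Y = \theta T + bX + \varepsilon$ (the symbols $a$ and $b$ are borrowed from the Figure 2 caption; the model itself is not spelled out in this summary), draws all three sources from Laplace distributions, and swaps in a textbook symmetric FastICA with the cubic (kurtosis) nonlinearity written in plain NumPy. Under this assumed model the true unmixing row for $\varepsilon$ is proportional to $(-b, -\theta, 1)$ in the coordinates $(X, T, Y)$, so $\theta$ can be read off after normalizing the $Y$ coefficient to one. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_plr(n, a, b, theta, seed=0):
    """Draw (X, T, Y) from an ASSUMED scalar linear PLR with Laplace sources."""
    rng = np.random.default_rng(seed)
    x = rng.laplace(size=n)                            # covariate
    eta = rng.laplace(size=n)                          # treatment noise
    eps = rng.laplace(scale=1 / np.sqrt(2), size=n)    # Var = 2 * scale^2 = 1
    t = a * x + eta
    y = theta * t + b * x + eps
    return np.column_stack([x, t, y])


def whiten(data):
    """Center the data and apply symmetric (ZCA) whitening; also return the whitening matrix."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    evals, evecs = np.linalg.eigh(cov)
    k = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return centered @ k.T, k


def fastica_kurtosis(z, n_iter=200, seed=0):
    """Symmetric FastICA with the cubic nonlinearity on whitened data z of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = z.shape
    w = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random orthogonal start
    for _ in range(n_iter):
        s = z @ w.T                                    # current source estimates
        w_new = (s ** 3).T @ z / n - 3.0 * w           # kurtosis fixed-point update
        u, _, vt = np.linalg.svd(w_new)                # symmetric decorrelation:
        w = u @ vt                                     # (W W^T)^{-1/2} W
    return w


def estimate_theta(data):
    """Read theta off the unmixing row that loads on Y (column 2)."""
    z, k = whiten(data)
    w_full = fastica_kurtosis(z) @ k                   # unmixing in observed coordinates
    row = w_full[np.argmax(np.abs(w_full[:, 2]))]      # only the eps row loads on Y
    return -row[1] / row[2]                            # sign and scale cancel in the ratio


theta_true = 1.5                                       # illustrative value
data = simulate_plr(100_000, a=0.5, b=-0.7, theta=theta_true)
theta_hat = estimate_theta(data)
print(f"true theta = {theta_true}, ICA estimate = {theta_hat:.3f}")
```

Selecting the row by its loading on $Y$ and dividing by that loading is one simple way to handle the permutation and scale indeterminacies in this three-variable toy setting; the paper resolves them through non-Gaussianity and the model structure, which may differ in detail.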

Figures (23)

  • Figure 1: Overview of treatment effect estimation in the PLR model. (Left:) The linear model, where the covariates $X$ affect both treatment $T$ and outcome $Y$. The quantity of interest is the treatment effect $\theta$. (Center:) OML estimates $\theta$ in three steps. (Right:) ICA can invert the model by maximizing non-Gaussianity of the sources, thereby yielding $\theta$ as a coefficient in the unmixing matrix $\boldsymbol{\mathrm{W}}$. Scale and permutation indeterminacies are resolved by relying on non-Gaussianity and the structure (\ref{lem:lin_plr_ica}).
  • Figure 2: Relative efficiency of ICA vs. higher-order OML for demand estimation (see \ref{subsec:exp_homl}). Left: RMSE difference (ICA $-$ OML) as a function of the ICA asymptotic variance coefficient $c_{\text{ICA}} = 1 + (b + a\theta)^2$ derived in \ref{efficiency}. Blue points indicate ICA outperforms OML; red points indicate OML outperforms ICA. Right: Performance stratified by $c_{\text{ICA}}$ regime. ICA wins overall (72.9% win rate), dominating especially when $c_{\text{ICA}} < 1.5$ (96.3% win rate). OML is preferable in the medium regime ($1.5 \leq c_{\text{ICA}} < 5$), with a 64.3% win rate.
  • Figure 3: RMSE difference (ICA $-$ higher-order OML) as $n$ and covariate distribution $\beta$ vary for $c_{\text{ICA}} < 1.5$.
  • Figure 4: Left: Relative of treatment effect estimation for Laplace noises in nonlinear PLR across multiple covariate dimensions for linear ICA with different nonlinearities with $5,000$ samples. Leaky ReLU uses a slope of $0.2$. See \ref{fig:heatmap_dimension_vs_slope_leaky_relu} for an ablation over slopes. Right: Relative of ICA treatment effect estimation across covariate dimensions $d$ and sample sizes $n$ for $m = 2$ treatments in linear PLR. Means calculated from $20$ seeds. See \ref{fig:ica_multi} for an ablation over treatment counts.
  • Figure E.1: Source identification via MCC for ICA in linear PLR. Means from $20$ seeds ($d=10$). MCC (Hyvärinen & Morioka, 2016) measures source recovery (0 to 1; higher is better). Gaussian covariates ($\beta=2$) yield the lowest MCC, as predicted by theory.
  • ...and 18 more figures
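
The coefficient in the Figure 2 caption, $c_{\text{ICA}} = 1 + (b + a\theta)^2$, and the regime thresholds it mentions ($1.5$ and $5$) can be evaluated directly. The helper below is a minimal sketch: the function names are hypothetical, the labels "low" and "high" are illustrative (the caption only names the "medium" regime), and the parameter values in the usage lines are made up for the example.

```python
def c_ica(a, b, theta):
    """ICA asymptotic variance coefficient as given in the Figure 2 caption."""
    return 1.0 + (b + a * theta) ** 2


def regime(c):
    """Bucket c_ICA using the thresholds quoted in the Figure 2 caption."""
    if c < 1.5:
        return "low"      # caption: ICA dominates here
    if c < 5.0:
        return "medium"   # caption: OML is preferable here
    return "high"         # label is illustrative; not named in the caption


# Illustrative parameter values, not taken from the paper.
for a, b, theta in [(0.5, -0.7, 1.5), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0)]:
    c = c_ica(a, b, theta)
    print(f"a={a}, b={b}, theta={theta}: c_ICA={c:.4f} ({regime(c)})")
```

The coefficient equals its minimum of $1$ exactly when $b + a\theta = 0$, and grows quadratically in $b + a\theta$ away from that point.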

Theorems & Definitions (19)

  • Definition 3.1: Linear PLR
  • Proposition 3.1: Treatment effect estimation with ICA
  • Theorem 3.1: Asymptotic relative efficiency
  • Definition 3.2: Multiple treatment linear PLR
  • Proposition 3.2: Estimating multiple treatment effects with ICA
  • Proposition 3.3: Treatment effect estimation with Gaussian covariates and ICA
  • Definition 3.3: Nonlinear PLR
  • Lemma B.1: Higher-order OML moment condition for whitened data and $r=3$
  • proof
  • Lemma B.2: ICA moment condition for whitened data and kurtosis loss
  • ...and 9 more