
Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap

Oscar Clivio, Alexander D'Amour, Alexander Franks, David Bruns-Smith, Chris Holmes, Avi Feller

Abstract

Overlap, also known as positivity, is a key condition for causal treatment effect estimation. Many popular estimators suffer from high variance and become brittle when features differ strongly across treatment groups. This is especially challenging in high dimensions: the curse of dimensionality can make overlap implausible. To address this, we propose a class of feature representations called deconfounding scores, which preserve both identification and the target of estimation; the classical propensity and prognostic scores are two special cases. We characterize the problem of finding a representation with better overlap as minimizing an overlap divergence under a deconfounding score constraint. We then derive closed-form expressions for a class of deconfounding scores under a broad family of generalized linear models with Gaussian features and show that prognostic scores are overlap-optimal within this class. We conduct extensive experiments to assess this behavior empirically.
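To make the abstract's setting concrete, here is a minimal, self-contained sketch (not the paper's method) of the two classical scores it mentions: with Gaussian features and a logistic (GLM) treatment model, the propensity score is $P(T=1\mid X)$ and a prognostic score is $E[Y\mid X]$ under control; both are one-dimensional representations of $X$. The coefficient vectors `alpha` and `beta`, the sample sizes, and the `overlap_fraction` diagnostic are all illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10

# Gaussian features; treatment drawn from a logistic model -- the GLM-with-
# Gaussian-features setting the abstract's closed-form results refer to.
X = rng.normal(size=(n, d))
alpha = rng.normal(size=d)                 # propensity coefficients (illustrative)
e = 1.0 / (1.0 + np.exp(-(X @ alpha)))     # true propensity score P(T=1 | X)
T = rng.binomial(1, e)

beta = rng.normal(size=d)                  # outcome coefficients (illustrative)
Y = X @ beta + rng.normal(size=n)

# Two classical one-dimensional representations of X:
prop_score = X @ alpha                     # linear index of the propensity score
prog_score = X @ beta                      # linear prognostic score (beta known here)

def overlap_fraction(e, eps=0.05):
    """Fraction of units whose propensity lies in [eps, 1 - eps].

    A crude overlap diagnostic: values near 1 mean treated and control
    propensities are bounded away from 0 and 1; values near 0 signal the
    weak-overlap regime the paper targets.
    """
    return np.mean((e >= eps) & (e <= 1 - eps))

print(overlap_fraction(e))
```

Increasing the norm of `alpha` pushes propensities toward 0 and 1, shrinking `overlap_fraction` and mimicking the weak-overlap, high-dimensional regime the abstract describes.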

Paper Structure

This paper contains 62 sections, 5 theorems, 93 equations, 2 figures, 2 tables.

Key Result

Lemma 3.1

For any representation $\phi$, the confounding bias equals a closed-form expression given in the paper (equation not rendered in this extraction).

Figures (2)

  • Figure 1: The projection of $\gamma$ onto the space spanned by $\alpha$ and $\beta$ lies on a segment of a hyperbola (bold black line) whose endpoints correspond to $\gamma=\alpha$ and $\gamma=\beta$ (when $\alpha'\beta \geq 0$) or to $\gamma=\alpha$ and $\gamma=-\beta$ (when $\alpha'\beta < 0$, the case shown here), together with the opposite segment (not shown). The orientation of the hyperbola and its endpoints depend on $\alpha'\beta$.
  • Figure 2: RMSE, bias, and standard deviation for simulated datasets. Each metric is plotted against the deconfounding score coordinate parameter $w$; methods using the base covariates are constant in $w$. In all plots, $s_T = 4$ (low overlap) and $s_Y = 5$ (high SNR). See the paper's section on estimators and inference for definitions of the estimator names.

Theorems & Definitions (5)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 4.1
  • Theorem 4.4
  • Theorem 4.5