Identifiable causal inference with noisy treatment and no side information

Antti Pöllänen; Pekka Marttinen

TL;DR

This paper proposes a model that assumes a continuous treatment variable which is inaccurately measured and proves that the model's causal effect estimates are identifiable, even without knowledge of the measurement error variance or other side information.

Abstract

In some causal inference scenarios, for instance in epidemiology or econometrics, the treatment variable is measured inaccurately. Failure to correct for the effect of this measurement error can lead to biased causal effect estimates. Previous research has not studied methods that address this issue from a causal viewpoint while allowing for complex nonlinear dependencies and without assuming access to side information. For such a scenario, this study proposes a model that assumes a continuous treatment variable that is inaccurately measured. Building on existing results for measurement error models, we prove that our model's causal effect estimates are identifiable, even without side information or knowledge of the measurement error variance. Our method relies on a deep latent variable model in which Gaussian conditionals are parameterized by neural networks, and we develop an amortized importance-weighted variational objective for training the model. Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications in which reliable causal inference can be conducted.
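The importance-weighted objective mentioned in the abstract can be illustrated on a toy version of the model. The following is a minimal sketch, not the authors' implementation: the Gaussian conditionals use fixed hand-picked mean functions instead of neural networks, the proposal is the prior $p(x^*|z)$ rather than an amortized encoder, and every name and constant (`iw_bound`, `sigma_x`, `sigma_y`, the `sin` mean function) is a hypothetical choice made for illustration.

```python
import numpy as np

def log_normal(v, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2

def iw_bound(x, y, z, k, rng, sigma_x=0.5, sigma_y=0.3):
    """Importance-weighted estimate of log p(x, y | z) for one data point.

    Assumed toy model (not from the paper):
      x* | z     ~ N(z, 1)                 true treatment
      x  | x*    ~ N(x*, sigma_x^2)        noisy measurement
      y  | z, x* ~ N(z + sin(x*), sigma_y^2)
    """
    # Draw k samples of the hidden treatment from the proposal p(x* | z).
    x_star = rng.normal(loc=z, scale=1.0, size=k)
    # With the prior as proposal, each log weight reduces to the log likelihood
    # of the observed x and y given the sampled x*.
    log_w = log_normal(x, x_star, sigma_x) + log_normal(y, z + np.sin(x_star), sigma_y)
    # Numerically stable log of the mean weight (log-sum-exp trick).
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

rng = np.random.default_rng(0)
print(iw_bound(x=0.3, y=0.8, z=0.1, k=1000, rng=rng))
```

In expectation the bound tightens as `k` grows, which is the usual motivation for importance weighting over a single-sample variational bound. An amortized variant would replace the prior proposal with a learned network that maps $(x, y, z)$ to a proposal over $x^*$.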

Paper Structure (20 sections, 2 theorems, 16 equations, 10 figures, 2 tables, 1 algorithm)

Introduction
Methods
Models for causal estimation
Inference
Model training
Identifiability analysis
Identifiability of causal estimation with a noisy treatment
Experiments and results
Synthetic experiment
Synthetic datasets from Gaussian processes
Results
Experiment with education-wage data
Results
Conclusion
Review of literature on the identifiability of related models
...and 5 more sections

Key Result

Proposition 1

The measurement error model defined in Equations eq:x--eq:delta_y is model identifiable if 1) for every $z$, $\mu_{Y}(z, x^*)$ is continuously differentiable everywhere as a function of $x^*$, 2) for every $z$, the set $\chi = \{ x^* : \frac{\partial}{\partial x^*} \mu_Y(z,x^*) = 0 \}$ has at most

Figures (10)

Figure 1: Causal graph for our proposed model. The observed variables $X$ (noisy treatment), $Y$ (effect/outcome), and $Z$ (confounders) are shaded to distinguish them from the hidden variable $X^*$ (true treatment).
Figure 2: Comparison of our method (CEME) against a naive method that does not account for measurement error in the treatment $X^*$. The accurate values of $X^*$ are hidden from both methods. Ground truth is the true mean function of the data generating process. The same data are displayed both with and without measurement error in $X^*$. Our method “CEME” fits the error-free data (even though neither method sees them), whereas the “naive” method fits the data with error and cannot estimate the true regression function accurately. A similar example is presented by zhu2022causal.
Figure 3: Three realizations of synthetic datasets generated from a Gaussian process with 3000 data points (black dots). The figures show the covariate $Z$ (x-axis), the noiseless versions of the treatment $X^*$ (y-axis), and the outcome ${\mathbb E}[Y|Z,X^*]$ (heatmap).
Figure 4: Error in ${\mathbb E}[y|z,do(x^*)]$ estimation in the synthetic experiment. In addition to data for Naive, missing from the figure are some outliers for CEME and CEME$^+$ for noise level 40% and training dataset size 1000. Key observations are that 1) the proposed CEME/CEME$^+$ methods offer a clear benefit over Naive, which does not account for measurement error, 2) CEME and CEME$^+$ seem to converge with increasing training set size and compare relatively well with the loose upper bound Oracle, and 3) CEME handles unknown measurement error variance well, since CEME$^+$, which knows the variance, performs only slightly better.
Figure 5: Error in estimation of $\Delta Y$ standard deviation in the synthetic experiment. In addition to data for Naive, missing from the figure are 4 outliers for CEME with 1000 training data points and 10% noise, as well as 2 outliers for CEME with 4000 training data points and 10% noise. Key observations are that 1) the estimates seem to converge for the proposed CEME/CEME$^+$ but not for the baseline Naive, and 2) all the methods tend to overestimate, rather than underestimate, the standard deviation of $\Delta Y$.
...and 5 more figures
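
The gap between a naive fit and the ground truth described in the Figure 2 caption can be reproduced in the simplest possible setting. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses a linear outcome model with no confounder $Z$, Gaussian noise, and hypothetical names and constants (`naive_slope`, `true_slope`, `noise_std`). In this linear case the ordinary least squares slope on the noisy treatment shrinks by the factor $\mathrm{Var}(X^*) / (\mathrm{Var}(X^*) + \mathrm{Var}(\Delta X))$, the classical attenuation effect.

```python
import numpy as np

def naive_slope(n=100_000, true_slope=2.0, noise_std=1.0, seed=0):
    """OLS slope of y on the noisy treatment x in a toy linear model."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(0.0, 1.0, n)                    # true treatment, variance 1
    y = true_slope * x_star + rng.normal(0.0, 0.5, n)   # outcome depends on x*
    x = x_star + rng.normal(0.0, noise_std, n)          # noisy measurement of x*
    cov = np.cov(x, y)
    # Slope of the regression of y on x: Cov(x, y) / Var(x).
    return cov[0, 1] / cov[0, 0]

# With unit-variance x* and unit-variance noise the attenuation factor is 1/2,
# so the naive slope lands near 1.0 instead of the true value 2.0.
print(naive_slope())
# Without measurement error the same estimator recovers the true slope.
print(naive_slope(noise_std=0.0))
```

The nonlinear, confounded setting studied in the paper does not admit such a closed-form correction, which is where a latent variable model over $X^*$ comes in.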

Theorems & Definitions (6)

Definition 1: Identifiability of an estimand $f$
Proposition 1
proof
Definition 2
Theorem 1
proof
