Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner

Valentyn Melnychuk, Stefan Feuerriegel, Mihaela van der Schaar

TL;DR

The paper tackles quantifying aleatoric uncertainty in the treatment effect by focusing on the covariate-conditioned distribution of the treatment effect (CDTE), which is not point identifiable. It develops a novel AU-learner that performs Neyman-orthogonal, bias-corrected estimation of the Makarov bounds on the CDTE’s CDF/quantiles, with theoretical guarantees of quasi-oracle efficiency. A neural instantiation (AU-CNFs) provides a practical, scalable implementation that jointly estimates nuisance functions and bounds via conditional normalizing flows. Experiments on synthetic and semi-synthetic benchmarks demonstrate improved accuracy in estimating CDTE bounds, and a real-world lockdown case study illustrates potential for informing personalized medical and policy decisions.

Abstract

Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimating averaged causal quantities, such as the conditional average treatment effect, but also understanding the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the covariate-conditional level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call AU-learner. We further show that our AU-learner has several strengths in that it satisfies Neyman-orthogonality and, thus, quasi-oracle efficiency. Finally, we propose a fully-parametric deep learning instantiation of our AU-learner.

Paper Structure

This paper contains 37 sections, 4 theorems, 61 equations, 13 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

Let $\mathbb{P}$ denote $\mathbb{P}(Z) = \mathbb{P}(X, A, Y)$, and let $\textcolor{BrickRed}{y^{\overline{*}}_{\mathcal{Y}}}(\cdot \mid x)$ and $\textcolor{BrickRed}{u^{\underline{*}}_{[\alpha, 1]}}(\cdot \mid x)$ be the argmax/argmin sets of the convolutions $(\textcolor{BrickRed}{\mathbb{F}_1} \, \ldots$ […] where $I(X; \textcolor{BrickRed}{\eta}) = \mathbbm{1}\{(\textcolor{BrickRed}{\mathbb{F}_1} \, \ldots$ […]

Figures (13)

  • Figure 1: Identification and estimation of the conditional distribution of the treatment effect (CDTE; our setting) compared to the well-studied identification and estimation of the CATE. In this paper, we focus specifically on the CDF of the CDTE, $\mathbb{P}(Y[1] - Y[0] \le \delta \mid x)$, shown in orange. Our main contribution relates to the estimation, shown in yellow. However, moving from CATE identification and estimation to our setting comes with important challenges: (i) the CATE (shown in green) is point identifiable but the CDTE (shown in blue) is not; (ii) there is no closed-form expression of the target estimand in terms of nuisance functions and, because of that, CATE learners cannot be directly adapted for estimation; and (iii) the CATE is an unconstrained target estimand whereas Makarov bounds are monotone and contained in the interval $[0, 1]$.
  • Figure 2: Total uncertainty of the treatment effect can have different sources. The upper and lower plots have the same total uncertainty but vastly different aleatoric and epistemic components. Yet, aleatoric uncertainty is non-identifiable.
  • Figure 3: An example showing point non-identifiability of the distribution of the treatment effect, based on the 7th instance ($i = 7$) of the semi-synthetic IHDP100 dataset [hill2011bayesian]. Shown are two data-generation models that are indistinguishable in the potential outcomes framework or in RCTs, i.e., a monotone model, $\mathcal{M}_{\text{m}}$, and an antitone model, $\mathcal{M}_{\text{a}}$. For both models we plot (a) the conditional densities of the potential outcomes, $\mathbb{P}(Y[a] = y \mid x_7)$, and the conditional joint laws of the potential outcomes, $\mathbb{P}(Y[0], Y[1] \mid x_7)$; and (b) the corresponding CDFs of the CDTE (shown in blue), $\mathbb{F}(\delta \mid x_7) = \mathbb{P}(Y[1] - Y[0] \le \delta \mid x_7)$, together with the Makarov bounds and the point-identifiable CATE (shown in green), $\tau(x_7) = \mathbb{E}(\Delta \mid x_7) \approx 2.342$. Non-identifiability of the CDTE is easy to see: both data-generation models have the same conditional distributions of the potential outcomes but different conditional joint laws and, thus, different CDTEs. The plots in (b) also show the bounds on the probability of benefit from treatment (a special case of Makarov bounds), $\mathbb{P}(Y[1] - Y[0] \le 0 \mid x_7) \in [0, 0.242]$. Hence, Makarov bounds are informative almost everywhere (except at $\delta = \tau(x_7)$).
  • Figure 4: Comparison of learners for estimating Makarov bounds.
  • Figure 5: Results for synthetic experiments with varying size of training data, $n_{\text{train}}$, in 3 settings: normal, multi-modal, and exponential. Reported: mean out-of-sample rCRPS over 20 runs.
  • ...and 8 more figures
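
The non-identifiability argument in Figure 3 can be reproduced numerically. The sketch below is not the paper's AU-learner: it assumes the two conditional marginal CDFs are known exactly and evaluates the textbook form of the Makarov bounds, $\max\{\sup_y [\mathbb{F}_1(y) - \mathbb{F}_0(y - \delta)], 0\}$ and $1 + \min\{\inf_y [\mathbb{F}_1(y) - \mathbb{F}_0(y - \delta)], 0\}$, on a finite grid. The Gaussian marginals, the grid sizes, and the names `makarov_bounds` and `normal_cdf` are illustrative assumptions, not taken from the paper.

```python
from math import erf, sqrt

import numpy as np

_erf = np.vectorize(erf)


def normal_cdf(y, loc=0.0, scale=1.0):
    """CDF of N(loc, scale^2), evaluated elementwise."""
    z = (np.asarray(y, dtype=float) - loc) / (scale * sqrt(2.0))
    return 0.5 * (1.0 + _erf(z))


def makarov_bounds(cdf1, cdf0, deltas, y_grid):
    """Grid approximation of the Makarov bounds on P(Y[1] - Y[0] <= delta).

    cdf1, cdf0: vectorized marginal CDFs of Y[1] and Y[0] (assumed known here).
    The sup/inf over y are replaced by max/min over y_grid, which can only
    widen the interval, so the approximation errs on the conservative side.
    """
    # diff[i, j] = F1(y_j) - F0(y_j - delta_i)
    diff = cdf1(y_grid)[None, :] - cdf0(y_grid[None, :] - deltas[:, None])
    lower = np.maximum(diff.max(axis=1), 0.0)
    upper = 1.0 + np.minimum(diff.min(axis=1), 0.0)
    return lower, upper


# Hypothetical marginals for one covariate value x (not the IHDP instance):
# Y[1] | x ~ N(1, 1) and Y[0] | x ~ N(0, 1), so the CATE is tau(x) = 1.
cdf1 = lambda y: normal_cdf(y, loc=1.0, scale=1.0)
cdf0 = lambda y: normal_cdf(y, loc=0.0, scale=1.0)

deltas = np.linspace(-3.0, 5.0, 81)
y_grid = np.linspace(-10.0, 10.0, 2001)
lower, upper = makarov_bounds(cdf1, cdf0, deltas, y_grid)

# Two couplings with identical marginals but different effect distributions:
# monotone: Y[1] = Y[0] + 1, so the effect equals 1 almost surely;
# antitone: Y[1] = 1 + Z and Y[0] = -Z with Z ~ N(0, 1), so the effect is N(1, 2^2).
cdf_monotone = (deltas >= 1.0).astype(float)
cdf_antitone = normal_cdf(deltas, loc=1.0, scale=2.0)

# Both CDFs should lie between the bounds at every delta.
inside = (
    np.all(cdf_monotone >= lower - 1e-9)
    and np.all(cdf_monotone <= upper + 1e-9)
    and np.all(cdf_antitone >= lower - 1e-9)
    and np.all(cdf_antitone <= upper + 1e-9)
)
print("both couplings inside the bounds:", inside)
```

In this toy setting the interval collapses to the trivial $[0, 1]$ only at $\delta = \tau(x) = 1$, which mirrors the caption's remark that the bounds are informative everywhere except at the CATE.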

Theorems & Definitions (11)

  • Theorem 1: Efficient influence function for Makarov bounds
  • proof
  • Theorem 2: Neyman-orthogonality of AU-learner (informal)
  • Remark 1: Makarov bounds for mixed-type outcome [zhang2024bounds]
  • proof
  • Corollary 1: Efficient influence functions of the target risks
  • proof
  • Corollary 2: One-step bias-corrected estimator of the target risks
  • proof
  • Definition 1: Neyman-orthogonality [foster2023orthogonal, morzywolek2023general]
  • ...and 1 more
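
For orientation, Neyman-orthogonality of a learner is commonly stated as a condition on a population risk. The form below is a sketch in generic notation, which is an assumption and may differ from the paper's Definition 1: $\mathcal{L}(g, \eta)$ is the risk, $g^*$ its minimizer over the target class $\mathcal{G}$, and $\eta^*$ the true nuisance functions in the class $\mathcal{H}$.

$$
D_\eta D_g \, \mathcal{L}(g^*, \eta^*)[\, g - g^*, \; \eta - \eta^* \,] = 0 \quad \text{for all } g \in \mathcal{G}, \ \eta \in \mathcal{H},
$$

where $D_g$ and $D_\eta$ denote pathwise (directional) derivatives. Under this condition, first-order errors in the estimated nuisance functions do not affect the first-order optimality condition of the target, so nuisance errors enter the error of the target only at second order; this is what underlies the quasi-oracle efficiency mentioned in the abstract.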