Learning Representations of Instruments for Partial Identification of Treatment Effects

Jonas Schweisthal, Dennis Frauen, Maresa Schröder, Konstantin Hess, Niki Kilbertus, Stefan Feuerriegel

TL;DR

This paper tackles estimating the conditional average treatment effect (CATE) from observational data when unconfoundedness fails, leveraging complex instrumental variables (IVs) to obtain partial identification bounds. It introduces a framework that maps a high-dimensional instrument $Z$ to a discrete representation $\phi(Z)$ and derives valid, closed-form population bounds on the CATE, $b^-(x)$ and $b^+(x)$, which are tightened by optimizing the choice of $\phi$. A two-stage neural approach learns tight, variance-conscious bounds: it first estimates nuisance functions, then learns the discrete latent $\phi(Z)$ via a Gumbel-softmax discretization and a loss that balances bound width against estimation stability. The method is theoretically justified and empirically validated on Mendelian randomization-like simulations, demonstrating 100% coverage and tighter bounds than naive discretization, with robustness to the number of partitions. Overall, this work provides a practical, non-parametric path to using complex IVs (including genetic data, text, images) for reliable causal decision-making under partial identification.

Abstract

Reliable estimation of treatment effects from observational data is important in many disciplines such as medicine. However, estimation is challenging when unconfoundedness, a standard assumption in the causal inference literature, is violated. In this work, we leverage arbitrary (potentially high-dimensional) instruments to estimate bounds on the conditional average treatment effect (CATE). Our contributions are three-fold: (1) We propose a novel approach for partial identification through a mapping of instruments to a discrete representation space that yields valid bounds on the CATE. This is crucial for reliable decision-making in real-world applications. (2) We derive a two-step procedure that learns tight bounds using a tailored neural partitioning of the latent instrument space. As a result, we avoid instability issues due to numerical approximations or adversarial training. Furthermore, our procedure aims to reduce the estimation variance in finite-sample settings to yield more reliable estimates. (3) We show theoretically that our procedure obtains valid bounds while reducing estimation variance. We further perform extensive experiments to demonstrate its effectiveness across various settings. Overall, our procedure offers a novel path for practitioners to make use of potentially high-dimensional instruments (e.g., as in Mendelian randomization).
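
The Gumbel-softmax discretization named in the TL;DR can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names `gumbel_softmax` and `hard_assignment`, the temperature `tau`, and the example logits are all assumptions made for illustration. In the paper's setting, the logits would presumably come from the representation network applied to an instrument $z$, with one logit per partition cell.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalized logits (illustrative)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assignment(soft):
    """Index of the largest entry, i.e. the discrete cell for this sample."""
    return max(range(len(soft)), key=soft.__getitem__)


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]          # hypothetical network output for one z
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
cell = hard_assignment(soft)       # discrete value playing the role of phi(z)
```

A lower `tau` pushes the relaxed sample closer to a one-hot vector, while the argmax of the perturbed logits follows the categorical distribution given by the softmax of the logits regardless of `tau`. The relaxed sample is what makes the discrete assignment usable inside gradient-based training.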

Paper Structure

This paper contains 18 sections, 4 theorems, 45 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

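Let $\phi: \mathcal{Z} \to \{0, 1, \ldots, k\}$ be an arbitrary mapping from the high-dimensional instrument $Z$ to a discrete representation. Then, under the consistency, exclusion, and independence assumptions, the CATE $\tau(x)$ is bounded by $b^-(x) \leq \tau(x) \leq b^+(x)$, where $b^-(x)$ and $b^+(x)$ are closed-form bounds computed from nuisance functions conditional on $\phi(Z)$; their defining expressions are given in the paper.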

Figures (5)

  • Figure 1: Overview of the IV setting. We consider complex instruments $Z$ (e.g., gene data, text, images), observed confounders $X$, unobserved confounders $U$, a binary treatment $A$, and an outcome $Y$.
  • Figure 2: Leveraging complex instruments for partial identification of the CATE through discrete representations of $Z$. Naive discretization on the IV input space leads to wide, and thus non-informative, bounds. Our method learns a latent representation $\phi(Z)$ to yield tight bounds.
  • Figure 3: Workflow of the second stage of our method for calculating bounds on the CATE: The representation network $\phi_\theta$ learns discrete latent representations of the complex $Z$ (e.g., continuous or high-dimensional). By employing the pre-trained $\hat{\mu}$, $\hat{\pi}$, and $\hat{\eta}$, we can directly calculate the nuisance estimates conditional on the latent representation $\phi(z)$ by using the corresponding equations for the nuisance estimates to yield the bounds.
  • Figure 4: Datasets 1 and 2: Estimated bounds on the CATE. Shown: mean $\pm$ sd over 5 runs for different numbers of discretizations $k$. Left: Dataset 1 with a simple $\pi(x, z)$. Right: Dataset 2 with a complex $\pi(x, z)$.
  • Figure 5: Dataset 3 (high-dimensional): Average bound width. Sensitivity analysis with respect to the number of instruments $k$, where we show the average bound width and estimation variance over 5 runs.
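
The idea in the Figure 3 caption, computing nuisance estimates conditional on the learned cell $\phi(z)$ from pre-trained per-sample estimates, can be sketched as a simple aggregation. The sketch below is only an assumption-laden illustration: it uses a plain unweighted mean within each cell as a stand-in, which is not claimed to match the paper's nuisance-estimate equations, and the function name `cell_averages` and the toy numbers are hypothetical.

```python
def cell_averages(assignments, estimates, k):
    """Average per-sample estimates within each of k discrete cells.

    assignments: cell index in {0, ..., k-1} for each sample (role of phi(z_i))
    estimates:   pre-computed per-sample nuisance estimates (role of mu_hat(x, z_i))
    Returns a list of length k; empty cells are reported as None.
    """
    sums = [0.0] * k
    counts = [0] * k
    for cell, value in zip(assignments, estimates):
        sums[cell] += value
        counts[cell] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy data: five samples assigned to three cells (all values hypothetical).
assignments = [0, 1, 1, 2, 0]
mu_hat = [0.2, 0.6, 0.8, 0.5, 0.4]
averages = cell_averages(assignments, mu_hat, k=3)
```

Because the per-sample estimates come from networks trained once in the first stage, changing the partition only changes how they are grouped, so no nuisance model needs retraining while the representation is learned.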

Theorems & Definitions (10)

  • Theorem 1: Bounds for arbitrary instrument discretizations
  • proof
  • Lemma 1: Tightness-bias-variance tradeoff
  • proof
  • Theorem 2: Asymptotic distributions of estimators
  • proof
  • Lemma 2 (Swanson et al., 2018; Schweisthal et al., 2024)
  • proof: Proof of Theorem 1 (bounds for arbitrary instrument discretizations)
  • proof
  • proof