Efficient adjustment for complex covariates: Gaining efficiency with DOPE

Alexander Mangulad Christgau, Anton Rask Lundborg, Niels Richard Hansen

TL;DR

The paper develops a generalized covariate adjustment framework for efficient ATE estimation with complex covariates by reasoning about informative descriptions of covariate information rather than fixed graphs. It introduces DOPE, a Debiased Outcome-adapted Propensity Estimator, which learns an outcome-focused representation and couples nuisance estimation through that representation to achieve robustness and efficiency gains over standard AIPW, especially when covariates strongly predict treatment. The authors prove information bounds implying that using outcome-sufficient descriptions minimizes asymptotic variance and provide a delta-method analysis for representation-induced error, with DOPE shown to be asymptotically normal and variance-consistent. Empirically, DOPE improves finite-sample performance in simulations with single-index structures and demonstrates competitive, stable adjusted effects in NHANES data, including scenarios with extreme propensity scores. The framework supports non-Euclidean covariates (texts/images) and high-dimensional settings, offering practical guidance for efficient ATE estimation in observational studies.

Abstract

Covariate adjustment is a ubiquitous method used to estimate the average treatment effect (ATE) from observational data. Assuming a known graphical structure of the data generating model, recent results give graphical criteria for optimal adjustment, which enables efficient estimation of the ATE. However, graphical approaches are challenging for high-dimensional and complex data, and it is not straightforward to specify a meaningful graphical model of non-Euclidean data such as texts. We propose a new framework that accommodates adjustment for any subset of information expressed by the covariates, and we show that the information that is minimally sufficient for prediction of the outcome given the treatment is also most efficient for adjustment. Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the ATE, and we provide asymptotic results for DOPE under general conditions. Compared to the augmented inverse propensity weighted (AIPW) estimator, DOPE can retain its efficiency even when the covariates are highly predictive of treatment. We illustrate this with a single-index model, and with an implementation of DOPE based on neural networks, we demonstrate its performance on simulated and real data. Our results show that DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
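The contrast between AIPW and an outcome-adapted propensity can be sketched on simulated data. The following is a simplified illustration only, not the authors' neural-network implementation: it assumes a linear outcome model and a logistic propensity model, omits cross-fitting, and every function name (`fit_logistic`, `aipw_mu1`, `run_demo`) and the data-generating process are hypothetical. One covariate drives treatment strongly and a different covariate drives the outcome, so a propensity score fitted on the outcome-regression index should stay away from 0, while the propensity fitted on all covariates can become extreme.

```python
import numpy as np


def expit(x):
    # Logistic function, clipped to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))


def fit_logistic(X, t, n_iter=25, ridge=1e-6):
    # Logistic regression with intercept, fitted by Newton's method.
    # The tiny ridge term only stabilises the linear solve.
    X1 = np.column_stack([np.ones(len(t)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = expit(X1 @ beta)
        grad = X1.T @ (t - p) - ridge * beta
        hess = (X1 * (p * (1 - p))[:, None]).T @ X1 + ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def aipw_mu1(y, t, g1, m, clip=1e-3):
    # AIPW estimate of mu_1 = E[Y(1)] from outcome predictions g1
    # and propensity scores m (clipped away from zero).
    m = np.clip(m, clip, 1.0)
    return float(np.mean(g1 + t * (y - g1) / m))


def run_demo(n=4000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, 5))
    # Assumed design: W[:, 0] strongly predicts treatment,
    # W[:, 1] predicts the outcome; the true mu_1 equals 1.
    t = rng.binomial(1, expit(2.0 * W[:, 0])).astype(float)
    y = t + W[:, 1] + rng.normal(size=n)

    # Outcome regression fitted on the treated stratum (ordinary least squares).
    X1 = np.column_stack([np.ones(n), W])
    coef, *_ = np.linalg.lstsq(X1[t == 1], y[t == 1], rcond=None)
    g1 = X1 @ coef

    # Standard AIPW: propensity fitted on all covariates.
    b_full = fit_logistic(W, t)
    m_full = expit(b_full[0] + W @ b_full[1:])

    # Outcome-adapted variant: propensity fitted only on the scalar
    # index learned by the outcome regression.
    z = W @ coef[1:]
    b_z = fit_logistic(z[:, None], t)
    m_z = expit(b_z[0] + b_z[1] * z)

    return {
        "aipw_full": aipw_mu1(y, t, g1, m_full),
        "aipw_adapted": aipw_mu1(y, t, g1, m_z),
        "min_propensity_full": float(m_full.min()),
        "min_propensity_adapted": float(m_z.min()),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

Both estimators target the same adjusted mean in this design; the difference lies in the inverse weights, which are far less variable when the propensity is restricted to the outcome-relevant index.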

Paper Structure (39 sections, 15 theorems, 170 equations, 13 figures, 2 tables, 2 algorithms)

Key Result

Lemma 3.1

Fix a distribution $P\in \mathcal{P}$ and let $\mathcal{Z}_1\subseteq \mathcal{Z}_2$ be $\sigma$-algebras such that $Y \perp\!\!\!\perp_P \mathcal{Z}_2 \mid T,\mathcal{Z}_1$. Then it always holds that [equation omitted], where for each $t\in \mathbb{T}$, [equation omitted]. Moreover, if $\mathcal{Z}_2$ is a description of $\mathbf{W}$, then $\mathcal{Z}_1$ is $P$-valid if and only if $\mathcal{Z}_2$ is $P$-valid.

Figures (13)

  • Figure 1: The covariate $\mathbf{W}$ can have a complex data structure, even if the information it represents is structured and can be categorized into components that influence treatment and outcome separately.
  • Figure 2: The $\sigma$-algebra $\mathcal{Q}$ given in Definition \ref{def:OutcomeAlgebras} as a description of $\mathbf{W}$, which may include strictly less information than $\sigma(\mathbf{W})$ depending on $\mathcal{P}$.
  • Figure 3: Root mean squared errors (RMSE) for various estimators of $\mu_1$ plotted against sample size. Each data point is an average over 900 datasets. The bars around each point correspond to asymptotic $95\%$ confidence intervals based on the CLT. The dashed lines are only included as visual aids to make it easier to spot trends across sample sizes. For this plot, the outcome regression was fitted separately for each stratum $T=0$ and $T=1$.
  • Figure 4: Root mean squared errors (RMSE) for various estimators of $\mu_1$ plotted against sample size. Each data point is an average over 900 datasets. The bars around each point correspond to asymptotic $95\%$ confidence intervals based on the CLT. The dashed lines are only included as visual aids to make it easier to spot trends across sample sizes. For this plot, the outcome regression was fitted jointly onto $(T,\mathbf{W})$.
  • Figure 5: $95\%$ nominal confidence intervals for the adjusted mean $\mu_1$ in the experiment described in Section \ref{sec:simulationinference}. The text above each collection of intervals indicates the coverage rate out of the $100$ intervals. The text below indicates the median length of the intervals.
  • ...and 8 more figures

Theorems & Definitions (30)

  • Remark 2.1
  • Definition 2.2
  • Example 2.3: Comparison with adjustment sets in causal DAGs
  • Example 2.4
  • Lemma 3.1: Deletion of overadjustment
  • Definition 3.2
  • Theorem 3.3
  • Corollary 3.4
  • Remark 3.5
  • Proposition 3.6
  • ...and 20 more