Parameterising the effect of a continuous treatment using average derivative effects

Oliver J. Hines, Karla Diaz-Ordaz, Stijn Vansteelandt

TL;DR

This paper develops a unifying framework for causal effects with continuous treatments by focusing on weighted average derivative effects (ADEs) and their Riesz representers. It defines a class of estimands 𝓡 that connects weighted ADEs and weighted ATEs, derives optimally efficient representations, and shows how, under mild conditions, the least-squares estimands ψ and Ψ emerge as interpretable, robust targets. It then provides density-free, efficient estimators for ψ and Ψ using one-step influence-function approaches with cross-fitting, and demonstrates performance in simulations and a Warfarin-dose analysis. The work enables practical, model-agnostic inference for continuous treatments and offers a principled path to compare weighted causal effects across populations, with favorable finite-sample properties and broad applicability in biostatistics and epidemiology.

Abstract

The average treatment effect (ATE) is commonly used to quantify the main effect of a binary treatment on an outcome. Extensions to continuous treatments are usually based on the dose-response curve or shift interventions, but both require strong overlap conditions and the resulting curves may be difficult to summarise. We focus instead on average derivative effects (ADEs) that are scalar estimands related to infinitesimal shift interventions requiring only local overlap assumptions. ADEs, however, are rarely used in practice because their estimation usually requires estimating conditional density functions. By characterising the Riesz representers of weighted ADEs, we propose a new class of estimands that provides a unified view of weighted ADEs/ATEs when the treatment is continuous/binary. We derive the estimand in our class that minimises the nonparametric efficiency bound, thereby extending optimal weighting results from the binary treatment literature to the continuous setting. We develop efficient estimators for two weighted ADEs that avoid density estimation and are amenable to modern machine learning methods, which we evaluate in simulations and an applied analysis of Warfarin dosage effects.
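
To make the link between ADEs and infinitesimal shift interventions concrete, the unweighted ADE can be written as the limit of a rescaled shift contrast. This is a sketch only: the notation $\mu(a,x) = E(Y \mid A=a, X=x)$ is assumed here rather than taken from this summary, and the display presumes enough regularity to exchange the limit and the expectation.

```latex
% Assumed notation: \mu(a,x) = E(Y | A = a, X = x), differentiable in a.
\[
  E\left\{ \frac{\partial \mu}{\partial a}(A, X) \right\}
  = \lim_{\delta \to 0}
    \frac{E\{\mu(A + \delta, X)\} - E\{\mu(A, X)\}}{\delta}
\]
```

Only values of $\mu$ in a neighbourhood of the observed treatment enter the limit, which is why local rather than global overlap suffices.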

Paper Structure

This paper contains 29 sections, 9 theorems, 104 equations, 3 figures, 3 tables.

Key Result

Theorem 1

Let $F(a|x)$ be the distribution function of $A$ given $X=x$ and assume that the support of $A$ given $X=x$ is an open (possibly unbounded) interval. For $\alpha \in \mathcal{R}$, define the weight $w(a,x)$ [definition not reproduced here]. Then, for all differentiable functions $f \in \mathcal{H}$, $\langle f, \alpha \rangle = E\{w(A,X)f^\prime(A,X)\}$. The proof is given in the paper's appendix.
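
The identity in Theorem 1 can be checked numerically in one simple special case. The sketch below is illustrative and not the authors' code. It assumes that $\langle f, \alpha \rangle$ denotes $E\{\alpha(A,X)f(A,X)\}$, takes the weight $w \equiv 1$ (the unweighted ADE), and uses a hypothetical Gaussian treatment model $A \mid X \sim N(0.5X, 1.5^2)$. In that case integration by parts gives $\alpha(a,x) = -\partial_a \log f(a|x) = (a - 0.5x)/1.5^2$, so no density estimation is needed. The test function, the data-generating model, and the name `riesz_check` are all invented for illustration.

```python
import numpy as np


def riesz_check(n=500_000, seed=0):
    """Monte Carlo check of E{alpha * f} = E{f'} when the weight is w = 1.

    Assumed (hypothetical) model: X ~ N(0, 1), A | X ~ N(0.5 * X, 1.5**2).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    mu = 0.5 * x                      # conditional mean of A given X
    sigma = 1.5                       # conditional standard deviation of A
    a = mu + sigma * rng.normal(size=n)

    # Illustrative test function f(a, x) and its derivative in a.
    f = np.sin(a) + x * a**2
    f_prime = np.cos(a) + 2.0 * x * a

    # Riesz representer for w = 1: minus the score of the conditional
    # Gaussian density in a, i.e. (a - mu) / sigma**2.
    alpha = (a - mu) / sigma**2

    lhs = float(np.mean(alpha * f))   # estimates <f, alpha>
    rhs = float(np.mean(f_prime))     # estimates E{w f'} with w = 1
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = riesz_check()
    # Under this model the exact value is 1 + exp(-1.25).
    print(f"E[alpha * f] ~ {lhs:.3f}, E[f'] ~ {rhs:.3f}")
```

The two Monte Carlo averages should agree up to simulation error. The left-hand average is noticeably noisier than the right-hand one because it multiplies $f$ by the score.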

Figures (3)

  • Figure 1: Bias (plots A,D,G), variance (plots B,E,H) and 95% Wald CI coverage (plots C,F,I) plotted against sample size for $\hat{\Psi}$ (plots A,B,C), $\hat{\psi}$ using the direct approach (plots D,E,F), and $\hat{\psi}$ using the quasi-oracle approach (plots G,H,I). Note that the y-axis limits change between rows of the bias and variance plots. The black horizontal lines represent zero bias, zero variance, and 95% coverage respectively.
  • Figure 2: Least squares estimand weights approximated using the location-scale procedure described in the Appendix on weight approximations. This procedure uses estimates of the conditional mean and variance of $A$ given $X$, which are obtained using the algorithms in the section on the proposed algorithms, with the Super Learner for model fitting. Rows and columns refer to different algorithms as labeled.
  • Figure 3: Least squares estimand weights approximated using the location-scale procedure described in the Supplement on weight approximations. This procedure uses estimates of the conditional mean and variance of $A$ given $X$, which are obtained using the algorithms in the section on the proposed algorithms, with a discrete super learner for model fitting. Rows and columns refer to different algorithms as labeled.

Theorems & Definitions (27)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • Lemma 1.1
  • Remark 4
  • Example 1: Average derivative effect (ADE)
  • Example 2: Density weighted ADE
  • Example 3: Average dose-response derivative
  • Example 4: Least Squares Estimands
  • ...and 17 more