Continuous Treatment Effect Estimation Using Gradient Interpolation and Kernel Smoothing

Lokesh Nagalapatti; Akshay Iyer; Abir De; Sunita Sarawagi

TL;DR

The paper tackles Individualized Continuous Treatment Effect (ICTE) estimation from confounded observational data by introducing GIKS, a model-agnostic framework that augments each training individual with independently sampled treatments and inferred counterfactual outcomes. It uses a Gradient Interpolation (GI) loss for treatments close to the observed one and a Gaussian Process kernel smoothing (KS) loss for distant treatments, with GP-derived variances weighting the counterfactual supervision. The final objective combines the factual loss with these two counterfactual terms. Empirically, GIKS yields statistically significant improvements over six state-of-the-art baselines on five benchmarks, and HSIC analyses indicate reduced dependence between covariates X and treatment T in the augmented data, which reflects better alignment with the test distribution. The approach works across base architectures and supports practical applications such as algorithmic recourse in medical contexts.
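The HSIC analysis mentioned above can be illustrated with a small sketch. It uses the standard biased HSIC estimator with RBF kernels on synthetic data. The bandwidth, sample size, and data-generating process are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def rbf_gram(a, bandwidth=1.0):
    """RBF Gram matrix for samples stored row-wise (1-D inputs become a column)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    sq = np.sum(a**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * a @ a.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def hsic(x, t, bandwidth=1.0):
    """Biased HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = rbf_gram(x, bandwidth)
    L = rbf_gram(t, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Synthetic illustration (assumed setup, not a benchmark from the paper).
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=(n, 1))
t_confounded = 0.8 * x[:, 0] + 0.6 * rng.normal(size=n)  # T depends on X
t_independent = rng.normal(size=n)                        # T sampled independently

hsic_confounded = hsic(x, t_confounded)
hsic_independent = hsic(x, t_independent)
print(hsic_confounded, hsic_independent)
```

A lower HSIC value for the independently sampled treatments corresponds to the weaker X-T dependence that the augmentation is meant to produce.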

Abstract

We address the Individualized Continuous Treatment Effect (ICTE) estimation problem, where we predict the effect of any continuous-valued treatment on an individual using observational data. The main challenge in this estimation task is the potential confounding of treatment assignment with an individual's covariates in the training data, whereas during inference ICTE requires prediction on independently sampled treatments. In contrast to prior work that relied on regularizers or unstable GAN training, we advocate the direct approach of augmenting training individuals with independently sampled treatments and inferred counterfactual outcomes. We infer counterfactual outcomes using a two-pronged strategy: Gradient Interpolation for close-to-observed treatments, and Gaussian Process-based Kernel Smoothing, which allows us to downweight high-variance inferences. We evaluate our method on five benchmarks and show that it outperforms six state-of-the-art methods on counterfactual estimation error. We analyze the superior performance of our method by showing that (1) our inferred counterfactual responses are more accurate, and (2) adding them to the training data reduces the distributional distance between the confounded training distribution and the test distribution, where treatment is independent of covariates. Our proposed method is model-agnostic, and we show that it improves the ICTE accuracy of several existing models.
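The two-pronged strategy can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gradient-interpolated outcome is taken to be a first-order extrapolation of the observed outcome along the treatment axis, the GP is fit on concatenated (covariate, treatment) inputs with an RBF kernel, and the inverse-variance weight is a hypothetical choice. The function names, the toy response `f`, and all hyperparameters are invented for the example.

```python
import numpy as np

def gi_pseudo_outcome(f, x, t, y, t_cf, eps=1e-4):
    """Extrapolate the observed outcome y to a nearby treatment t_cf using
    a central finite-difference estimate of df/dt (assumed first-order form)."""
    dfdt = (f(x, t + eps) - f(x, t - eps)) / (2.0 * eps)
    return y + (t_cf - t) * dfdt

def rbf(a, b, length_scale=1.0):
    """RBF kernel between the rows of a and the rows of b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * length_scale**2))

def gp_pseudo_outcome(z_train, y_train, z_query, noise=0.1, length_scale=1.0):
    """GP posterior mean and variance at the query points."""
    K = rbf(z_train, z_train, length_scale) + noise**2 * np.eye(len(z_train))
    k_star = rbf(z_query, z_train, length_scale)
    mean = k_star @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, k_star.T)
    var = 1.0 - np.sum(k_star * v.T, axis=1)  # prior variance k(z, z) = 1
    return mean, np.maximum(var, 1e-12)

# Toy response surface standing in for a trained model (assumption).
f = lambda x, t: x + t**2

# Near-treatment counterfactual via gradient interpolation.
gi_value = gi_pseudo_outcome(f, x=1.0, t=0.5, y=f(1.0, 0.5), t_cf=0.6)

# Distant counterfactuals via GP smoothing over (x, t) inputs.
rng = np.random.default_rng(0)
z_train = rng.uniform(0.0, 1.0, size=(50, 2))
y_train = f(z_train[:, 0], z_train[:, 1])
z_query = np.array([[0.5, 0.5], [5.0, 5.0]])  # well supported vs. unsupported
noise = 0.1
mean, var = gp_pseudo_outcome(z_train, y_train, z_query, noise=noise)

# Hypothetical inverse-variance weights: uncertain inferences count less.
weights = 1.0 / (var + noise**2)
print(gi_value, mean, var, weights)
```

In this sketch the query far from the training data receives a posterior variance close to the prior and therefore a small weight, which is the downweighting behaviour the abstract describes.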

Paper Structure (46 sections, 17 equations, 7 figures, 13 tables, 1 algorithm)

Introduction
Problem Formulation
GIKS: Our Proposed Approach
Gradient Interpolated Inferred Counterfactual Outcomes
Kernel Smoothed Inferred Counterfactual Outcomes
Gaussian Process for Estimating Counterfactual Response
Estimating Parameters
Fixing GI+GP Parameters
Estimation of $\Phi,\eta$
Related Work
Discrete Treatment Effect Estimation (DTE)
Continuous Treatment Effect Estimation
Experiments
Dataset
Methods
...and 31 more sections

Figures (7)

Figure 1: Losses on Counterfactuals
Figure 2: Training Dosage distribution
Figure 3: IHDP Individualised Dose-Response Function
Figure 4: ADRF curves for all the baselines and GIKS.
Figure 5: This figure depicts the analysis described in Section \ref{sec:theory}. In the training dataset, the covariates $X$ and treatments $T$ are jointly Gaussian with mean 0 and covariance $\begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}$. The goal is to synthesize an outcome $\hat{y}(x_i, t^{\text{CF}}_i)$ such that the error in the synthesis is less than $\tau$. GIKS achieves this when the underlying observational dataset has at least one sample from the shaded green region in the middle. The shaded blue (yellow) regions in the figure represent the band of width $\frac{\tau}{\tau'}$ ($\frac{\tau}{\delta'}$) around $x_i$ ($t^{\text{CF}}_i$) that are considered in $D_{\text{NN}}(t^{\text{CF}}_i)$.
...and 2 more figures
