
Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation

Yoichi Chikahara, Kansei Ushiyama

TL;DR

This work tackles high-dimensional estimation of heterogeneous treatment effects, i.e., the conditional average treatment effect (CATE), under sample selection bias. It introduces a differentiable Pareto-smoothed weighting (DPSW) framework that stabilizes inverse probability weighting (IPW) by replacing extreme weights with generalized Pareto distribution quantiles in an end-to-end differentiable manner, which enables integration with neural-network-based weighted representation learning (DR-CFR). The approach jointly learns encoders for instrumental variables, confounders, and adjustment variables while preserving the information carried by the adjustment variables, and it yields more accurate CATE estimates than the baselines on semi-synthetic and synthetic data. By combining differentiable ranking with Pareto smoothing, DPSW trains robustly and improves predictive performance in high-dimensional settings, with potential applications in precision medicine and targeted interventions. Code is available at https://github.com/ychika/DPSW.
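To make the Pareto-smoothing step concrete, here is a minimal NumPy sketch of classical, non-differentiable Pareto-smoothed weighting in the style of Pareto-smoothed importance sampling (PSIS), applied to IPW weights. It is not the authors' DPSW implementation: it uses hard sorting instead of differentiable ranking, and the function names (`fit_gpd_moments`, `gpd_quantile`, `pareto_smooth`), the method-of-moments GPD fit, the tail-size rule, and the cap at the largest raw weight are all assumptions made for illustration. The M largest weights are replaced by generalized Pareto quantiles at levels (k - 1/2)/M assigned by rank, so the ordering within the tail is kept while its extreme values are tempered.

```python
import numpy as np

def fit_gpd_moments(y):
    """Method-of-moments fit of a generalized Pareto distribution with location 0.

    Assumption: this simple estimator stands in for whatever GPD fitting
    procedure DPSW actually uses; it always returns a shape below 1/2.
    """
    m, v = y.mean(), y.var()
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)          # shape parameter
    sigma = 0.5 * m * (1.0 + ratio)   # scale parameter
    return xi, sigma

def gpd_quantile(p, xi, sigma):
    """Inverse CDF of the GPD with location 0, scale sigma, and shape xi."""
    if abs(xi) < 1e-8:
        return -sigma * np.log1p(-p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

def pareto_smooth(w, tail_frac=0.2):
    """Hypothetical helper: replace the largest weights with GPD quantiles (hard ranks)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    m = int(min(tail_frac * n, 3.0 * np.sqrt(n)))   # tail size (assumed PSIS-style heuristic)
    order = np.argsort(w)                            # indices in ascending order of weight
    tail = order[-m:]                                # indices of the m largest weights
    u = w[order[-m - 1]]                             # threshold: the (m+1)-th largest weight
    xi, sigma = fit_gpd_moments(w[tail] - u)         # fit GPD to exceedances over u
    probs = (np.arange(1, m + 1) - 0.5) / m          # quantile levels, matched to tail ranks
    smoothed = w.copy()
    smoothed[tail] = np.minimum(u + gpd_quantile(probs, xi, sigma), w.max())
    return smoothed

rng = np.random.default_rng(0)
e = np.clip(rng.beta(0.5, 0.5, size=1000), 1e-3, 1 - 1e-3)  # synthetic propensity scores
t = rng.binomial(1, e)                                        # synthetic treatment assignments
w = t / e + (1 - t) / (1 - e)                                 # raw IPW weights
ws = pareto_smooth(w)
print(f"max raw weight {w.max():.1f}, max smoothed weight {ws.max():.1f}")
```

In this sketch the quantile assigned to each tail weight depends on its hard rank, which is piecewise constant in the weights; the TL;DR indicates that DPSW instead combines differentiable ranking with Pareto smoothing so that the step can be trained end to end.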

Abstract

There is growing interest in estimating heterogeneous treatment effects across individuals from their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because, in this setup, some features typically induce sample selection bias while others do not but are still predictive of potential outcomes. To avoid losing the information in such predictive features, existing methods learn separate feature representations using inverse probability weighting (IPW). However, because their IPW weights are numerically unstable, these methods suffer from estimation bias in finite samples. To develop a numerically robust estimator based on weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Our experimental results show that, by effectively correcting the weight values, our proposed method outperforms existing methods, including traditional weighting schemes. Our code is available at https://github.com/ychika/DPSW.
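The abstract attributes estimation bias to numerically unstable IPW weights. As a hedged illustration that is not taken from the paper, the sketch below computes standard IPW weights on synthetic propensity scores and reports Kish's effective sample size, a common weighting diagnostic. When propensities are allowed to approach 0 or 1, a few weights tend to dominate and the effective sample size typically shrinks. The beta-distributed propensities, the clipping floors, and the helper names `ipw_weights` and `kish_ess` are assumptions chosen for illustration.

```python
import numpy as np

def ipw_weights(t, e):
    """IPW weights: 1/e(x) for treated units, 1/(1 - e(x)) for control units."""
    return t / e + (1 - t) / (1 - e)

def kish_ess(w):
    """Kish's effective sample size (sum w)^2 / sum w^2; equals n for uniform weights."""
    return w.sum() ** 2 / np.square(w).sum()

rng = np.random.default_rng(0)
n = 1000
for floor in (1e-1, 1e-2, 1e-3):
    # Synthetic propensity scores with mass near 0 and 1, clipped to [floor, 1 - floor]
    e = np.clip(rng.beta(0.5, 0.5, size=n), floor, 1 - floor)
    t = rng.binomial(1, e)
    w = ipw_weights(t, e)
    print(f"clip={floor:g}: max weight={w.max():.1f}, ESS={kish_ess(w):.0f} of {n}")
```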

Paper Structure (24 sections, 24 equations, 4 figures, 1 table, 1 algorithm)

Figures (4)

  • Figure 1: Graphical model illustration of the DR-CFR method
  • Figure 2: Illustration of the rank function $\textbf{r} = r(\textbf{w})$ (black) and differentiable rank functions $\textbf{r} = r_{\varepsilon}(\textbf{w})$ (orange and green): for the input vector $\textbf{w} = [w_1, 1, 2, 3]^{\top}$, we vary the value of $w_1$ and observe how its rank $r_1 \in \textbf{r}$ changes. As the regularization parameter $\varepsilon \rightarrow 0$, $r_{\varepsilon}$ converges to $r$ (Blondel et al., 2020).
  • Figure 3: Learned encoder parameter differences and test PEHEs on synthetic data: (a) value difference of $\mathbf{W}^1$ in encoder $\Gamma(\textbf{X})$; (b) value difference of $\mathbf{W}^1$ in encoder $\Delta(\textbf{X})$; (c) value difference of $\mathbf{W}^1$ in encoder $\Upsilon(\textbf{X})$; (d) test PEHEs. Since TARNet learns a single encoder, all of its parameter value differences were computed from the weight matrix of that same encoder.
  • Figure 4: Learned encoder parameter differences and test PEHEs on synthetic data: (a) value difference of $\mathbf{W}^1$ in encoder $\Gamma(\textbf{X})$; (b) value difference of $\mathbf{W}^1$ in encoder $\Delta(\textbf{X})$; (c) value difference of $\mathbf{W}^1$ in encoder $\Upsilon(\textbf{X})$; (d) test PEHEs. Since TARNet learns a single encoder, all of its parameter value differences were computed from the weight matrix of that same encoder.
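The differentiable rank function in Figure 2 can be sketched with a quadratically regularized soft rank in the spirit of Blondel et al. (2020): project $\textbf{w}/\varepsilon$ onto the permutahedron of $(1, \dots, n)$, which reduces to sorting plus a pool-adjacent-violators isotonic regression. This is a forward-only NumPy sketch under stated assumptions, not the authors' code: it uses the ascending convention (the largest entry gets rank $n$), the names `isotonic_nonincreasing` and `soft_rank` are hypothetical, and end-to-end training would additionally require gradients, e.g. via an autodiff framework.

```python
import numpy as np

def isotonic_nonincreasing(y):
    """Pool-adjacent-violators: least-squares fit of y under v_1 >= v_2 >= ... >= v_n."""
    sums, counts = [], []
    for value in y:
        sums.append(float(value))
        counts.append(1)
        # Merge blocks while the previous block's mean is below the current block's mean.
        while len(sums) > 1 and sums[-2] / counts[-2] < sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

def soft_rank(theta, eps=1.0):
    """Quadratically regularized soft rank (ascending: largest entry -> rank n).

    Projects theta / eps onto the permutahedron of (1, ..., n); as eps -> 0 this
    recovers the hard rank, and as eps grows every rank tends to (n + 1) / 2.
    """
    z = np.asarray(theta, dtype=float) / eps
    n = z.size
    order = np.argsort(-z)                     # indices sorting z in descending order
    s = z[order]
    rho = np.arange(n, 0, -1, dtype=float)     # (n, n - 1, ..., 1)
    v = isotonic_nonincreasing(s - rho)
    ranks = np.empty(n)
    ranks[order] = s - v                       # projection, mapped back to input order
    return ranks

# Mimic Figure 2: vary w_1 in w = [w_1, 1, 2, 3] and inspect its soft rank r_1.
w_rest = [1.0, 2.0, 3.0]
for w1 in (0.5, 1.5, 2.5, 3.5):
    theta = [w1] + w_rest
    print(w1, [round(float(soft_rank(theta, eps)[0]), 3) for eps in (0.01, 1.0)])
```

With a small $\varepsilon$ the printed $r_1$ should step through integer ranks as $w_1$ crosses 1, 2, and 3, while a larger $\varepsilon$ smooths these steps, matching the qualitative behavior described in the caption.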