Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation

Yoichi Chikahara; Kansei Ushiyama

TL;DR

This work tackles the estimation of heterogeneous treatment effects, measured by the conditional average treatment effect (CATE), from high-dimensional features under sample selection bias. It introduces a differentiable Pareto-smoothed weighting (DPSW) framework that stabilizes inverse probability weighting (IPW) by replacing extreme weights with generalized Pareto distribution (GPD) quantiles in an end-to-end differentiable manner, which allows it to be integrated with neural-network-based weighted representation learning (DR-CFR). The approach jointly learns encoders for instrumental variables, confounders, and adjustment variables while preserving the adjustment information, and it yields better CATE estimates than the baselines on semi-synthetic and synthetic data. By combining differentiable ranking with Pareto smoothing, DPSW makes training more robust and improves predictive performance in high-dimensional settings, which is relevant to applications such as precision medicine and targeted interventions. Code is available at https://github.com/ychika/DPSW.

Abstract

There is growing interest in estimating heterogeneous treatment effects across individuals using their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because, in this setup, some features typically induce sample selection bias while others do not but are still predictive of potential outcomes. To avoid losing such predictive feature information, existing methods learn separate feature representations using inverse probability weighting (IPW). However, because their IPW weights are numerically unstable, these methods suffer from estimation bias in finite-sample settings. To develop a numerically robust estimator via weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Our experimental results show that by effectively correcting the weight values, our proposed method outperforms the existing ones, including traditional weighting schemes. Our code is available at https://github.com/ychika/DPSW.
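
The weight replacement described in the abstract can be illustrated with a minimal, non-differentiable sketch of Pareto smoothing. It is not the authors' implementation (see the repository for that). It assumes the standard Pareto-smoothing rule of replacing the M largest weights with GPD quantiles at probabilities (z - 0.5)/M, and, for brevity, swaps in a simple method-of-moments GPD fit instead of whichever estimator the paper uses. The names `gpd_quantile`, `fit_gpd_moments`, `pareto_smooth`, and `n_tail` are hypothetical.

```python
import math
import statistics


def gpd_quantile(p, xi, sigma):
    """Inverse CDF of a generalized Pareto distribution with location 0."""
    if abs(xi) < 1e-12:
        return -sigma * math.log1p(-p)  # exponential limit as xi -> 0
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)


def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit (an assumption made for brevity).

    Requires at least two distinct exceedances so the variance is positive.
    """
    m = statistics.fmean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)          # shape
    sigma = 0.5 * m * (ratio + 1.0)   # scale
    return xi, sigma


def pareto_smooth(weights, n_tail):
    """Replace the n_tail largest weights with GPD quantiles above a threshold."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    threshold = weights[order[-n_tail - 1]]  # largest weight outside the tail
    tail_idx = order[-n_tail:]               # tail indices, ascending by weight
    exceedances = [weights[i] - threshold for i in tail_idx]
    xi, sigma = fit_gpd_moments(exceedances)
    smoothed = list(weights)
    for z, i in enumerate(tail_idx, start=1):
        p = (z - 0.5) / n_tail               # mid-point plotting position
        smoothed[i] = threshold + gpd_quantile(p, xi, sigma)
    return smoothed


# Toy IPW weights 1 / propensity; tiny propensities give extreme weights.
propensities = [0.5, 0.4, 0.6, 0.3, 0.45, 0.55, 0.35, 0.05, 0.02, 0.01]
ipw = [1.0 / e for e in propensities]
smoothed = pareto_smooth(ipw, n_tail=3)
print([round(w, 2) for w in smoothed])
```

The sort used to pick the tail is the step that blocks gradient flow in this sketch, which is why a differentiable variant needs a smooth surrogate for ranking.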

Paper Structure (24 sections, 24 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Problem Setup
Weighted Representation Learning
Proposed Method
Overview
Weight Correction via Pareto Smoothing
GPD Parameter Fitting
Weight Replacement with GPD Quantiles
Non-Differentiable Procedures
Making Pareto Smoothing Differentiable
Differentiable Approximation
Reformulation of GPD Parameter Estimators
Overall Algorithm
Experiments
...and 9 more sections

Figures (4)

Figure 1: Graphical model illustration of the DR-CFR method
Figure 2: Illustration of the rank function $\textbf{r} = r(\textbf{w})$ (black) and differentiable rank functions $\textbf{r} = r_{\varepsilon}(\textbf{w})$ (orange and green). Here we take the input vector $\textbf{w} = [w_1, 1, 2, 3]^{\top}$, vary the value of $w_1$, and look at how its rank $r_1 \in \textbf{r}$ changes. As the regularization parameter $\varepsilon \rightarrow 0$, $r_{\varepsilon}$ converges to $r$ (Blondel et al., 2020).
Figure 3: Learned encoder parameter differences and test PEHEs on synthetic data: (a) value difference of $\mathbf{W}^1$ in encoder $\Gamma(\textbf{X})$; (b) value difference of $\mathbf{W}^1$ in encoder $\Delta(\textbf{X})$; (c) value difference of $\mathbf{W}^1$ in encoder $\Upsilon(\textbf{X})$; (d) test PEHEs. Because TARNet learns a single encoder, we computed all of its parameter value differences using the weight matrix of that same encoder.
Figure 4: Learned encoder parameter differences and test PEHEs on synthetic data: (a) value difference of $\mathbf{W}^1$ in encoder $\Gamma(\textbf{X})$; (b) value difference of $\mathbf{W}^1$ in encoder $\Delta(\textbf{X})$; (c) value difference of $\mathbf{W}^1$ in encoder $\Upsilon(\textbf{X})$; (d) test PEHEs. Because TARNet learns a single encoder, we computed all of its parameter value differences using the weight matrix of that same encoder.
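
The behavior described in the Figure 2 caption can be reproduced with a small sketch. Note that this is not the operator of Blondel et al. (2020), which is built on projections onto the permutahedron; a simpler pairwise-sigmoid surrogate is swapped in here purely for illustration. The ascending rank convention (smallest entry gets rank 1) and the names `hard_rank` and `soft_rank` are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hard_rank(w):
    """Ascending rank: the smallest entry gets rank 1 (ties not handled)."""
    return [1 + sum(wj < wi for wj in w) for wi in w]


def soft_rank(w, eps):
    """Smooth surrogate: each comparison [w_j < w_i] becomes a sigmoid."""
    return [
        1.0 + sum(sigmoid((wi - wj) / eps) for j, wj in enumerate(w) if j != i)
        for i, wi in enumerate(w)
    ]


# Vary w_1 in w = [w_1, 1, 2, 3] and watch the rank of the first entry.
for w1 in (0.5, 1.5, 2.5, 3.5):
    w = [w1, 1.0, 2.0, 3.0]
    print(
        w1,
        hard_rank(w)[0],
        round(soft_rank(w, eps=1.0)[0], 3),
        round(soft_rank(w, eps=0.01)[0], 3),
    )
```

With a large `eps` the soft rank changes gradually as `w1` moves, while a small `eps` approaches the step-shaped hard rank, mirroring the convergence noted in the caption.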
