Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation

Hao Wang, Zhichao Chen, Yuan Shen, Jiajun Fan, Zhaoran Liu, Degui Yang, Xinggao Liu, Haoxuan Li

TL;DR

This work tackles the challenge of estimating heterogeneous treatment effects from observational data by acknowledging that global distribution alignment often ignores local unit similarities. It introduces Proximity-aware Counterfactual Regression (PCR), which combines a Local Proximity Preservation Regularizer (LPR) with an Informative Subspace Projector (ISP) under a fused Gromov-Wasserstein OT framework to better balance treated and control groups while mitigating the curse of dimensionality. The approach yields improved CATE estimation as measured by PEHE, ATE, and ATT across semi-synthetic IHDP and ACIC benchmarks, with ablations validating the contributions of both LPR and ISP. The results suggest PCR’s practical impact for bias mitigation in causal inference tasks and point to future work on integrating normalizing flows and deploying the method in industrial settings such as recommender systems.
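The three evaluation metrics named above can be sketched in a few lines. This is a minimal illustration using the standard definitions of PEHE and ATE error; the ATT variant shown here (restricting the average to treated units with known true effects) is an assumption that fits semi-synthetic benchmarks, and the function names are hypothetical rather than taken from the authors' code.

```python
import math

def pehe(tau_true, tau_pred):
    """Precision in Estimation of Heterogeneous Effect (root-mean-squared form)."""
    n = len(tau_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(tau_true, tau_pred)) / n)

def ate_error(tau_true, tau_pred):
    """Absolute error of the estimated average treatment effect."""
    n = len(tau_true)
    return abs(sum(tau_pred) / n - sum(tau_true) / n)

def att_error(tau_true, tau_pred, treated):
    """Absolute error of the average effect over treated units only.

    Assumption: true unit-level effects are available, as in semi-synthetic data.
    """
    idx = [i for i, t in enumerate(treated) if t == 1]
    est = sum(tau_pred[i] for i in idx) / len(idx)
    ref = sum(tau_true[i] for i in idx) / len(idx)
    return abs(est - ref)

# Toy values (made up for illustration, not results from the paper).
tau_true = [1.0, 2.0, 3.0, 4.0]
tau_pred = [1.5, 1.5, 3.5, 3.5]
treated = [1, 0, 1, 0]
print(pehe(tau_true, tau_pred), ate_error(tau_true, tau_pred),
      att_error(tau_true, tau_pred, treated))
```

In this toy case the unit-level errors cancel in the ATE but not in PEHE, which is why PEHE is the more demanding measure of heterogeneous effect estimation.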

Abstract

Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, local proximity, the principle that similar units exhibit similar outcomes, is often overlooked. In this study, we propose Proximity-aware Counterfactual Regression (PCR) to exploit proximity for representation balancing within the HTE estimation context. Specifically, we introduce a local proximity preservation regularizer based on optimal transport to capture local proximity in the discrepancy calculation. Furthermore, to overcome the curse of dimensionality that renders discrepancy estimation ineffective, a problem exacerbated by the limited data available for HTE estimation, we develop an informative subspace projector, which trades a small loss in distance precision for improved sample complexity. Extensive experiments demonstrate that PCR accurately matches units across different treatment groups, effectively mitigates treatment selection bias, and significantly outperforms competitors. Code is available at https://anonymous.4open.science/status/ncr-B697.
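To make the idea of a proximity-aware discrepancy concrete, the sketch below evaluates the standard fused Gromov-Wasserstein objective for a fixed coupling between two groups of representations. It is an illustration under stated assumptions, not the authors' implementation: the function names, the squared Euclidean costs, and the trade-off weight `alpha` are all hypothetical choices, and the code only scores a given transport plan rather than solving for the optimal one.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pairwise(A, B):
    """Matrix of squared distances between rows of A and rows of B."""
    return [[sq_dist(a, b) for b in B] for a in A]

def fgw_cost(R0, R1, T, alpha=0.5):
    """Fused Gromov-Wasserstein objective for a fixed coupling T.

    R0, R1: control and treated representations (lists of vectors).
    T:      coupling matrix, T[i][j] = mass moved from unit i to unit j.
    alpha:  hypothetical weight on the structure (Gromov-Wasserstein) term.
    """
    M = pairwise(R0, R1)    # cross-group cost (global alignment)
    C0 = pairwise(R0, R0)   # within-group distances, control
    C1 = pairwise(R1, R1)   # within-group distances, treated
    n, m = len(R0), len(R1)
    wasserstein = sum(M[i][j] * T[i][j] for i in range(n) for j in range(m))
    structure = sum((C0[i][k] - C1[j][l]) ** 2 * T[i][j] * T[k][l]
                    for i in range(n) for j in range(m)
                    for k in range(n) for l in range(m))
    return (1 - alpha) * wasserstein + alpha * structure

# Two toy groups with identical one-dimensional representations.
R0 = [[0.0], [1.0]]
R1 = [[0.0], [1.0]]
identity = [[0.5, 0.0], [0.0, 0.5]]   # match each unit to its twin
swapped = [[0.0, 0.5], [0.5, 0.0]]    # match each unit to the other one
print(fgw_cost(R0, R1, identity), fgw_cost(R0, R1, swapped))
```

The structure term compares within-group distances, so it penalizes couplings that map nearby units in one group to distant units in the other. In the toy example the swapped plan preserves those distances and is only penalized through the cross-group term, which shows why the two terms are combined rather than used alone.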

Paper Structure

This paper contains 37 sections, 7 theorems, 37 equations, 5 figures, 4 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $\psi$ and $\phi$ be the representation mapping and factual outcome mapping, respectively, and let $\hat{\mathbb{W}}_\psi$ be the group discrepancy at the mini-batch level. With probability at least $1-\delta$, we have: where $\epsilon^{T=1}_\mathrm{F}$ and $\epsilon^{T=0}_\mathrm{F}$ are the expected errors of factual outcome estimation, $N$ is the batch size, $\sigma^2_Y$ is the variance of ou

Figures (5)

  • Figure 1: Overview of handling treatment selection bias with PCR. The red and blue colors signify the treated and untreated groups, respectively. (a) The treatment selection bias is illustrated through a distribution shift between treated ($X_1$) and untreated ($X_0$) units. The curves and scatters indicate the probability density functions and associated empirical distributions, respectively. (b) PCR reduces selection bias by aligning units from both treatment groups within a common representation space, denoted as $R = \psi(X)$. This alignment facilitates the generalization of the outcome mappings $\phi_1$ and $\phi_0$ across different groups.
  • Figure 2: The transport strategy (upper) and corresponding matrix visualization (lower) in three HTE estimators: CFR (left), ESCFR (center), and ours (right). Different scatter colors indicate different treatments.
  • Figure 3: Parameter sensitivity of the LPR module on the ACIC dataset, focusing on $\lambda$ (left and left center) and $\kappa$ (right center and right). The lines and shaded areas indicate the mean values and 90% confidence intervals, respectively.
  • Figure 4: Parameter sensitivity of the ISP module on the ACIC dataset, where $\mathrm{P}$ stands for the ratio of dimensionality reduction. $\kappa$ is set to 0.3 (left and left center) and 0.7 (right center and right). The lines and shaded areas indicate the mean values and 90% confidence intervals, respectively.
  • Figure 5: Running time of solving the FGW problem using the FGW algorithm with different batch sizes ($\mathrm{N}$) and feature numbers ($\mathrm{D}$) on CPUs (a) and GPUs (b). Different patches represent various maximum iterations ($\ell_\mathrm{max}$) used to solve the FGW problem. The error bars indicate the 99.9% confidence intervals.

Theorems & Definitions (23)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Theorem 3.1
  • Definition 3.1
  • Definition A.1
  • Lemma A.1
  • Definition A.2
  • Definition A.3
  • ...and 13 more