K-Fold Causal BART for CATE Estimation

Hugo Gobato Souto; Francisco Louzada Neto

TL;DR

This work introduces K-Fold Causal BART, a two-component nonparametric approach for ATE and CATE estimation that combines Double/Orthogonal ML for ATE with a K-Fold T-Learner based on BART for CATE, augmented by a debiasing step. The method is competitive on synthetic benchmarks but does not achieve state-of-the-art results on the semi-synthetic IHDP benchmark, where ps-BART and BART($f_0,f_1$) often dominate for both CATE and ATE. Key insights include the superior generalization of ps-BART relative to BCF, the sensitivity of BCF to treatment effect heterogeneity, and the overconfidence of CATE uncertainty estimates under low heterogeneity, along with the practical conclusion that a second K-Fold procedure adds computational cost without improving results. The findings challenge some prevailing IHDP-era conclusions and emphasize the need for dataset-aware evaluation, broader benchmarks, and improved uncertainty quantification in causal inference methods.
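The K-Fold T-Learner idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: BART is swapped for a one-covariate least-squares fit so the example runs with the Python standard library alone, the debiasing step and the Double/Orthogonal ML component are omitted, and the names `fit_linear` and `kfold_t_learner` as well as the toy data are hypothetical. The sketch only shows the cross-fitting pattern: for each fold, one outcome model per treatment arm is fitted on the other folds, and the CATE for held-out units is the difference of the two predictions.

```python
import random
from statistics import fmean


def fit_linear(xs, ys):
    """Least-squares line for one covariate (stand-in for BART; an assumption)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def kfold_t_learner(x, t, y, k=5, seed=0):
    """Out-of-fold CATE estimates: mu1_hat(x_i) - mu0_hat(x_i) for each unit i."""
    n = len(x)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k disjoint folds covering all units
    cate = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # T-Learner: one outcome model per treatment arm, fitted on training folds only.
        ctrl = [i for i in train if t[i] == 0]
        trt = [i for i in train if t[i] == 1]
        mu0 = fit_linear([x[i] for i in ctrl], [y[i] for i in ctrl])
        mu1 = fit_linear([x[i] for i in trt], [y[i] for i in trt])
        # Predict only for units the two models have not seen.
        for i in held_out:
            cate[i] = mu1(x[i]) - mu0(x[i])
    return cate


# Toy data (hypothetical): the true CATE is 1 + 2x, with randomized treatment.
rng = random.Random(42)
n = 400
x = [rng.uniform(-1, 1) for _ in range(n)]
t = [rng.randint(0, 1) for _ in range(n)]
y = [xi + ti * (1.0 + 2.0 * xi) + rng.gauss(0, 0.1) for xi, ti in zip(x, t)]

cate_hat = kfold_t_learner(x, t, y, k=5)
```

Because every prediction comes from models that never saw the unit being predicted, the estimates are out-of-fold, which is the usual motivation for K-fold cross-fitting as a guard against overfitting.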

Abstract

This research proposes and evaluates a novel model, K-Fold Causal Bayesian Additive Regression Trees (K-Fold Causal BART), for improved estimation of Average Treatment Effects (ATE) and Conditional Average Treatment Effects (CATE). The study uses synthetic and semi-synthetic datasets, including the widely recognized Infant Health and Development Program (IHDP) benchmark dataset, to validate the model's performance. Despite promising results in synthetic scenarios, the IHDP dataset reveals that the proposed model is not state-of-the-art for ATE and CATE estimation. Nonetheless, the research provides several novel insights: (1) the ps-BART model is likely the preferred choice for CATE and ATE estimation because it generalizes better than the other benchmark models, including the Bayesian Causal Forest (BCF) model, which many consider the current best model for CATE estimation; (2) the BCF model's performance deteriorates significantly as treatment effect heterogeneity increases, while the ps-BART model remains robust; (3) models tend to be overconfident in CATE uncertainty quantification when treatment effect heterogeneity is low; (4) a second K-Fold method is unnecessary for avoiding overfitting in CATE estimation, as it adds computational cost without improving performance; (5) detailed analysis reveals the importance of understanding dataset characteristics and using nuanced evaluation methods; (6) the results contradict the conclusion of Curth et al. (2021) that indirect strategies for CATE estimation are superior for the IHDP dataset. These findings challenge existing assumptions and suggest directions for future research to enhance causal inference methodologies.
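For reference, the two estimands named in the abstract are conventionally defined as below. The potential-outcome notation $Y(1)$, $Y(0)$, $X$ and the symbol $\tau$ are assumptions here, since the abstract itself fixes no symbols; $\hat{\mu}_0$ and $\hat{\mu}_1$ follow the notation of the Figure 1 caption. The last expression is the generic T-Learner estimate, without the debiasing step of the proposed model.

$$
\tau(x) = \mathbb{E}\left[\,Y(1) - Y(0) \mid X = x\,\right], \qquad
\mathrm{ATE} = \mathbb{E}\left[\tau(X)\right], \qquad
\hat{\tau}_{\mathrm{T}}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)
$$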

Paper Structure (33 sections, 36 equations, 1 figure, 18 tables)

Key Words
Introduction
Origin of Causal Inference and Early Works
Problem Statement and Common Notation
Main Approaches to solve the Problem
a) Ordinary Linear Regression and OLS
b) Lasso Method
a) Single (S)-Learner
Aim of this Thesis and Thesis Structure
Literature Review
Main Models
Linear Regression and LASSO Method
CausalRF
Neural Network Models
BART models
...and 18 more sections

Figures (1)

Figure 1: Overview of five potential model architectures for single-step estimation of nuisance parameters, incorporating various degrees of shared information between tasks. The representation layers are depicted in gray, while the task-specific layers are illustrated in blue (Curth et al., 2021). Here $\hat{\mu}_0$ and $\hat{\mu}_1$ denote $\hat{Y}_0$ and $\hat{Y}_1$, respectively.
