Reliable Selection of Heterogeneous Treatment Effect Estimators
Jiayi Guo, Zijun Gao
TL;DR
This work tackles selecting the best heterogeneous treatment effect (HTE) estimator from a set of candidates without access to ground-truth individual treatment effects (ITEs), treating selection as an argmin inference problem. It introduces a ground-truth-free procedure based on a cross-fitted, exponentially weighted statistic with a two-way sample-splitting scheme, and proves asymptotic control of the familywise error rate via a stability-based central limit theorem. Empirically, the method reduces false selections on the ACIC 2016, IHDP, and Twins benchmarks while remaining effective as the number of candidates grows and when nuisance estimators are black-box models. The approach offers a principled, scalable way to compare HTE estimators on real-world data, with practical implications for model selection in personalized decision-making.
Abstract
We study the problem of selecting the best heterogeneous treatment effect (HTE) estimator from a collection of candidates in settings where the treatment effect is fundamentally unobserved. We cast estimator selection as a multiple testing problem and introduce a ground-truth-free procedure based on a cross-fitted, exponentially weighted test statistic. A key component of our method is a two-way sample splitting scheme that decouples nuisance estimation from weight learning and ensures the stability required for valid inference. Leveraging a stability-based central limit theorem, we establish asymptotic familywise error rate control under mild regularity conditions. Empirically, our procedure provides reliable error control while substantially reducing false selections compared with commonly used methods across ACIC 2016, IHDP, and Twins benchmarks, demonstrating that our method is feasible and powerful even without ground-truth treatment effects.
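To make the idea of a cross-fitted, exponentially weighted test statistic more concrete, the following is a minimal Python sketch of one generic way such a statistic can be formed. It is an illustration under stated assumptions, not the procedure from the paper: it assumes a precomputed matrix `losses` of per-sample loss estimates (one column per candidate estimator), it omits nuisance estimation and the two-way splitting scheme entirely, and the function name `exp_weighted_statistic`, the temperature `lam`, and the fold count `n_folds` are all hypothetical.

```python
import numpy as np


def exp_weighted_statistic(losses, r, lam=1.0, n_folds=2, seed=0):
    """Studentized comparison of candidate r against its competitors.

    losses : (n, K) array of per-sample loss estimates (assumed given).
    r      : index of the candidate being tested as the best estimator.
    lam    : hypothetical temperature of the exponential weights.
    Large positive values suggest candidate r is NOT the minimizer.
    """
    n, K = losses.shape
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds      # random, balanced fold labels
    others = [s for s in range(K) if s != r]  # competing candidates
    diffs = np.empty(n)
    for k in range(n_folds):
        held_in = folds == k
        held_out = ~held_in
        # Weights are learned on out-of-fold samples only (cross-fitting):
        # competitors with smaller average loss receive larger weight.
        mean_out = losses[held_out][:, others].mean(axis=0)
        w = np.exp(-lam * (mean_out - mean_out.min()))
        w /= w.sum()
        # Loss of candidate r minus the weighted loss of its competitors,
        # evaluated on the fold that was not used to learn the weights.
        diffs[held_in] = losses[held_in, r] - losses[held_in][:, others] @ w
    return np.sqrt(n) * diffs.mean() / diffs.std(ddof=1)


if __name__ == "__main__":
    # Synthetic example: candidate 0 has the smallest mean loss.
    rng = np.random.default_rng(1)
    demo_losses = np.array([1.0, 1.5, 2.0]) + rng.normal(size=(2000, 3))
    for r in range(3):
        print(r, exp_weighted_statistic(demo_losses, r))
```

In a sketch like this, each statistic would be compared against a normal quantile to decide whether a candidate can be excluded. How the per-sample losses are constructed without ground-truth effects, and how the threshold yields familywise error rate control, is the subject of the paper and is not reproduced here.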
