COSTAR: Improved Temporal Counterfactual Estimation with Self-Supervised Learning
Chuizheng Meng, Yihe Dong, Sercan Ö. Arık, Yan Liu, Tomas Pfister
TL;DR
COSTAR tackles temporal counterfactual outcome estimation under time-varying confounding and distribution shifts by learning expressive history representations through self-supervised learning and a Transformer encoder that jointly models temporal and feature interactions. It introduces component-wise contrastive losses, a non-autoregressive future predictor, and an unsupervised domain adaptation perspective to bound transfer errors, enabling effective zero-shot and data-efficient transfer. Empirical results on synthetic and real-world datasets show COSTAR outperforms baselines in estimation accuracy and cross-domain generalization, while ablations highlight the importance of the encoder design and SSL losses. The approach offers practical benefits for decision-making in domains where RCTs are costly or impractical and data from target populations are scarce.
Abstract
Estimating temporal counterfactual outcomes from observed history is crucial for decision-making in many domains, such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) are costly or impractical. In real-world datasets, modeling time-dependent confounders is challenging because of complex dynamics, long-range dependencies, and the fact that both past treatments and covariates affect future outcomes. In this paper, we introduce the Counterfactual Self-Supervised Transformer (COSTAR), a novel approach that integrates self-supervised learning to improve representations of the observed history. We propose a component-wise contrastive loss tailored to temporal treatment-outcome observations and explain its effectiveness from the perspective of unsupervised domain adaptation. Empirical results on both synthetic and real-world datasets show that COSTAR outperforms existing models in estimation accuracy and in generalization to out-of-distribution data.
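To make the idea of a component-wise contrastive loss concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes that each history component (here hypothetically named "treatment", "covariate", and "outcome") has already been encoded into one nonzero embedding per sample under two augmented views, applies a standard InfoNCE loss to each component separately, and averages the results. The function names, the component names, the temperature of 0.1, and the equal weighting are all assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a nonzero vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.1):
    """Standard InfoNCE loss between two views of the same batch.

    view_a[i] and view_b[i] form the positive pair; every other row of
    view_b serves as a negative for view_a[i].
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [
            sum(x * y for x, y in zip(anchor, candidate)) / temperature
            for candidate in b
        ]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the positive pair.
        total += log_denominator - logits[i]
    return total / len(a)


def component_wise_contrastive_loss(views_a, views_b, temperature=0.1):
    """Average InfoNCE over components (hypothetical equal weighting).

    views_a and views_b map a component name to a list of embeddings,
    one embedding per sample in the batch.
    """
    per_component = {
        name: info_nce(views_a[name], views_b[name], temperature)
        for name in views_a
    }
    mean_loss = sum(per_component.values()) / len(per_component)
    return mean_loss, per_component
```

Computing the loss per component, rather than on a single pooled embedding, lets each component contribute its own alignment signal; how the components are actually defined, augmented, and weighted in COSTAR is specified in the paper's method section.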
