Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets

Yuxin Wang; Maresa Schröder; Dennis Frauen; Jonas Schweisthal; Konstantin Hess; Stefan Feuerriegel

TL;DR

This work tackles the construction of valid confidence intervals (CIs) for the average treatment effect (ATE) when combining multiple observational datasets with differing confounding structures. It builds on prediction-powered inference (PPI), coupling a measure-of-fit computed on a large, potentially confounded dataset with a rectifier estimated on a smaller, less biased dataset to shrink the CI width while maintaining asymptotic validity. The method provides a coherent framework for both the observational-only and the RCT-plus-observational settings, with theoretical guarantees and experiments on synthetic and medical data that show faithful coverage and substantially narrower CIs than naïve baselines. Overall, the approach enables more precise and reliable uncertainty quantification for multi-source causal evidence in medical contexts, and it accommodates flexible modeling choices, including pre-trained predictors.
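
For reference, the original PPI estimator of a mean (Angelopoulos et al., 2023) takes the form shown here, where f is a fixed predictor, \tilde X_1, ..., \tilde X_N are the covariates of the large dataset, and (X_1, Y_1), ..., (X_n, Y_n) are the labeled samples of the small dataset. How the paper maps Y to a causal quantity (for example, a pseudo-outcome whose mean is the ATE) is not stated in this summary, so that mapping should be read as an assumption; the formula only illustrates the "measure-of-fit plus rectifier" structure.

```latex
\hat\theta^{\mathrm{PP}}
  = \underbrace{\frac{1}{N}\sum_{i=1}^{N} f(\tilde X_i)}_{\text{measure-of-fit, large dataset}}
  \;-\; \underbrace{\frac{1}{n}\sum_{j=1}^{n}\bigl(f(X_j) - Y_j\bigr)}_{\text{rectifier, small dataset}},
\qquad
\mathcal{C}_{1-\alpha}
  = \hat\theta^{\mathrm{PP}} \pm z_{1-\alpha/2}
    \sqrt{\frac{\hat\sigma_{f}^{2}}{N} + \frac{\hat\sigma_{f-Y}^{2}}{n}}
```

Here \hat\sigma_f^2 is the sample variance of f(\tilde X_i) on the large dataset and \hat\sigma_{f-Y}^2 the sample variance of f(X_j) - Y_j on the small dataset. When N is much larger than n and f predicts Y well, the second variance is smaller than the variance of Y alone, which is what narrows the interval relative to a CI built from the small dataset only.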

Abstract

Constructing confidence intervals (CIs) for the average treatment effect (ATE) from patient records is crucial to assess the effectiveness and safety of drugs. However, patient records typically come from different hospitals, which raises the question of how multiple observational datasets can be effectively combined for this purpose. In our paper, we propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice. The key idea of our method is to leverage prediction-powered inference and thereby essentially 'shrink' the CIs, which offers more precise uncertainty quantification than naïve approaches. We further prove the unbiasedness of our method and the validity of our CIs, and we confirm our theoretical results through various numerical experiments. Finally, we provide an extension of our method for constructing CIs from combinations of experimental and observational datasets.
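
To make the "shrinking" concrete, here is a minimal simulation sketch, not the authors' estimator. It assumes the small dataset is a randomized trial with known propensity 0.5 (loosely mirroring the experimental-plus-observational extension) and that both datasets share the same covariate distribution. The hypothetical predictor `f_hat` stands in for a model fitted on a large, confounded dataset and is deliberately biased; the inverse-propensity-weighted (IPW) pseudo-outcome is an illustrative choice of ours. All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, variance


def tau(x):
    """True CATE; averaged over X ~ N(0, 1) it gives the true ATE = 1."""
    return 1.0 + 2.0 * x


def f_hat(x):
    """Hypothetical, deliberately biased CATE predictor standing in for a
    model fitted on a large, confounded observational dataset."""
    return 1.3 + 1.9 * x


def simulate_rct(n, rng):
    """Small randomized dataset: A ~ Bernoulli(0.5), Y = A * tau(X) + noise."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        y = a * tau(x) + rng.gauss(0.0, 0.5)
        data.append((x, a, y))
    return data


def ipw_pseudo_outcome(a, y, e=0.5):
    """IPW pseudo-outcome; with known propensity e its mean given X is tau(X)."""
    return y * (a / e - (1 - a) / (1 - e))


def naive_ci(pseudo, alpha=0.05):
    """Classical CLT interval that uses the small dataset only."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    est = fmean(pseudo)
    se = (variance(pseudo) / len(pseudo)) ** 0.5
    return est, (est - z * se, est + z * se)


def ppi_ci(preds_large, pseudo_small, preds_small, alpha=0.05):
    """PPI interval for a mean: average prediction on the large dataset
    plus a rectifier (mean residual) estimated on the small dataset."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    residuals = [p - pred for p, pred in zip(pseudo_small, preds_small)]
    est = fmean(preds_large) + fmean(residuals)
    se = (variance(preds_large) / len(preds_large)
          + variance(residuals) / len(residuals)) ** 0.5
    return est, (est - z * se, est + z * se)


rng = random.Random(0)
small = simulate_rct(500, rng)
x_large = [rng.gauss(0.0, 1.0) for _ in range(50_000)]  # covariates only

pseudo = [ipw_pseudo_outcome(a, y) for _, a, y in small]
preds_small = [f_hat(x) for x, _, _ in small]
preds_large = [f_hat(x) for x in x_large]

naive_est, naive_int = naive_ci(pseudo)
ppi_est, ppi_int = ppi_ci(preds_large, pseudo, preds_small)
print(f"naive ATE {naive_est:.3f}, CI width {naive_int[1] - naive_int[0]:.3f}")
print(f"PPI   ATE {ppi_est:.3f}, CI width {ppi_int[1] - ppi_int[0]:.3f}")
print(f"predictor-only ATE {fmean(preds_large):.3f} (biased)")
```

In this setup the predictor alone would report an ATE near 1.3, whereas the rectifier pulls the PPI estimate back toward the true value of 1. The PPI interval is expected to be narrower than the naive one because `f_hat` explains part of the pseudo-outcome variance, while the large-dataset term contributes little variance once N is large.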

