Data-Adaptive Integration With Summary Data

Kosuke Morikawa; Sho Komukai; Satoshi Hattori

TL;DR

A generalized entropy-balancing integration strategy is developed that calibrates external moments to the internal covariate distribution, explicitly permitting a biased external sample, and is implemented in the R package daisy.

Abstract

Combining an internal individual-level study with readily available external summary statistics promises major efficiency gains at minimal additional cost, yet heterogeneity between sources can bias estimates for the internal target population. We develop a generalized entropy-balancing integration strategy that calibrates external moments to the internal covariate distribution, explicitly permitting a biased external sample. Our estimator of the internal-population mean is doubly robust: it remains consistent when either the outcome-regression model or the entropy-balancing model is correctly specified. When multiple balancing specifications are plausible, we introduce a data-adaptive selection rule. We also provide easy-to-compute, fully estimable diagnostics, based on the Mahalanobis distance and the Pearson chi-square divergence, that pinpoint when integration is guaranteed to strictly outperform the internal sample mean. The approach is implemented in the R package daisy. Simulations and an application to nationwide public-access defibrillation records in Japan demonstrate meaningful precision gains while maintaining bias control under distributional shift.
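To make the calibration idea concrete, the following is a minimal sketch of one common form of entropy balancing: exponential-tilting weights on the internal units, chosen so that the weighted covariate mean reproduces an external summary mean. It is an illustration under stated assumptions, not the paper's estimator and not the API of the daisy package. The function names (`entropy_balancing_weights`, `mahalanobis_sq`, `pearson_chi2_from_uniform`), the use of first moments only, and the textbook forms of the two diagnostics are all assumptions introduced here. The solver is an undamped Newton iteration on the convex dual, which is adequate for moderate shifts; a line search would be needed for robustness under larger ones.

```python
import numpy as np


def _tilted_weights(Z, lam):
    """Normalized weights w_i proportional to exp(z_i @ lam), computed stably."""
    eta = Z @ lam
    w = np.exp(eta - eta.max())
    return w / w.sum()


def entropy_balancing_weights(X, target_mean, max_iter=100, tol=1e-10):
    """Weights on internal rows of X whose weighted mean equals target_mean.

    Hypothetical helper: minimizes the dual log-sum-exp objective by Newton's
    method. The gradient is the weighted mean of the centered covariates and
    the Hessian is their weighted covariance.
    """
    X = np.asarray(X, dtype=float)
    Z = X - np.asarray(target_mean, dtype=float)  # constraint: sum_i w_i z_i = 0
    lam = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        w = _tilted_weights(Z, lam)
        grad = Z.T @ w
        if np.linalg.norm(grad) < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        lam = lam - np.linalg.solve(hess, grad)  # undamped Newton step
    return _tilted_weights(Z, lam)


def mahalanobis_sq(X, target_mean):
    """Squared Mahalanobis distance between the internal sample mean and
    target_mean, scaled by the internal sample covariance (assumed form)."""
    X = np.asarray(X, dtype=float)
    diff = X.mean(axis=0) - np.asarray(target_mean, dtype=float)
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return float(diff @ np.linalg.solve(S, diff))


def pearson_chi2_from_uniform(w):
    """Pearson chi-square divergence of normalized weights w from the uniform
    weights 1/n, i.e. n * sum_i w_i^2 - 1 (assumed form)."""
    w = np.asarray(w, dtype=float)
    return float(len(w) * np.sum(w ** 2) - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_internal = rng.normal(size=(500, 2))  # simulated internal covariates
    # Simulated external summary mean, shifted away from the internal mean.
    external_mean = X_internal.mean(axis=0) + np.array([0.2, -0.1])
    w = entropy_balancing_weights(X_internal, external_mean)
    print("weighted internal mean:", w @ X_internal)
    print("external mean:         ", external_mean)
    print("squared Mahalanobis distance:", mahalanobis_sq(X_internal, external_mean))
    print("Pearson chi-square divergence:", pearson_chi2_from_uniform(w))
```

Both quantities are zero when the external mean coincides with the internal sample mean, since the weights are then uniform, and both grow as the two sources diverge. This is the sense in which such quantities can act as diagnostics of distributional shift; the exact thresholds the paper derives are not reproduced here.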
