Planning for gold: Hypothesis screening with split samples for valid powerful testing in matched observational studies
William Bekerman, Abhinandan Dalal, Carlo del Ninno, Dylan S. Small
TL;DR
This work develops a data-splitting design for observational studies: hypotheses are screened in a planning sample, and valid inference is preserved in a separate analysis sample. Building on Rosenbaum’s sensitivity framework and the sensitivity value, the authors introduce Sens-Val, a bootstrap-assisted procedure that screens outcomes for robustness to unmeasured confounding and constructs predictive intervals for analysis-stage testing. Theoretical results (including Edgeworth expansions and local-power analyses) justify the finite-sample validity and power advantages, especially under higher bias levels. In simulations, Sens-Val often outperforms both naive screening and full-sample multiplicity corrections. The Bangladesh floods application demonstrates practical gains in identifying robust health, water, and economic effects under varying levels of unmeasured confounding, while maintaining control of the family-wise error rate (FWER). The approach extends to full matching, cross-screening, and alternative error-rate metrics, making it a pragmatic tool for causal inference in observational studies with many outcomes.
Abstract
Observational studies are valuable tools for inferring causal effects in the absence of controlled experiments. However, these studies may be biased due to the presence of some relevant, unmeasured set of covariates. One approach to mitigate this concern is to identify hypotheses likely to be more resilient to hidden biases by splitting the data into a planning sample for designing the study and an analysis sample for making inferences. We devise a powerful and flexible method for selecting hypotheses in the planning sample when an unknown number of outcomes are affected by the treatment, allowing researchers to gain the benefits of exploratory analysis and still conduct powerful inference under concerns of unmeasured confounding. We investigate the theoretical properties of our method and conduct extensive simulations that demonstrate pronounced benefits, especially at higher levels of allowance for unmeasured confounding. Finally, we demonstrate our method in an observational study of the multi-dimensional impacts of a devastating flood in Bangladesh.
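To make the split-sample idea concrete, here is a minimal Python sketch of the general design described above: split the matched pairs, screen outcomes in the planning sample, then test only the screened outcomes in the analysis sample with a Bonferroni correction. It is not the Sens-Val procedure. The screening rule is a simple stand-in (a planning-sample sensitivity p-value below a cutoff), and the test statistic (Wilcoxon signed-rank with Rosenbaum's normal-approximation bound at bias level Γ) is an assumption made for illustration. The function names `sens_pvalue` and `split_screen_test`, the 20% planning fraction, and the simulated data are all hypothetical.

```python
import math
import numpy as np


def sens_pvalue(diffs, gamma):
    """Upper bound on the one-sided p-value of the Wilcoxon signed-rank test
    when hidden bias is at most `gamma` (Rosenbaum's model).

    Uses the large-sample normal approximation and assumes no ties and no
    zero differences among the matched-pair differences.
    """
    d = np.asarray(diffs, dtype=float)
    n = d.size
    ranks = np.argsort(np.argsort(np.abs(d))) + 1  # ranks 1..n of |d|
    t = ranks[d > 0].sum()                         # signed-rank statistic
    p_plus = gamma / (1.0 + gamma)                 # worst-case P(positive sign)
    mean = p_plus * n * (n + 1) / 2.0
    var = p_plus * (1.0 - p_plus) * n * (n + 1) * (2 * n + 1) / 6.0
    z = (t - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))     # upper normal tail


def split_screen_test(diffs, gamma, alpha=0.05, plan_frac=0.2,
                      screen_level=0.05, seed=0):
    """Screen outcomes on a planning split, then test them on the analysis split.

    `diffs` has shape (n_pairs, n_outcomes): treated-minus-control differences.
    Returns (selected, rejected) as lists of outcome column indices.
    """
    rng = np.random.default_rng(seed)
    n_pairs, n_outcomes = diffs.shape
    perm = rng.permutation(n_pairs)
    n_plan = int(round(plan_frac * n_pairs))
    plan, analysis = diffs[perm[:n_plan]], diffs[perm[n_plan:]]

    # Stand-in screening rule (NOT Sens-Val): keep outcomes that already look
    # insensitive to bias of size gamma in the planning sample.
    selected = [k for k in range(n_outcomes)
                if sens_pvalue(plan[:, k], gamma) <= screen_level]
    if not selected:
        return selected, []

    # The two splits share no pairs, so Bonferroni over the selected outcomes
    # only controls the family-wise error rate at level alpha.
    threshold = alpha / len(selected)
    rejected = [k for k in selected
                if sens_pvalue(analysis[:, k], gamma) <= threshold]
    return selected, rejected


# Simulated example: 500 matched pairs, 5 outcomes, only outcome 0 is affected.
rng = np.random.default_rng(1)
diffs = rng.normal(size=(500, 5))
diffs[:, 0] += 1.0
selected, rejected = split_screen_test(diffs, gamma=1.5)
print("selected:", selected, "rejected:", rejected)
```

The appeal of the design is visible in the threshold: the analysis stage divides α by the number of screened outcomes rather than by the total number of outcomes, at the cost of giving up the planning pairs for inference.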
