Calibrating hierarchical Bayesian domain inference for a proportion
Rayleigh Lei, Yajuan Si
TL;DR
The paper addresses calibrated Bayesian domain inference for proportions in small area estimation (SAE) by extending Frequentist, but Assisted by Bayes (FAB) intervals from normally distributed outcomes to binary data and integrating them with multilevel regression and poststratification (MRP) to correct for sample selection bias. It develops a two-step computation to construct FAB versions of the Wald, Agresti-Coull (AC), and Wilson intervals for proportions, including an all-in penalty to improve behavior near the boundaries, and evaluates their frequentist coverage via simulation and a COVID-19 infection-rate application. Across the simulations and the real-data example, FAB intervals achieve better domain-specific coverage than standard Bayesian credible intervals or classical intervals, at the cost of wider intervals. The work provides practical, calibrated interval tools for region- and subgroup-specific proportion estimates in SAE, with implications for targeted policy and resource allocation in public health and related fields.
Abstract
Small area estimation (SAE) improves estimates for local communities or groups, such as counties, neighborhoods, or demographic subgroups, when data are insufficient for each area. This is important for targeting local resources and policies, especially when national-level or large-area data mask variation at a more granular level. Researchers often fit hierarchical Bayesian models to stabilize SAE when data are sparse. Ideally, Bayesian procedures also exhibit good frequentist properties, as demonstrated by calibrated Bayes metrics. However, hierarchical Bayesian models tend to shrink domain estimates toward the overall mean and may produce credible intervals that do not maintain nominal coverage. Hoff et al. developed the Frequentist, but Assisted by Bayes (FAB) intervals for subgroup estimates with normally distributed outcomes. However, non-normally distributed data present new challenges, and multiple types of intervals have been proposed for estimating proportions. We examine domain inference with binary outcomes and extend FAB intervals to bring domain-level coverage closer to the nominal level. We describe how to numerically compute FAB intervals for a proportion and evaluate their performance through repeated simulation studies. Leveraging multilevel regression and poststratification (MRP), we further refine SAE to correct for sample selection bias, construct the FAB intervals for MRP estimates, and assess their repeated sampling properties. Finally, we apply the proposed inference methods to estimate COVID-19 infection rates across geographic and demographic subgroups. We find that the FAB intervals achieve coverage closer to the nominal level, at the cost of wider intervals.
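
For orientation, the classical Wald, Agresti-Coull, and Wilson intervals for a single binomial proportion, which serve as the starting points for the FAB extensions, can be sketched as follows. This is a minimal illustration of the textbook (non-FAB) intervals only, not the procedure developed in the paper. The function names, the choice of n = 20, and the true proportion 0.05 are assumptions made for the example. Coverage is computed exactly by summing the binomial probabilities of every outcome whose interval contains the true proportion, rather than by simulation.

```python
import math
from statistics import NormalDist


def z_value(level=0.95):
    """Two-sided standard normal critical value for the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def wald(y, n, level=0.95):
    """Classical Wald interval, truncated to [0, 1]."""
    z = z_value(level)
    p = y / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def agresti_coull(y, n, level=0.95):
    """Agresti-Coull interval: Wald form around an adjusted estimate."""
    z = z_value(level)
    n_adj = n + z**2
    p_adj = (y + z**2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def wilson(y, n, level=0.95):
    """Wilson score interval, truncated to [0, 1] against rounding error."""
    z = z_value(level)
    p = y / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def exact_coverage(interval, n, theta, level=0.95):
    """Probability, under Binomial(n, theta), that the interval covers theta."""
    total = 0.0
    for y in range(n + 1):
        lo, hi = interval(y, n, level)
        if lo <= theta <= hi:
            total += math.comb(n, y) * theta**y * (1 - theta) ** (n - y)
    return total


if __name__ == "__main__":
    n, theta = 20, 0.05  # hypothetical small domain with a rare outcome
    for name, fn in [("Wald", wald), ("Agresti-Coull", agresti_coull), ("Wilson", wilson)]:
        print(f"{name:14s} coverage at theta={theta}: {exact_coverage(fn, n, theta):.3f}")
```

With a small sample and a proportion near the boundary, the Wald interval collapses to a single point whenever no successes are observed, so its coverage falls well below the nominal 95% in this setting, while the other two intervals behave better. This boundary behavior is the kind of problem that motivates careful interval construction for domain-level proportions.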
