Automatic Variance Adjustment for Small Area Estimation

Jon Wakefield, Jitong Jiang, Yunhan Wu

TL;DR

This work tackles the instability and non-existence of variance estimates in small area estimation under DHS-like stratified two-stage sampling in LMICs. It introduces Automatic Variance Adjustment via phantom prior augmentation, extending from simple random sampling to complex survey designs and exponential-family settings, and implements it in surveyPrev to enable automated, design-consistent variance fixes. Through simulations modeled on Zambia DHS data and a real application to wasting in children, the approach improves interval coverage and precision in Admin-2 estimates while enabling reliable ranking and spatial borrowing via BYM2 structures. The method offers a practical, scalable solution for producing trustworthy prevalence maps and domain-level inferences in data-sparse settings, with broad applicability across LMIC health indicators.
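The Python sketch below illustrates the idea in the simplest setting: simple random sampling of a binary indicator. It shows why a direct variance estimate becomes illegal in a sparse domain, and how pooling with a hypothetical prior sample removes the problem. Representing the hypothetical survey as n0 phantom units with prevalence p0 is an illustrative assumption, and so are the function names direct_srs and phantom_adjusted_srs. This is neither the surveyPrev implementation nor the paper's design-based version for stratified two-stage sampling.

```python
from typing import Tuple


def direct_srs(y: int, n: int) -> Tuple[float, float]:
    """Direct prevalence estimate y/n and its binomial variance under SRS."""
    if n < 1:
        raise ValueError("no sampled units: the direct estimate is undefined")
    p_hat = y / n
    return p_hat, p_hat * (1.0 - p_hat) / n


def phantom_adjusted_srs(y: int, n: int, p0: float, n0: float) -> Tuple[float, float]:
    """Pool the observed sample with a hypothetical prior sample.

    ASSUMPTION (illustrative only): the hypothetical survey is represented as
    n0 phantom units with prevalence p0 (e.g. a national estimate). This is
    not claimed to be the exact adjustment used in surveyPrev.
    """
    if n0 <= 0 or not 0.0 < p0 < 1.0:
        raise ValueError("need n0 > 0 and 0 < p0 < 1")
    n_aug = n + n0
    p_aug = (y + n0 * p0) / n_aug          # shrinks the estimate toward p0
    return p_aug, p_aug * (1.0 - p_aug) / n_aug


if __name__ == "__main__":
    # A small domain with no cases: the direct variance collapses to zero.
    print(direct_srs(0, 12))  # (0.0, 0.0)
    # Two phantom units at 5% prevalence give a strictly positive variance.
    print(phantom_adjusted_srs(0, 12, p0=0.05, n0=2))
```

In this sketch, the size of n0 controls the trade-off. A small n0 leaves well-sampled domains essentially unchanged while still guaranteeing 0 < p_aug < 1, so the variance is always positive and finite.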

Abstract

Small area estimation (SAE) is a common endeavor and is used in a variety of disciplines. In low- and middle-income countries (LMICs), in which household surveys provide the most reliable and timely source of data, SAE is vital for highlighting disparities in health and demographic indicators. Weighted estimators are ideal for inference, but for fine geographical partitions in which there are insufficient data, SAE models are required. The most common approach is Fay-Herriot area-level modeling in which the data requirements are a weighted estimate and an associated variance estimate. The latter can be undefined or unstable when data are sparse and so we propose a principled modification which is based on augmenting the available data with a prior sample from a hypothetical survey. This adjustment is generally available, respects the design and is simple to implement. We examine the empirical properties of the adjustment through simulation and illustrate its use with wasting data from a 2018 Zambian Demographic and Health Survey. The modification is implemented as an automatic remedy in the R package surveyPrev, which provides a comprehensive suite of tools for conducting SAE in LMICs.
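The abstract notes that Fay-Herriot modeling needs a variance estimate for every direct estimate. The minimal Python sketch below shows where that variance enters: it sets the weight gamma placed on the direct estimate. It uses the classical Fay-Herriot predictor and treats the regression (synthetic) prediction and the between-area variance as known. This is a simplification, since the fitted models in the paper, for example those with BYM2 spatial effects, are richer. The function name fay_herriot_predict is hypothetical.

```python
import math
from typing import Tuple


def fay_herriot_predict(theta_hat: float, v_hat: float,
                        synthetic: float, sigma2_u: float) -> Tuple[float, float]:
    """Classical Fay-Herriot predictor with model parameters treated as known.

    Sampling model: theta_hat = theta + e,  e ~ N(0, v_hat)
    Linking model:  theta = synthetic + u,  u ~ N(0, sigma2_u)
    Returns (prediction, gamma); gamma is the weight on the direct estimate.
    """
    if not (math.isfinite(v_hat) and v_hat > 0.0):
        # A zero or undefined design variance is "illegal": gamma would be 1
        # (no borrowing of strength) or could not be computed at all.
        raise ValueError("direct variance must be positive and finite")
    gamma = sigma2_u / (sigma2_u + v_hat)
    return gamma * theta_hat + (1.0 - gamma) * synthetic, gamma


if __name__ == "__main__":
    pred, gamma = fay_herriot_predict(0.10, 0.0004, synthetic=0.06, sigma2_u=0.0004)
    print(gamma, round(pred, 4))  # 0.5 0.08
    try:
        fay_herriot_predict(0.0, 0.0, synthetic=0.06, sigma2_u=0.0004)
    except ValueError as err:
        print("illegal variance:", err)
```

When the sampling and between-area variances are equal, the predictor sits halfway between the direct and synthetic estimates. A zero variance would instead force the predictor to reproduce the direct estimate exactly, even in a domain with very few clusters.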

Paper Structure

This paper contains 22 sections, 69 equations, 16 figures and 4 tables.

Figures (16)

  • Figure 1: Locations of sampled urban and rural clusters in the 2018 Zambia DHS (jittered to preserve privacy) with Admin-1 and Admin-2 boundaries indicated, along with Admin-1 labels.
  • Figure 2: Admin-1 level wasting prevalence estimates (left) and coefficients of variation (right) in Zambia, based on the 2018 DHS.
  • Figure 3: Empirical coverage across Admin-2 (unplanned) areas at the nominal 80% level (red dashed line), using the asymptotic normal sampling distribution of the estimator. Annotations show, within each planned domain, the percentage of unplanned domains with illegal direct variance estimates.
  • Figure 4: Interval scores for 80% intervals for Admin-2 domains under the three variance-handling strategies, grouped by Admin-1 domain.
  • Figure 5: Prevalence estimates and 95% uncertainty interval widths for four approaches. Red borders and numbers indicate areas with illegal variance estimates; information on these areas is presented in the accompanying table of illegal variance estimates. In the left column, and in the second row of the right column, gray areas have no clusters. In the top-right map, gray areas have no clusters or have undefined or zero variance estimates.
  • ...and 11 more figures
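Figures 3 and 4 summarize performance by the empirical coverage of 80% intervals and by interval scores. The minimal Python sketch below computes both summaries. It assumes the standard Gneiting-Raftery interval score and normal-approximation intervals, and the helper names are hypothetical. It also shows that an interval built from a zero variance estimate has zero width, so it is heavily penalized whenever it misses the truth.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def normal_interval(est: float, var: float, level: float = 0.8) -> Tuple[float, float]:
    """Central interval from the asymptotic normal sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * var ** 0.5
    return est - half, est + half


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Gneiting-Raftery score for a central (1 - alpha) interval; lower is better."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def coverage_and_score(intervals: Sequence[Tuple[float, float]],
                       truths: Sequence[float], alpha: float = 0.2) -> Tuple[float, float]:
    """Empirical coverage and mean interval score over replicates or areas."""
    pairs = list(zip(intervals, truths))
    covered = sum(1 for (lo, hi), y in pairs if lo <= y <= hi)
    mean_score = sum(interval_score(lo, hi, y, alpha) for (lo, hi), y in pairs) / len(pairs)
    return covered / len(pairs), mean_score


if __name__ == "__main__":
    truth = 0.10
    # One estimate with a positive variance, one whose variance collapsed to zero.
    intervals = [normal_interval(0.09, 0.0009), normal_interval(0.0, 0.0)]
    cov, score = coverage_and_score(intervals, [truth, truth], alpha=0.2)
    print(cov, round(score, 3))  # 0.5 0.538
```

Here alpha = 0.2 corresponds to the nominal 80% level. The zero-width interval contributes (2 / alpha) times its miss distance, which dominates the mean score.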