Table of Contents
Fetching ...

Design Effect Ratios for Bayesian Survey Models: A Diagnostic Framework for Identifying Survey-Sensitive Parameters

JoonHo Lee

Abstract

Bayesian hierarchical models fit to complex survey data require variance correction for the sampling design, yet applying this correction uniformly harms parameters already protected by the hierarchical structure. We propose the Design Effect Ratio -- the ratio of design-corrected to model-based posterior variance -- as a per-parameter diagnostic identifying which quantities are survey-sensitive. Closed-form decompositions show that fixed-effect sensitivity depends on whether identifying variation lies between or within clusters, while random-effect sensitivity is governed by hierarchical shrinkage. These results yield a compute-classify-correct workflow adding negligible overhead to Bayesian estimation. In simulations spanning 54 scenarios and 10,800 replications of hierarchical logistic regression, selective correction achieves 87-88% coverage for survey-sensitive parameters -- matching blanket correction -- while preserving near-nominal coverage for protected parameters that blanket correction collapses to 20-21%. A threshold of 1.2 produces zero false positives, with a separation ratio of approximately 4:1. Applied to the 2019 National Survey of Early Care and Education (6,785 providers, 51 states), the diagnostic flags exactly 1 of 54 parameters for correction; blanket correction would have narrowed the worst remaining interval to 4.3% of its original width. The entire pipeline completes in under 0.03 seconds, bridging design-based and model-based survey inference.

Design Effect Ratios for Bayesian Survey Models: A Diagnostic Framework for Identifying Survey-Sensitive Parameters

Abstract

Bayesian hierarchical models fit to complex survey data require variance correction for the sampling design, yet applying this correction uniformly harms parameters already protected by the hierarchical structure. We propose the Design Effect Ratio -- the ratio of design-corrected to model-based posterior variance -- as a per-parameter diagnostic identifying which quantities are survey-sensitive. Closed-form decompositions show that fixed-effect sensitivity depends on whether identifying variation lies between or within clusters, while random-effect sensitivity is governed by hierarchical shrinkage. These results yield a compute-classify-correct workflow adding negligible overhead to Bayesian estimation. In simulations spanning 54 scenarios and 10,800 replications of hierarchical logistic regression, selective correction achieves 87-88% coverage for survey-sensitive parameters -- matching blanket correction -- while preserving near-nominal coverage for protected parameters that blanket correction collapses to 20-21%. A threshold of 1.2 produces zero false positives, with a separation ratio of approximately 4:1. Applied to the 2019 National Survey of Early Care and Education (6,785 providers, 51 states), the diagnostic flags exactly 1 of 54 parameters for correction; blanket correction would have narrowed the worst remaining interval to 4.3% of its original width. The entire pipeline completes in under 0.03 seconds, bridging design-based and model-based survey inference.
Paper Structure (208 sections, 14 theorems, 109 equations, 6 figures, 17 tables, 1 algorithm)

This paper contains 208 sections, 14 theorems, 109 equations, 6 figures, 17 tables, 1 algorithm.

Key Result

Theorem 1

Under conditions (A1)--(A5) and (R1)--(R5), for the $k$th fixed-effect coefficient $\beta_k$, where $R_k \in [0, B]$ measures the fraction of identifying variation for covariate $k$ that comes from between-group differences, and $B = \sigma^2_\theta / (\sigma^2_\theta + \sigma^2_e / n)$ is the common shrinkage factor eq:shrinkage. In particular:

Figures (6)

  • Figure 1: DER decomposition as a function of shrinkage factor $B$. Left: Random-effect DER (\ref{['thm:random-effect']}) exhibits an inverted-U shape with peak at $B^* = \sqrt{J}/(\sqrt{J}+1)$; dashed lines show the large-$J$ limit $B \cdot \mathrm{DEFF}$. Right: Fixed-effect DER (\ref{['thm:fixed-effect']}) spans the band $[\mathrm{DEFF}(1-B),\, \mathrm{DEFF}]$, with within-group covariates at the upper boundary and between-group covariates at the lower boundary. Horizontal gray lines mark the classification thresholds $\tau = 1.2$ and $\tau = 1.5$.
  • Figure 2: Formula accuracy across 54 simulation scenarios. (a) General formula (Laplace approximation, \ref{['prop:nonconjugate']}) versus exact $\mathrm{DER}$ computed via \ref{['alg:ccc']}, showing tight agreement along the diagonal with median absolute relative error of $2.3\%$. (b) Simplified formula predictions (\ref{['thm:fixed-effect', 'thm:random-effect']}) versus exact $\mathrm{DER}$, showing correct qualitative ordering but quantitative divergence for between-cluster fixed effects and random effects. Within-cluster fixed-effect points (triangles) lie on the diagonal in both panels. Log--log scale; each point represents one parameter in one scenario (6 scenarios $\times$ 3 replicates).
  • Figure 3: Coverage of 90% credible intervals across four correction strategies and three design effect levels ($\mathrm{DEFF} \approx 1.09,\, 2.00,\, 5.00$). (a) Target parameters (within-cluster fixed effects): the naive strategy (black, dashed) drops from ${\sim}88\%$ at $\mathrm{DEFF} = 1.09$ to ${\sim}65\%$ at $\mathrm{DEFF} = 5.0$; DER-guided correction (blue, solid; $\tau = 1.2$) and blanket correction (red, dotted) both restore coverage to ${\sim}87\%$. (b) Non-target parameters: blanket correction reduces coverage to ${\sim}20$% for both between-cluster fixed effects and random effects. DER-guided correction preserves near-nominal coverage for between-cluster fixed effects ($89.7$--$90.3$%) and somewhat lower coverage for random effects ($80.8$--$85.3$%); the panel averages over both classes. Points are averaged over ICC levels and replications.
  • Figure 4: DER profile for a representative scenario ($J = 100$, $\mathrm{CV}_w = 1.0$, $\mathrm{ICC} = 0.30$, non-informative; $\mathrm{DEFF} \approx 2.0$). The within-cluster fixed effect ($\beta_1$, blue diamond) has $\mathrm{DER} \approx 1.75$, well above the threshold $\tau = 1.2$ (vertical dashed line). Between-cluster fixed effects ($\beta_0$, $\beta_2$; green squares) and 100 random effects ($\theta_1, \ldots, \theta_{100}$; orange cloud) all have $\mathrm{DER} < 0.1$. The empty space between the non-target parameters and the within-cluster parameter illustrates the separation gap that makes $\tau = 1.2$ robust.
  • Figure 5: $\mathrm{DER}$ diagnostic profile for the NSECE 2019 hierarchical logistic regression (54 parameters). (a) Fixed-effect $\mathrm{DER}$ values: the within-state poverty coefficient ($\beta_1$, $\mathrm{DER} = 2.643$) is the sole parameter exceeding the threshold $\tau = 1.2$; the intercept and tiered reimbursement coefficient both have $\mathrm{DER} \approx 0.03$. (b) Random-effect $\mathrm{DER}$ values for 51 states, colored by shrinkage factor $B_j$: all values lie below 1, confirming Tier II classification. (c) 90% credible intervals for the three fixed effects before (naive, gray) and after (selective, blue) $\mathrm{DER}$-guided correction: only $\beta_1$ is adjusted ($+62.6$% width increase); intervals for $\beta_0$ and $\beta_2$ are unchanged by design.
  • ...and 1 more figures

Theorems & Definitions (29)

  • Definition 1: Design Effect Ratio
  • Theorem 1: Fixed-Effect DER
  • proof : Proof sketch
  • Theorem 2: Random-Effect DER
  • proof : Proof sketch
  • Corollary 1: Boundary Behavior
  • Corollary 2: Conservation Law
  • Proposition 1: Non-Conjugate Approximation
  • Remark 1: Hyperparameters
  • Remark 2: Reparameterization
  • ...and 19 more