A Bayesian Hierarchical Hurdle Beta-Binomial Model for Survey-Weighted Bounded Counts and Its Application to Childcare Enrollment

JoonHo Lee

Abstract

Bounded discrete proportions -- counts out of known totals -- present modeling challenges when data exhibit structural zeros, overdispersion, and hierarchical clustering. We develop a Bayesian hierarchical hurdle beta-binomial model with state-varying coefficients that addresses all of these features. The framework makes three methodological contributions: (i) it studies cross-margin dependence via a cross-block covariance component and clarifies when and how this parameter is identified through the hierarchical layer rather than the conditional likelihood; (ii) it proposes a Cholesky-based sandwich variance calibration for pseudo-posterior inference under survey weights, guided by a parameter-specific design effect ratio diagnostic; and (iii) it introduces a log-scale marginal effect decomposition for hurdle models that translates regression coefficients into policy-relevant quantities. Applied to 6,785 childcare providers across 51 states from the 2019 National Survey of Early Care and Education, the model reveals a "poverty reversal": poverty reduces enrollment participation yet increases intensity among participants, with the extensive margin accounting for two-thirds of the total effect. Design-calibrated simulation shows that sandwich-corrected intervals substantially improve coverage, reaching 82--88.5% at the 90% nominal level for fixed effects. The R package hurdlebb implements all methods.
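
As an illustration of the two-part structure described above, the following is a minimal Python sketch of a hurdle beta-binomial probability mass function. It is not taken from the hurdlebb package and does not use its API. It assumes a mean-precision parameterization ($a = \mu\kappa$, $b = (1-\mu)\kappa$) and a zero-truncated beta-binomial for the positive counts; the function names and the numeric values of `n`, `q`, `mu`, and `kappa` are hypothetical. The final lines check numerically that the overall mean factors into a participation probability times a conditional mean, which is the identity that makes a log-scale split into extensive and intensive margins possible.

```python
import math


def betabinom_logpmf(y, n, mu, kappa):
    """Log-pmf of a beta-binomial with mean mu and precision kappa.

    Assumed parameterization: a = mu * kappa, b = (1 - mu) * kappa.
    """
    a, b = mu * kappa, (1.0 - mu) * kappa
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    log_beta_num = math.lgamma(y + a) + math.lgamma(n - y + b) - math.lgamma(n + a + b)
    log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den


def hurdle_bb_pmf(y, n, q, mu, kappa):
    """P(Y = y): zero with probability 1 - q, otherwise a zero-truncated beta-binomial."""
    if y == 0:
        return 1.0 - q
    p0 = math.exp(betabinom_logpmf(0, n, mu, kappa))
    return q * math.exp(betabinom_logpmf(y, n, mu, kappa)) / (1.0 - p0)


def truncated_mean(n, mu, kappa):
    """E[Y | Y > 0] for the zero-truncated beta-binomial: n * mu / (1 - P(0))."""
    p0 = math.exp(betabinom_logpmf(0, n, mu, kappa))
    return n * mu / (1.0 - p0)


# Hypothetical values chosen only for illustration.
n, q, mu, kappa = 20, 0.65, 0.45, 5.0

pmf = [hurdle_bb_pmf(y, n, q, mu, kappa) for y in range(n + 1)]
total = sum(pmf)                                    # should be 1 up to rounding
mean_direct = sum(y * p for y, p in enumerate(pmf))  # E[Y] from the pmf
mean_two_part = q * truncated_mean(n, mu, kappa)     # q * E[Y | Y > 0]

# log E[Y] = log q + log E[Y | Y > 0]: extensive part plus intensive part.
log_extensive = math.log(q)
log_intensive = math.log(truncated_mean(n, mu, kappa))
```

Because $\log E[Y] = \log q + \log E[Y \mid Y > 0]$, the derivative of the log mean with respect to a covariate is a sum of one term per margin. The paper's exact decomposition may differ in detail from this sketch.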

Paper Structure (175 sections, 54 theorems, 144 equations, 13 figures, 28 tables, 1 algorithm)

Key Result

Theorem 1

For all $n \ge 2$, $\kappa > 0$, and $\mu \in (0,1)$, [...]. For $n = 1$, $h(\mu) \equiv 1$ for all $\mu \in (0,1)$.

Figures (13)

  • Figure 1: Distribution of the infant/toddler enrollment share $Y_i/n_i$ across $N = 6{,}785$ center-based providers (NSECE 2019). The spike at zero (35.3% of centers) represents providers that do not serve any children under age 3. Among IT-serving providers ($z_i = 1$, $N_{\mathrm{pos}} = 4{,}392$), the distribution is roughly unimodal with a mean of 0.478 and substantial dispersion (SD $= 0.214$), consistent with overdispersion approximately twelve times beyond the binomial. These three features jointly motivate the hurdle beta-binomial model.
  • Figure 2: Preliminary evidence for the poverty reversal in the raw NSECE 2019 data. Left panel: IT participation rate (fraction of centers serving any infants) by community poverty decile; a loess smoother with 95% confidence band confirms the declining trend. Right panel: mean IT enrollment share ($Y_i/n_i$) among servers by poverty decile, showing the opposing positive trend. The two panels together display the reversal in its simplest form: participation falls with poverty, but intensity rises.
  • Figure 3: Empirical coverage of 90% intervals across $R = 200$ replications. Each panel corresponds to a simulation scenario (S0, S3, S4 with increasing Kish $\text{DEFF}$). Points show coverage rates for each parameter--estimator combination; the horizontal dashed line marks the 90% nominal level; the shaded band indicates $\pm\,2$ MCSE ($\approx 4.2$ percentage points). E-UW (unweighted) maintains near-nominal coverage throughout. E-WT (weighted, naive) collapses as $\text{DEFF}$ increases. E-WS (sandwich-corrected) provides meaningful recovery for fixed effects but cannot correct the hyperparameters $\tau_{\mathrm{ext}}$ and $\tau_{\mathrm{int}}$.
  • Figure 4: Impact of the sandwich variance correction on fixed-effect inference. Left panel: naive MCMC 95% credible intervals (gray) versus sandwich-corrected Wald 95% confidence intervals (colored) for all 11 fixed effects. The naive intervals reflect prior width; the Wald intervals reflect data information adjusted for the survey design. Right panel: design effect ratio ($\operatorname{DER}$) for each parameter. A $\operatorname{DER}$ of 1 indicates no survey design effect; the observed range is 1.14--4.18 (mean 2.11).
  • Figure 5: Cross-margin scatter plot of state-specific poverty coefficients from the unweighted M2 model (block-diagonal SVC). Each point represents the posterior mean $(\tilde{\alpha}_{\mathrm{pov},s},\;\tilde{\beta}_{\mathrm{pov},s})$ for one state. Small points without labels denote states with $N < 40$. The four quadrants correspond to different poverty--enrollment patterns: the upper-left quadrant (shaded) contains 48 of 51 states (94%), indicating a near-universal reversal pattern under hierarchical shrinkage. The dashed lines indicate the population-average coefficients $(\alpha_{\mathrm{pov}},\;\beta_{\mathrm{pov}})$. The near-zero cross-margin correlation ($\hat{\varrho} \approx 0.021$) implies that the two margins respond to poverty through largely independent mechanisms.
  • ...and 8 more figures
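
Two quantities in the Figure 3 caption can be reproduced with short arithmetic. The sketch below assumes that "Kish $\text{DEFF}$" refers to the standard Kish approximation for unequal weighting, $n \sum w_i^2 / (\sum w_i)^2$, and that MCSE is the binomial standard error of an empirical coverage rate; the weight vectors are hypothetical and are not NSECE weights.

```python
import math


def kish_deff(weights):
    """Kish's approximate design effect due to unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return n * sum_w2 / (sum_w * sum_w)


def coverage_mcse(p, reps):
    """Monte Carlo standard error of a coverage rate p estimated from `reps` replications."""
    return math.sqrt(p * (1.0 - p) / reps)


# Equal weights carry no design effect.
deff_equal = kish_deff([1.0] * 100)

# Hypothetical weights: half the sample weighted 1, half weighted 3.
deff_unequal = kish_deff([1.0] * 50 + [3.0] * 50)

# Width of the +/- 2 MCSE band at 90% nominal coverage with R = 200 replications.
band = 2.0 * coverage_mcse(0.90, 200)
```

With $R = 200$ and nominal coverage 0.90, `band` is about 0.042, matching the roughly 4.2 percentage points quoted in the caption.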

Theorems & Definitions (149)

  • Definition 1: Hurdle beta-binomial model
  • Theorem 1: Monotonicity of $h$
  • proof: Proof sketch
  • Proposition 1: Elasticity of $h$
  • Proposition 2: Non-identification of $\boldsymbol{\Sigma}_{12}$ from the conditional likelihood
  • proof
  • Remark 1: Conditional versus marginal likelihood
  • Proposition 3: Identifiability of $\boldsymbol{\Sigma}_{12}$ via the prior
  • Remark 2: Boundary condition
  • Definition 2: Policy moderator specification
  • ...and 139 more