A Bayesian prevalence-incidence mixture model for screening outcomes with misclassification

Thomas Klausch, Birgit I. Lissenberg-Witte, Veerle M. Coupé

TL;DR

BayesPIM estimates time to disease incidence from screening data when disease may already be prevalent at baseline and the screening test is imperfect. It extends prevalence-incidence mixture modeling by embedding an accelerated failure time (AFT) incidence model and a probit prevalence model in a Bayesian data-augmentation framework that accounts for misclassification through the test sensitivity $\kappa$. Estimation uses a Metropolis-within-Gibbs sampler with latent variables $t_i$ and $g_i$, which allows covariate-dependent inference and yields posterior predictive cumulative incidence functions (CIFs) that combine prevalent and incident risk. Model fit is assessed with the WAIC and a non-parametric CIF estimator adapted for prevalence. Applied to Dutch colorectal cancer (CRC) electronic health record (EHR) data, BayesPIM reveals substantial pre-existing prevalence and heterogeneity in adenoma risk across age and gender, improves CIF estimation relative to assuming perfect sensitivity, and gives guidance on informative priors for $\kappa$ that lead to stable, interpretable results, with the potential to inform personalized screening strategies.
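To make the combination of prevalent and incident risk concrete, the sketch below evaluates a mixture CIF of the usual prevalence-incidence form, $\pi + (1-\pi)\,F_{\text{inc}}(t)$, where $\pi$ comes from a probit model and $F_{\text{inc}}$ from an AFT model. It is an illustration under stated assumptions, not the authors' implementation: the log-normal error distribution, the function names, and the covariate vectors `x_cov` and `z_cov` are all hypothetical choices made here, and the parameter names `beta`, `sigma`, and `theta` are only meant to mirror the symbols used in the figure captions.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prevalence_prob(z_cov, theta):
    """Probit prevalence model: Pr(g_i = 1) = Phi(z' theta)."""
    return std_normal_cdf(sum(z * th for z, th in zip(z_cov, theta)))


def incidence_cdf(t, x_cov, beta, sigma):
    """AFT incidence model: log(time) = x' beta + sigma * eps.

    Assumption for this sketch: eps is standard normal (log-normal AFT).
    """
    if t <= 0:
        return 0.0
    mu = sum(x * b for x, b in zip(x_cov, beta))
    return std_normal_cdf((math.log(t) - mu) / sigma)


def mixture_cif(t, x_cov, z_cov, beta, sigma, theta):
    """Mixture CIF: prevalent cases count as events already present at t = 0."""
    pi = prevalence_prob(z_cov, theta)
    return pi + (1.0 - pi) * incidence_cdf(t, x_cov, beta, sigma)
```

At $t = 0$ the function returns the prevalence probability itself, so the curve starts above zero whenever baseline prevalence is non-zero; this is the feature that distinguishes a mixture CIF from an ordinary incidence curve.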

Abstract

We present BayesPIM, a Bayesian prevalence-incidence mixture model for estimating time- and covariate-dependent disease incidence from screening and surveillance data. The method is particularly suited to settings where some individuals may have the disease at baseline, baseline tests may be missing or incomplete, and the screening test has imperfect sensitivity. This setting was present in data from high-risk colorectal cancer (CRC) surveillance through colonoscopy, where adenomas, precursors of CRC, were already present at baseline and remained undetected due to imperfect test sensitivity. By including covariates, the model can quantify heterogeneity in disease risk, thereby informing personalized screening strategies. Internally, BayesPIM uses a Metropolis-within-Gibbs sampler with data augmentation and weakly informative priors on the incidence and prevalence model parameters. In simulations based on the real-world CRC surveillance data, we show that BayesPIM estimates model parameters without bias while handling latent prevalence and imperfect test sensitivity. However, informative priors on the test sensitivity are needed to stabilize estimation and mitigate non-convergence issues. We also show how conditioning incidence and prevalence estimates on covariates explains heterogeneity in adenoma risk and how model fit is assessed using information criteria and a non-parametric estimator.
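The data-generating setting described above (disease possibly present at baseline, repeated screening, and a test that misses some true cases) can be illustrated with a small simulation. This is a hedged sketch rather than the simulation design used in the paper: the visit schedule, the log-normal onset distribution with parameters `2.0` and `1.0`, and all function names are assumptions introduced here for illustration.

```python
import random


def simulate_first_detection(prevalent, onset_time, visit_times, kappa, rng):
    """Return the time of the first positive test, or None if never detected.

    A diseased individual tests positive with probability kappa (the test
    sensitivity) at each visit; non-diseased individuals never test positive.
    """
    for v in visit_times:
        diseased = prevalent or onset_time <= v
        if diseased and rng.random() < kappa:
            return v
    return None


def baseline_detection_rate(n, pi, kappa, visit_times, seed=1):
    """Share of n simulated individuals whose disease is found at the first visit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        prevalent = rng.random() < pi
        # Hypothetical incidence-time distribution, chosen only for this sketch.
        onset = rng.lognormvariate(2.0, 1.0)
        first = simulate_first_detection(prevalent, onset, visit_times, kappa, rng)
        if first == visit_times[0]:
            hits += 1
    return hits / n
```

With a baseline visit at time 0, the expected detection rate at baseline is roughly `pi * kappa`, not `pi`. Prevalent cases missed at baseline surface at later visits and look like incident cases, which is the confounding that a model with explicit prevalence and sensitivity components is meant to untangle.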

Paper Structure

This paper contains 47 sections, 93 equations, 28 figures, 5 tables.

Figures (28)

  • Figure 1: DAG illustrating the hierarchical model structure of BayesPIM using plate notation. White circles denote unobserved variables, grey circles denote observed variables, dots denote fixed parameters, and arrows denote the direction of dependence. For clarity, dependence of $v_{ij}$ on history $\bar{\bm{v}}_{ij}$ is suppressed and $v_{i1}=0$ (baseline time) is implied.
  • Figure 2: Monte Carlo (estimation) errors of the estimands: marginal prevalence probability ($\Pr(g_i=1)$, denoted "prev") and the test sensitivity ($\kappa$, denoted "sens"). The first two rows give errors for the prevalence and the second two rows for the sensitivity. The priors on the test sensitivity $\kappa$ are either uninformative (uninf.), informative (inf.) or fixed at the true value (point).
  • Figure 3: Posterior median estimates of the marginal mixture CIF $F_{t^*}(t \mid \bm{\beta}, \sigma, \bm{\theta})$, point-wise averaged over 200 Monte Carlo simulation runs with 95% quantiles shown as shaded regions. The condition $\Pr(r_i=1)=1$ is shown (for $\Pr(r_i=1)=0$ see Supplemental Figure \ref{fig:sim1_cdfs_prob_r0}). For em_mixed, $\kappa$ was set to its true value. The lines of all models except PIMixture are overlapping.
  • Figure 4: Monte Carlo (estimation) error of the marginal prevalence probability ($\Pr(g_i=1)$, denoted "prev") and the test sensitivity ($\kappa$, denoted "sens") estimands. The first two columns give errors for the prevalence and the second two columns for the sensitivity. The priors on the test sensitivity $\kappa$ are either uninformative (uninf.), informative (inf.) or fixed at the true value (point).
  • Figure 5: Marginal mixture CIFs, $F_{t^*}(t \mid \bm{\beta}, \sigma, \bm{\theta})$, point-wise averaged over 200 Monte Carlo simulation runs with 95% quantiles shown as shaded regions (for clarity these bounds have been omitted for PIMixture and em_mixed). For BayesPIM the posterior median of the posterior predictive marginal mixture CIF is shown. For PIMixture and em_mixed the corresponding maximum likelihood estimate is shown. Lines of BayesPIM (point) and "Truth" are overlapping in most graphs.
  • ...and 23 more figures