A Bayesian prevalence-incidence mixture model for screening outcomes with misclassification
Thomas Klausch, Birgit I. Lissenberg-Witte, Veerle M. Coupé
TL;DR
BayesPIM addresses the challenge of estimating time-to-incidence from screening data when disease may already be prevalent at baseline and tests are imperfect. It extends prevalence-incidence mixture modeling by embedding an Accelerated Failure Time (AFT) incidence process and a probit prevalence process within a Bayesian data-augmentation framework that accounts for misclassification via the test sensitivity $\kappa$. The approach uses a Metropolis-within-Gibbs sampler that augments the data with the latent event times $t_i$ and latent prevalence statuses $g_i$. This enables covariate-driven inference and produces posterior predictive cumulative incidence functions (CIFs) that combine prevalent and incident risk. Model fit is assessed with the widely applicable information criterion (WAIC) and a non-parametric CIF estimator adapted for prevalence. Applied to Dutch colorectal cancer (CRC) surveillance data from electronic health records (EHR), BayesPIM reveals substantial pre-existing prevalence and heterogeneity in adenoma risk across age and gender, demonstrates improved CIF estimation over assuming perfect sensitivity, and provides guidance on informative priors for $\kappa$ that yield stable, interpretable results with potential to inform personalized screening strategies.
Abstract
We present BayesPIM, a Bayesian prevalence-incidence mixture model for estimating time- and covariate-dependent disease incidence from screening and surveillance data. The method is particularly suited to settings where some individuals may have the disease at baseline, baseline tests may be missing or incomplete, and the screening test has imperfect sensitivity. This setting arose in data from high-risk colorectal cancer (CRC) surveillance through colonoscopy, where adenomas, precursors of CRC, were already present at baseline and remained undetected due to imperfect test sensitivity. By including covariates, the model can quantify heterogeneity in disease risk, thereby informing personalized screening strategies. Internally, BayesPIM uses a Metropolis-within-Gibbs sampler with data augmentation and weakly informative priors on the incidence and prevalence model parameters. In simulations based on the real-world CRC surveillance data, we show that BayesPIM estimates model parameters without bias while handling latent prevalence and imperfect test sensitivity. However, informative priors on the test sensitivity are needed to stabilize estimation and mitigate non-convergence issues. We also show how conditioning incidence and prevalence estimates on covariates explains heterogeneity in adenoma risk and how model fit is assessed using information criteria and a non-parametric estimator.
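To make the mixture structure concrete, the following is a minimal sketch of how a cumulative incidence function can combine prevalent and incident risk, and of how imperfect sensitivity lets existing disease go undetected. It is an illustration under stated assumptions, not the BayesPIM implementation: the log-normal AFT distribution, the mixture form $\pi + (1-\pi)F(t)$, the independence of repeated tests, and all function and parameter names are hypothetical choices made for this example.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prevalence_prob(z_lin):
    """Probit prevalence model: P(disease at baseline) = Phi(linear predictor)."""
    return std_normal_cdf(z_lin)


def incidence_cdf(t, x_lin, sigma):
    """AFT incidence model, assuming log T = x_lin + sigma * eps with eps ~ N(0, 1).

    The log-normal choice is an assumption made for this sketch.
    """
    if t <= 0:
        return 0.0
    return std_normal_cdf((math.log(t) - x_lin) / sigma)


def mixture_cif(t, z_lin, x_lin, sigma):
    """Assumed mixture CIF: prevalent cases count as events at t = 0,
    and the remaining non-prevalent fraction follows the incidence model."""
    pi = prevalence_prob(z_lin)
    return pi + (1.0 - pi) * incidence_cdf(t, x_lin, sigma)


def prob_all_missed(kappa, n_tests):
    """Probability that existing disease is missed by n_tests tests,
    assuming tests are independent with common sensitivity kappa."""
    return (1.0 - kappa) ** n_tests


if __name__ == "__main__":
    # Hypothetical linear predictors: 50% prevalence, median incidence time e.
    for t in (0.0, 1.0, math.e, 10.0):
        print(t, round(mixture_cif(t, z_lin=0.0, x_lin=1.0, sigma=0.5), 4))
    print(prob_all_missed(kappa=0.8, n_tests=2))
```

In this sketch the CIF starts at the prevalence probability rather than at zero, which is why ignoring baseline prevalence, or assuming $\kappa = 1$ when tests can miss disease, would distort the estimated incidence curve.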
