A Bayesian approach to differential prevalence analysis with applications in microbiome studies
Juho Pelto, Kari Auranen, Janne V. Kujala, Leo Lahti
TL;DR
This study addresses differential prevalence analysis (DPA) for microbiome presence/absence data, where traditional methods struggle with boundary cases and multiple testing. It introduces DiPPER, a Bayesian hierarchical model that borrows information across features through a shared asymmetric Laplace prior on the log-odds differences $\beta_j$, with covariates and sequencing depth accounted for in a logistic regression framework. Posterior inference is obtained via No-U-Turn Sampling in Stan, yielding multiplicity-adjusted uncertainty intervals and finite estimates even in boundary cases. On 80 original datasets from 67 gut microbiome studies, DiPPER shows high sensitivity and strong cross-study replication relative to frequentist DPA and DAA methods, while providing interpretable differential prevalence estimates with uncertainty intervals. Robustness to hyperpriors and potential extensions to differential abundance analysis are also discussed. Practical implications include more reliable detection of disease-associated presence/absence signals and reduced reliance on p-value corrections, with potential applicability to other omics domains.
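The model structure summarized above can be written down schematically. The following is only a sketch under stated assumptions, not the paper's exact specification: apart from $\beta_j$, every symbol is introduced here for illustration ($y_{ij}$ for the presence of feature $j$ in sample $i$, $x_i$ for the group indicator, $z_i$ for covariates, $d_i$ for a sequencing-depth term, and $\mu, \sigma, \kappa$ for the location, scale, and asymmetry of the shared prior).

$$
y_{ij} \mid p_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \qquad
\operatorname{logit}(p_{ij}) = \alpha_j + \beta_j x_i + \gamma_j^{\top} z_i + \delta_j d_i,
$$

$$
\beta_j \mid \mu, \sigma, \kappa \sim \mathrm{AsymmetricLaplace}(\mu, \sigma, \kappa), \qquad j = 1, \dots, J.
$$

Because the prior parameters are shared across all $J$ features, each $\beta_j$ is informed by the others, which is what borrowing information across features refers to.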
Abstract
Recent evidence suggests that analyzing the presence/absence of taxonomic features can offer a compelling alternative to differential abundance analysis in microbiome studies. However, standard approaches face challenges with boundary cases and multiple testing. To address these challenges, we developed DiPPER (Differential Prevalence via Probabilistic Estimation in R), a method based on Bayesian hierarchical modeling. We benchmarked our method against existing differential prevalence and abundance methods using data from 67 publicly available human gut microbiome studies. We observed considerable variation in performance across methods, with DiPPER outperforming alternatives by combining high sensitivity with effective error control. DiPPER also demonstrated superior replication of findings across independent studies. Furthermore, DiPPER provides differential prevalence estimates and uncertainty intervals that are inherently adjusted for multiple testing.
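To illustrate the boundary-case problem mentioned above, the following minimal Python sketch compares the maximum likelihood log odds ratio with a penalized (maximum a posteriori) estimate for a single taxon that is absent from every sample in one group. It is not DiPPER itself: the prior parameters are fixed by hand rather than learned across features, the estimate is a posterior mode found by grid search rather than a full posterior from No-U-Turn Sampling, and all function names, the counts, and the asymmetric Laplace parameterization are assumptions made for the example.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def log_lik(a, b, k0, n0, k1, n1):
    # Two-group logistic model: logit(p0) = a, logit(p1) = a + b.
    # k = number of samples where the taxon is present, n = group size.
    return (k0 * log_sigmoid(a) + (n0 - k0) * log_sigmoid(-a)
            + k1 * log_sigmoid(a + b) + (n1 - k1) * log_sigmoid(-(a + b)))

def log_asym_laplace(x, loc=0.0, scale=1.0, kappa=1.0):
    # Log density of one common asymmetric Laplace parameterization
    # (assumed here); kappa = 1 gives the symmetric Laplace distribution.
    d = x - loc
    rate = kappa if d >= 0 else 1.0 / kappa
    return -math.log(scale * (kappa + 1.0 / kappa)) - abs(d) * rate / scale

def mle_log_odds_ratio(k0, n0, k1, n1):
    # Closed-form MLE from the 2x2 table; diverges when a cell is zero.
    num = k1 * (n0 - k0)
    den = k0 * (n1 - k1)
    if num == 0 and den == 0:
        return math.nan
    if num == 0:
        return -math.inf
    if den == 0:
        return math.inf
    return math.log(num / den)

def map_log_odds_difference(k0, n0, k1, n1, scale=1.0, kappa=1.0, step=0.05):
    # Coarse grid search for the posterior mode of b, with a flat prior on a.
    a_grid = [-6.0 + step * i for i in range(round(12 / step) + 1)]
    b_grid = [-15.0 + step * i for i in range(round(30 / step) + 1)]
    best_lp, best_b = -math.inf, None
    for a in a_grid:
        for b in b_grid:
            lp = log_lik(a, b, k0, n0, k1, n1) + log_asym_laplace(b, 0.0, scale, kappa)
            if lp > best_lp:
                best_lp, best_b = lp, b
    return best_b

# Hypothetical counts: taxon present in 5 of 20 controls and 0 of 20 cases.
mle = mle_log_odds_ratio(5, 20, 0, 20)
b_map = map_log_odds_difference(5, 20, 0, 20)
print("MLE log odds ratio:", mle)
print("MAP log-odds difference:", b_map)
```

With a zero count in one group the likelihood keeps increasing as the log-odds difference decreases, so the unpenalized estimate diverges, whereas the prior term pulls the posterior mode back to a finite value.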
