A Bayesian Functional Concurrent Zero-Inflated Dirichlet-Multinomial Regression Model with Application to Infant Microbiome

Brody Erlandson, Ander Wilson, Matthew D. Koslovsky

Abstract

The infant microbiome undergoes rapid changes in composition over time and is associated with long-term health outcomes, including immune strength, allergy, and asthma. Modeling the associations between exposures or treatments and microbial composition over time is essential for understanding the factors that drive these changes. Estimating these temporal dynamics poses several challenges, including repeated measures, overdispersion, compositionality, high-dimensional parameter spaces, and zero-inflation. Many longitudinal regression models used in human microbiome research assume constant effects over time and therefore cannot capture time-varying or functional effects of exposures, ignore the compositional structure of the data by modeling each taxon separately, and are not equipped to handle potential zero-inflation. Dirichlet-multinomial (DM) regression models inherently accommodate overdispersion and the compositional structure of the data and have been extended to account for excess zeros. However, existing DM-based regression models cannot additionally handle repeated measures designs. To fill this gap, we propose a functional concurrent zero-inflated Dirichlet-multinomial (FunC-ZIDM) regression model, which is designed to model time-varying relations between observed covariates and microbial taxa while accounting for zero-inflation, compositionality, and repeated measures. Through simulation, we demonstrate that the model can accurately estimate the underlying functional relations and scale to large compositional spaces. We apply our model to investigate time-varying associations between infant microbiome composition and observed covariates during the 11-week postnatal period. We found that $\alpha$-diversity (i.e., diversity of the microbiome within an individual) is positively associated with higher gestational age and a higher percentage of breast milk in the diet.
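To make the data-generating idea behind a zero-inflated Dirichlet-multinomial model with a time-varying covariate effect more concrete, the following minimal sketch simulates taxa counts for one sample. It is an illustration under stated assumptions, not the FunC-ZIDM model itself: the function names `beta_t` and `simulate_zidm_counts`, the sinusoidal coefficient, the log-linear link, the per-taxon scaling of the coefficient, and the fixed zero-inflation probabilities are all hypothetical choices made for this example.

```python
import math
import random


def beta_t(t):
    """Hypothetical smooth time-varying coefficient (an assumption of this sketch)."""
    return 0.5 * math.sin(t)


def simulate_zidm_counts(t, x, intercepts, zero_probs, depth, rng):
    """Draw one vector of taxa counts at time t from a simplified ZIDM.

    intercepts : per-taxon baseline log-concentrations (hypothetical values)
    zero_probs : per-taxon probability of a structural (excess) zero
    depth      : total number of reads in the sample
    """
    # Log-linear concentration: intercept plus a time-varying effect of x.
    # Assumption: taxon j's effect is beta_t(t) scaled by (j + 1).
    gammas = [math.exp(a + (j + 1) * beta_t(t) * x)
              for j, a in enumerate(intercepts)]

    # Zero-inflation: each taxon is structurally absent with its own probability.
    present = [rng.random() >= p for p in zero_probs]
    if not any(present):
        present[0] = True  # sketch-only convenience so the composition is defined

    # Dirichlet draw over the present taxa via normalized gamma variates.
    draws = [rng.gammavariate(g, 1.0) if keep else 0.0
             for g, keep in zip(gammas, present)]
    total = sum(draws)
    probs = [d / total for d in draws]

    # Multinomial counts given the sampled composition.
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=depth):
        counts[j] += 1
    return counts, present


if __name__ == "__main__":
    rng = random.Random(1)
    for week in range(0, 12, 3):
        counts, _ = simulate_zidm_counts(
            t=week / 2.0, x=1.0,
            intercepts=[0.0, 0.0, 0.0],
            zero_probs=[0.1, 0.3, 0.5],
            depth=1000, rng=rng)
        print(week, counts)
```

The counts always sum to the sequencing depth, which reflects compositionality, while the structural zeros and the Dirichlet layer produce the excess zeros and overdispersion described above. Repeated measures within an infant are not modeled in this sketch.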

Paper Structure

This paper contains 13 sections, 9 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Observed relative abundance over time (days) for individual infants and averaged over gestational age at birth. Each plot on the left-hand side represents an individual infant's observed microbial composition on the day sampled from la2014patterned, where the stacked bar represents the taxa relative abundances observed from that sample. The right-hand side splits gestational age at birth into three groups, and the daily average compositions are plotted as a stacked bar.
  • Figure 2: Estimated difference in relative abundance of Bacilli, Clostridia, and Gammaproteobacteria over time at baseline covariate values ($\mathbf{x}(t)=\mathbf{0}$). The figure shows the posterior mean (solid blue line) and 0.95 probability credible intervals (dashed black lines). Bacilli, Clostridia, and Gammaproteobacteria were the three most abundant taxa at baseline values of the covariates.
  • Figure 3: The estimated mean multiplicative difference in relative abundance, $\Delta_{v} \text{RA}_{jp}[t, \mathbf{x}(t)]$, and $\alpha$-diversity, $\Delta_{v} \alpha_{p}[t, \mathbf{x}(t)]$, with a $v$-week difference in gestational age at birth for Clostridia and Gammaproteobacteria. The left plot shows $\Delta_{v} \text{RA}_{jp}[t, \mathbf{x}(t)]$ for Clostridia and Gammaproteobacteria, and the right plot shows $\Delta_{v} \alpha_{p}[t, \mathbf{x}(t)]$ with $l=0.75$.
  • Figure 4: The estimated multiplicative difference in relative abundance, $\Delta_{v} \text{RA}_{jp}[t, \mathbf{x}(t)]$, for infants on a diet of 10-50% breast milk compared to $<10\%$ breast milk for the three most abundant taxa (left plot). For reference, we provide the estimated functional coefficients, $\beta_{jp}(t)$, for infants on a diet of 10-50% breast milk (right plot).
  • Figure 5: Results from simulation scenario 1 demonstrating performance as a function of zero-inflation level. The figure shows Gaussian kernel moving averages with a bandwidth of 750 across levels of zero-inflation for the active covariates. Each line represents a different model.
  • ...and 1 more figures