High-Dimensional Mediation Analysis for Generalized Linear Models Using Bayesian Variable Selection Guided by Mediator Correlation
Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee
TL;DR
The paper tackles high-dimensional mediation analysis with non-Gaussian outcomes by developing a Bayesian framework that identifies active pathways using two selection priors: a Markov random field (MRF) prior that incorporates mediator correlation, and a sequential subsetting Bernoulli (SSB) prior. Mediators are modeled jointly with a factor-analytic covariance $\Sigma=\sigma_\Sigma^2(\boldsymbol{\lambda}\boldsymbol{\lambda}^\top+I_q)$, while the outcome is modeled with a generalized linear model, which allows non-Gaussian likelihoods. Posterior computation uses Metropolis–Hastings updates for the outcome parameters and Hamiltonian refinement for efficient exploration. Target estimands are defined in a causal-odds framework with $\log(OR^{TE}_{a,a^\star})=\alpha+\sum_{j=1}^{q} \tau_j\delta_j$, so the total effect decomposes into a natural direct effect and mediator-specific natural indirect effects $\tau_j\delta_j$. Simulations show better pathway recovery and stable error control in correlated settings. An HPFS/NHSII metabolomics application identifies plausible mediator candidates, including Cer(d18:1/22:0), and yields a more parsimonious mediator set than methods that ignore mediator dependence. The work provides coherent uncertainty quantification and scalable computation, enabling correlation-aware mediation analysis in GLMs with high-dimensional mediators.
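As a minimal numerical sketch of the two formulas in this summary, the snippet below builds the factor-analytic covariance and evaluates the log-odds-ratio decomposition. All numeric values and function names are hypothetical and chosen only for illustration. Reading $\alpha$ as the direct component is an assumption based on the form of the formula, not a statement of the authors' implementation.

```python
import numpy as np

def factor_covariance(lam, sigma2):
    """Return Sigma = sigma2 * (lam lam^T + I_q) for a loading vector lam of length q."""
    lam = np.asarray(lam, dtype=float)
    return sigma2 * (np.outer(lam, lam) + np.eye(lam.size))

def log_or_decomposition(alpha, tau, delta):
    """Return (direct, per-mediator indirect, total) on the log-odds-ratio scale."""
    tau = np.asarray(tau, dtype=float)
    delta = np.asarray(delta, dtype=float)
    indirect = tau * delta              # one product tau_j * delta_j per mediator
    return alpha, indirect, alpha + indirect.sum()

# Hypothetical inputs: q = 3 mediators, the third with zero loading.
lam = np.array([0.8, 0.5, 0.0])
Sigma = factor_covariance(lam, sigma2=2.0)

# Hypothetical effects: only the first mediator has both coefficients nonzero.
direct, indirect, total = log_or_decomposition(
    alpha=0.10, tau=[0.5, 0.0, 0.3], delta=[0.4, 0.2, 0.0]
)
```

Under this covariance, mediators $j$ and $k$ have correlation $\lambda_j\lambda_k/\sqrt{(\lambda_j^2+1)(\lambda_k^2+1)}$, so a mediator with zero loading is uncorrelated with the rest. A pathway contributes to the indirect effect only when both of its coefficients are nonzero, which is why the second and third products above vanish.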
Abstract
High-dimensional mediation analysis aims to identify mediating pathways and to estimate indirect effects linking an exposure to an outcome. In this paper, we propose a Bayesian framework to address key challenges in these analyses, including high dimensionality, complex dependence among omics mediators, and non-continuous outcomes. Commonly used approaches assume independent mediators or ignore correlations in the selection stage, which can reduce power when mediators are highly correlated. Addressing these challenges leads to a non-Gaussian likelihood and specialized selection priors, which in turn require efficient and adaptive posterior computation. Our proposed framework selects active pathways under generalized linear models while accounting for mediator dependence. Specifically, the mediators are modeled using a multivariate distribution, exposure-mediator selection is guided by a Markov random field prior on inclusion indicators, and mediator-outcome activation is restricted, through a sequential subsetting Bernoulli prior, to mediators supported in the exposure-mediator model. Simulation studies show improved operating characteristics in correlated-mediator settings, with appropriate error control under the global null and stable performance under model misspecification. We illustrate the method using real-world metabolomics data to study metabolites that mediate the association between adherence to the Alternate Mediterranean Diet score and two cardiometabolic outcomes.
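The two selection priors described above can be illustrated with a small sketch. The abstract does not give their functional forms, so the code below assumes a generic Ising-type MRF, $p(\boldsymbol{\gamma}) \propto \exp(a\,\mathbf{1}^\top\boldsymbol{\gamma} + b\,\boldsymbol{\gamma}^\top R\,\boldsymbol{\gamma})$, and a Bernoulli draw that is forced to zero wherever the exposure-mediator indicator is zero. The names `gamma`, `omega`, `R`, `a`, `b`, and `pi` are hypothetical, and the paper's actual parameterization may differ.

```python
import numpy as np

def mrf_log_prior(gamma, R, a=-2.0, b=0.5):
    """Unnormalized Ising-type log prior: a * sum(gamma) + b * gamma^T R gamma.

    R is a symmetric neighborhood matrix with zero diagonal (assumed here to
    encode which mediators are correlated).
    """
    gamma = np.asarray(gamma, dtype=float)
    return a * gamma.sum() + b * (gamma @ R @ gamma)

def sample_ssb(gamma, pi, rng):
    """Draw mediator-outcome indicators that can equal 1 only where gamma equals 1."""
    gamma = np.asarray(gamma, dtype=int)
    draws = rng.random(gamma.size) < pi      # independent Bernoulli(pi) proposals
    return (draws & (gamma == 1)).astype(int)

# Hypothetical neighborhood: mediators 0 and 1 are linked, mediator 2 is isolated.
R = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

linked = mrf_log_prior([1, 1, 0], R)     # selects the two linked mediators
unlinked = mrf_log_prior([1, 0, 1], R)   # selects two unlinked mediators

rng = np.random.default_rng(0)
gamma = np.array([1, 1, 0])
omega = sample_ssb(gamma, pi=0.5, rng=rng)
```

With a positive `b`, jointly selecting linked mediators receives more prior mass than selecting the same number of unlinked ones, which is the sense in which selection is guided by mediator correlation. The subsetting step guarantees `omega[j] <= gamma[j]`, so a mediator-outcome path is never activated without support in the exposure-mediator model.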
