High-Dimensional Mediation Analysis for Generalized Linear Models Using Bayesian Variable Selection Guided by Mediator Correlation

Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee

TL;DR

The paper tackles high-dimensional mediation analysis with non-Gaussian outcomes by developing a Bayesian framework that incorporates mediator correlation through a Markov random field (MRF) prior and a sequential subsetting Bernoulli (SSB) prior to identify active pathways. Mediators are modeled jointly with a factor-analytic covariance $\Sigma=\sigma_\Sigma^2(\boldsymbol{\lambda}\boldsymbol{\lambda}^\top+I_q)$, while the outcome is modeled with a generalized linear model, enabling non-Gaussian likelihoods; posterior computation uses Metropolis–Hastings updates for the outcome parameters and Hamiltonian refinement for efficient exploration. Target estimands are defined on the odds-ratio scale within a causal framework, with $\log(OR^{TE}_{a,a^\star})=\alpha+\sum_{j=1}^{q} \tau_j\delta_j$, which decomposes the total effect into a natural direct effect $\alpha$ and mediator-specific natural indirect effects $\tau_j\delta_j$. Simulations show superior pathway recovery and stable error control in correlated settings, and an HPFS/NHSII metabolomics application identifies plausible mediator candidates, including Cer(d18:1/22:0), while selecting a more parsimonious mediator set than methods that ignore mediator dependence. The work provides coherent uncertainty quantification and scalable computation, enabling correlation-aware mediation analysis in GLMs with high-dimensional mediators.
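The two formulas quoted above can be made concrete with a minimal NumPy sketch. It assumes a single loading vector $\boldsymbol{\lambda}$ (one latent factor, as the notation suggests) and treats $\alpha$, $\tau_j$, $\delta_j$ as fixed toy numbers rather than posterior draws; the function names `factor_covariance`, `implied_correlation`, and `log_or_decomposition` are hypothetical. Under this covariance the implied correlation between mediators $i$ and $j$ is $\lambda_i\lambda_j/\sqrt{(1+\lambda_i^2)(1+\lambda_j^2)}$, which does not depend on $\sigma_\Sigma^2$.

```python
import numpy as np


def factor_covariance(lam, sigma2):
    """Return Sigma = sigma2 * (lam lam^T + I_q) for a single loading vector lam."""
    lam = np.asarray(lam, dtype=float)
    return sigma2 * (np.outer(lam, lam) + np.eye(lam.shape[0]))


def implied_correlation(sigma):
    """Convert a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)


def log_or_decomposition(alpha, tau, delta):
    """Split log(OR^TE) = alpha + sum_j tau_j * delta_j into direct and indirect parts."""
    indirect = np.asarray(tau, dtype=float) * np.asarray(delta, dtype=float)
    return {
        "direct": float(alpha),                  # natural direct effect (log-OR scale)
        "indirect_by_mediator": indirect,        # tau_j * delta_j for each mediator
        "indirect_total": float(indirect.sum()),
        "total": float(alpha + indirect.sum()),
    }


# Toy values (hypothetical, not estimates from the paper).
lam = np.array([1.0, 0.5, 0.0])
sigma = factor_covariance(lam, sigma2=2.0)
corr = implied_correlation(sigma)  # corr[0, 1] = 1/sqrt(10); mediator 3 is uncorrelated

# Only mediator 1 has both coefficients nonzero, so only it carries an indirect effect.
effects = log_or_decomposition(alpha=0.2, tau=[0.3, 0.0, 0.5], delta=[0.4, 0.7, 0.0])
```

In a Bayesian fit, these quantities would typically be computed for each posterior draw, yielding a posterior distribution for every $\tau_j\delta_j$ rather than a single number.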

Abstract

High-dimensional mediation analysis aims to identify mediating pathways and to estimate indirect effects linking an exposure to an outcome. In this paper, we propose a Bayesian framework to address key challenges in these analyses, including high dimensionality, complex dependence among omics mediators, and non-continuous outcomes. Furthermore, commonly used approaches assume independent mediators or ignore correlations in the selection stage, which can reduce power when mediators are highly correlated. Addressing these challenges leads to a non-Gaussian likelihood and specialized selection priors, which in turn require efficient and adaptive posterior computation. Our proposed framework selects active pathways under generalized linear models while accounting for mediator dependence. Specifically, the mediators are modeled using a multivariate distribution, exposure-mediator selection is guided by a Markov random field prior on inclusion indicators, and mediator-outcome activation is restricted to mediators supported in the exposure-mediator model through a sequential subsetting Bernoulli prior. Simulation studies show improved operating characteristics in correlated-mediator settings, with appropriate error control under the global null and stable performance under model misspecification. We illustrate the method using real-world metabolomics data to study metabolites that mediate the association between adherence to the Alternate Mediterranean Diet score and two cardiometabolic outcomes.
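The two selection priors described in the abstract can be illustrated with a simplified sketch. It assumes an Ising-type MRF on hypothetical exposure-mediator inclusion indicators $\gamma_j \in \{0,1\}$, with a graph built by thresholding mediator correlations, and a subsetting step in which a hypothetical mediator-outcome indicator $\omega_j$ can equal 1 only when $\gamma_j = 1$. The symbols $\gamma_j$, $\omega_j$, $a$, $b$, $\pi$, the correlation threshold, and all function names are illustrative assumptions; the paper's exact MRF parameterization and SSB prior may differ.

```python
import numpy as np


def correlation_graph(corr, threshold):
    """Adjacency G with G[i, k] = 1 when |corr[i, k]| >= threshold (no self-loops)."""
    G = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(G, 0.0)
    return G


def mrf_log_prior(gamma, G, a, b):
    """Unnormalized Ising-type log prior: a * sum(gamma) + b * sum_{i<k} G[i,k] gamma_i gamma_k."""
    gamma = np.asarray(gamma, dtype=float)
    # gamma @ G @ gamma counts each edge twice (G is symmetric), hence the 0.5.
    return a * gamma.sum() + b * 0.5 * (gamma @ G @ gamma)


def mrf_conditional_prob(j, gamma, G, a, b):
    """Prior P(gamma_j = 1 | gamma_{-j}) under the same MRF: logistic(a + b * sum_k G[j,k] gamma_k)."""
    gamma = np.asarray(gamma, dtype=float)
    eta = a + b * (G[j] @ gamma)  # G[j, j] = 0, so gamma_j itself does not enter
    return 1.0 / (1.0 + np.exp(-eta))


def sample_subsetting(gamma, pi, rng):
    """Simplified subsetting step: omega_j ~ Bernoulli(pi) if gamma_j = 1, else omega_j = 0."""
    gamma = np.asarray(gamma, dtype=int)
    omega = rng.binomial(1, pi, size=gamma.shape)
    return omega * gamma  # an active mediated pathway requires gamma_j = omega_j = 1


# Toy example: mediators 1 and 2 are strongly correlated, mediator 3 is not.
corr = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
G = correlation_graph(corr, threshold=0.5)
p_neighbor = mrf_conditional_prob(1, [1, 0, 0], G, a=-2.0, b=1.5)  # neighbor 1 selected
p_isolated = mrf_conditional_prob(2, [1, 0, 0], G, a=-2.0, b=1.5)  # no selected neighbors
log_p = mrf_log_prior([1, 1, 0], G, a=-2.0, b=1.5)

rng = np.random.default_rng(0)
omega = sample_subsetting([1, 0, 1, 0], pi=0.9, rng=rng)
```

With $b > 0$, a mediator whose correlated neighbors are already selected receives a higher prior inclusion probability than an isolated one, which is one way selection can be guided by mediator correlation; the subsetting step then restricts mediator-outcome activation to mediators already supported in the exposure-mediator model.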

Paper Structure (16 sections, 8 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Histogram of pairwise correlations $r_{ij}$ among the 298 candidate metabolites in the HPFS/NHSII application $(1 \le i < j \le 298)$. The mean absolute correlation among metabolite pairs is 0.157, indicating substantial dependence among mediators and motivating correlation-guided pathway selection. Moreover, 13.4% of metabolite pairs have $|r_{ij}| \ge 0.3$ and 5.7% have $|r_{ij}| \ge 0.5$, indicating a non-negligible tail of strong correlations.
  • Figure 2: True directed acyclic graph (DAG) illustrating $q$-dimensional mediators (A) and the estimated DAG based on the proposed subsetting prior (B). Solid lines represent active pathways (i.e., those with a true causal effect). A causal effect is transmitted through a mediator only when both the (a) exposure–mediator and (b) mediator–outcome pathways are active, as indicated by thick solid lines. Under the subsetting prior, the activity of mediator–outcome pathways is evaluated only for mediators with active exposure–mediator connections, so the estimated active pathways in (B) are identical to the true active pathways in (A).
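The pairwise summaries quoted in the Figure 1 caption (mean $|r_{ij}|$ and the share of pairs with $|r_{ij}| \ge 0.3$ or $|r_{ij}| \ge 0.5$, over $1 \le i < j \le q$) can be computed with a short NumPy sketch. It assumes Pearson correlations via `np.corrcoef` (the caption does not state the correlation type) and uses simulated data from a one-factor model, not the HPFS/NHSII metabolomics data, so its numbers will not match the caption. The function name `pairwise_correlation_summary` is hypothetical. With $q = 298$ there are $298 \cdot 297 / 2 = 44{,}253$ distinct pairs.

```python
import numpy as np


def pairwise_correlation_summary(X, thresholds=(0.3, 0.5)):
    """Summarize r_ij over pairs 1 <= i < j <= q for an n x q matrix X (columns = mediators)."""
    corr = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices(corr.shape[0], k=1)  # each unordered pair exactly once
    r = corr[i, j]
    summary = {"n_pairs": r.size, "mean_abs": float(np.mean(np.abs(r)))}
    for t in thresholds:
        summary[f"frac_abs_ge_{t}"] = float(np.mean(np.abs(r) >= t))
    return summary


# Synthetic example only: 200 samples, 298 mediators sharing one latent factor,
# so cov(X) = lam lam^T + I, the factor-analytic form with sigma^2 = 1.
rng = np.random.default_rng(1)
n, q = 200, 298
factor = rng.normal(size=(n, 1))
loadings = rng.uniform(0.0, 1.0, size=(1, q))
X = factor @ loadings + rng.normal(size=(n, q))
stats = pairwise_correlation_summary(X)
```

Restricting to the upper triangle avoids counting each pair twice and excludes the diagonal, where $r_{ii} = 1$ would inflate the mean.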