
High-dimensional Many-to-many-to-many Mediation Analysis

Tien Dat Nguyen, Trung Khang Tran, Cong Khanh Truong, Duy-Cat Can, Binh T. Nguyen, Oliver Y. Chén

Abstract

We study high-dimensional mediation analysis in which exposures, mediators, and outcomes are all multivariate, and both exposures and mediators may be high-dimensional. We formalize this as a many (exposures)-to-many (mediators)-to-many (outcomes) (MMM) mediation analysis problem. Methodologically, MMM mediation analysis simultaneously performs variable selection for high-dimensional exposures and mediators, estimates the indirect effect matrix (i.e., the coefficient matrices linking exposure-to-mediator and mediator-to-outcome pathways), and enables prediction of multivariate outcomes. Theoretically, we show that the estimated indirect effect matrices are consistent and element-wise asymptotically normal, and we derive error bounds for the estimators. To evaluate the efficacy of the MMM mediation framework, we first use simulation studies to investigate its finite-sample performance, including convergence properties, the behavior of the asymptotic approximations, and robustness to noise. We then apply MMM mediation analysis to data from the Alzheimer's Disease Neuroimaging Initiative to study how the cortical thickness of 202 brain regions may mediate the effects of 688 genome-wide significant single nucleotide polymorphisms (SNPs), selected from approximately 1.5 million SNPs, on eleven cognitive-behavioral and diagnostic outcomes. The MMM mediation framework identifies biologically interpretable many-to-many-to-many genetic-neural-cognitive pathways and improves downstream out-of-sample classification and prediction performance relative to baseline models. Taken together, our results demonstrate the potential of MMM mediation analysis and highlight the value of statistical methodology for investigating complex, high-dimensional, multi-layer pathways in science. The MMM package is available at https://github.com/THELabTop/MMM-Mediation.
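
To make the estimation targets concrete, the following Python/NumPy sketch simulates a small MMM-type model and estimates $\bm{\alpha}$ (exposure-to-mediator), $\bm{\beta}$ (mediator-to-outcome), and the indirect-effect matrix $\bm{\alpha}\bm{\beta}$. It is a minimal illustration under stated assumptions, not the MMM package or the authors' estimator. It assumes elastic-net penalties (suggested by the penalty pair $\lambda_{Y,1},\lambda_{Y,2}$ and the Elastic Irrepresentable Condition in Definition 3.5) and fits each mediator and each outcome column separately by coordinate descent. The exposures are included in the outcome regression as direct effects, and the covariates $\mathbf{z}$ are omitted. The dimensions $p$ and $q$, the penalty values, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam1, lam2, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2.

    Assumed penalty scaling; no intercept is fitted, so X and y should be
    (approximately) centred.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)          # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]           # restore full residual with updated b[j]
    return b

def fit_mmm_sketch(X, M, Y, lam_m=(0.05, 0.01), lam_y=(0.05, 0.01)):
    """Hypothetical helper: column-wise elastic-net fits of alpha (p x q) and beta (q x T)."""
    q = M.shape[1]
    # Exposure -> mediator: one penalized regression per mediator column.
    alpha = np.column_stack([elastic_net_cd(X, M[:, l], *lam_m) for l in range(q)])
    # Mediators plus exposures (direct effect) -> outcome: one regression per outcome.
    design = np.hstack([M, X])
    coefs = np.column_stack(
        [elastic_net_cd(design, Y[:, k], *lam_y) for k in range(Y.shape[1])]
    )
    beta = coefs[:q, :]                   # keep only the mediator rows
    return alpha, beta

# Simulated example: p = 20 exposures, q = 10 mediators, T = 3 outcomes, two true pathways.
rng = np.random.default_rng(0)
n, p, q, T = 500, 20, 10, 3
alpha0 = np.zeros((p, q))
alpha0[0, 0], alpha0[1, 1] = 1.0, -0.8
beta0 = np.zeros((q, T))
beta0[0, 0], beta0[1, 2] = 1.0, 0.7
X = rng.standard_normal((n, p))
M = X @ alpha0 + 0.5 * rng.standard_normal((n, q))
Y = M @ beta0 + 0.5 * rng.standard_normal((n, T))

alpha_hat, beta_hat = fit_mmm_sketch(X, M, Y)
indirect_hat = alpha_hat @ beta_hat       # p x T indirect-effect matrix
indirect_true = alpha0 @ beta0
print(np.round(indirect_hat[:2, :], 2))
```

Fitting column by column keeps the sketch short. A penalty that couples outcomes or mediators jointly would be a different design choice, and the shrinkage from the $\ell_1$ and ridge terms biases the nonzero entries slightly toward zero.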

Paper Structure

This paper contains 42 sections, 12 theorems, 119 equations, 3 figures, and 1 algorithm.

Key Result

Lemma 3.3

For any given $\lambda_{Y,1},\lambda_{Y,2} > 0$ and noise vector $\bm{\xi} \in \mathbb{R}^{T}$, the property $\mathcal{R}(\mathbf{m}, \bm{\beta}, \bm{\xi}, \lambda_{Y,1}, \lambda_{Y,2})$ holds if and only if the component-wise KKT conditions for $\bm{\beta}_k$ hold for all $1 \leq k \leq T$.
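
Lemma 3.3 reduces sign recovery to component-wise KKT conditions on each $\bm{\beta}_k$. The Python sketch below assumes the penalized objective for outcome $k$ has the common elastic-net form $\frac{1}{2n}\|\mathbf{y}_k - \mathbf{M}\bm{\beta}_k\|_2^2 + \lambda_{Y,1}\|\bm{\beta}_k\|_1 + \frac{\lambda_{Y,2}}{2}\|\bm{\beta}_k\|_2^2$. The paper's scaling of $\lambda_{Y,1}$ and $\lambda_{Y,2}$, and its handling of the noise vector $\bm{\xi}$, may differ. Under that assumption, an active coordinate must satisfy $\frac{1}{n}\mathbf{M}_j^\top(\mathbf{y}_k - \mathbf{M}\bm{\beta}_k) - \lambda_{Y,2}\beta_{jk} = \lambda_{Y,1}\operatorname{sign}(\beta_{jk})$. An inactive coordinate must satisfy $|\frac{1}{n}\mathbf{M}_j^\top(\mathbf{y}_k - \mathbf{M}\bm{\beta}_k)| \le \lambda_{Y,1}$. The generic design matrix `X` in the code stands in for $\mathbf{M}$, and the function name is hypothetical.

```python
import numpy as np

def elastic_net_kkt_holds(X, y, b, lam1, lam2, tol=1e-8):
    """Check the KKT conditions of the (assumed) elastic-net objective
    (1/2n)||y - X b||^2 + lam1*||b||_1 + (lam2/2)*||b||^2 for one outcome column."""
    n = X.shape[0]
    corr = X.T @ (y - X @ b) / n          # X^T (residual) / n
    active = b != 0
    # Active set: stationarity holds with equality, with the sign fixed by b_j.
    ok_active = np.allclose(corr[active] - lam2 * b[active],
                            lam1 * np.sign(b[active]), rtol=0.0, atol=tol)
    # Inactive set: the residual correlation must lie in [-lam1, lam1].
    ok_inactive = np.all(np.abs(corr[~active]) <= lam1 + tol)
    return bool(ok_active and ok_inactive)

# With an orthogonal design (X^T X / n = I) the minimiser has a closed form,
# so the check can be exercised on a known solution.
rng = np.random.default_rng(1)
n, p = 200, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = np.sqrt(n) * Q
y = X @ np.array([1.5, -1.0, 0.0, 0.0, 0.3]) + 0.1 * rng.standard_normal(n)
lam1, lam2 = 0.2, 0.1
z = X.T @ y / n
b_hat = np.sign(z) * np.maximum(np.abs(z) - lam1, 0.0) / (1.0 + lam2)
print(elastic_net_kkt_holds(X, y, b_hat, lam1, lam2))          # True
print(elastic_net_kkt_holds(X, y, b_hat + 0.05, lam1, lam2))   # False
```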

Figures (3)

  • Figure 1: An overview of the many-to-many-to-many (MMM) mediation analysis framework. (a) Schematic representation of different types of mediation analysis, illustrating how classical univariate and multivariate mediation analysis models extend to the MMM setting, where multivariate exposures $\mathbf{x}$, mediators $\mathbf{m}$, and outcomes $\mathbf{y}$ interact through multiple indirect pathways. (b) The MMM model. A schematic representation of the multivariate linear structural equation model linking $\mathbf{x}$, $\mathbf{m}$, $\mathbf{y}$, and covariates $\mathbf{z}$ through coefficient matrices $(\bm{\alpha},\bm{\beta},\bm{\gamma},\bm{\zeta},\bm{\eta})$. (c) Analysis pipeline of the MMM method. A high-level workflow showing input data layers, estimation of coefficient matrices, and the derivation of the indirect-effect matrix $\bm{\alpha}\bm{\beta}$. (d) Output interpretation. Estimated coefficient matrices and indirect-effect patterns from MMM mediation reveal structured many-to-many-to-many exposure-mediator-outcome pathways.
  • Figure 2: Simulation results for the MMM mediation framework. (a) Heatmaps comparing the ground-truth coefficient matrices $\bm{\alpha}_0$, $\bm{\beta}_0$, and the indirect-effect matrix $\bm{\alpha}_0\bm{\beta}_0$ with their corresponding estimates $\widehat{\bm{\alpha}}$, $\widehat{\bm{\beta}}$, and $\widehat{\bm{\alpha}}\widehat{\bm{\beta}}$ under a representative simulation setting. (b) Stability of $\bm{\alpha}\bm{\beta}$ across combinations of sample size and noise level, and Type I error rates under null mediation paths. (c) Estimation error of $(\bm{\alpha},\bm{\beta},\bm{\alpha}\bm{\beta})$ as a function of sample size and noise level, summarizing robustness to high-noise regimes. (d) Convergence patterns of Normalized Root Mean Square Error (NRMSE) and correlation with the ground truth as a function of sample size ($n$), with representative histograms showing concentration of the estimates as $n$ increases. (e) Empirical distributions and Q-Q plots of normalized estimators illustrating asymptotic normality for selected entries of $\bm{\alpha}$ and $\bm{\beta}$.
  • Figure 3: Application of many-to-many-to-many mediation analysis to Alzheimer's disease. (a) A schematic overview of the analysis structure, showing 688 genome-wide significant SNPs ($\mathbf{x}$), 202 cortical-thickness mediators ($\mathbf{m}$), and 11 cognitive-behavioral and diagnostic outcomes ($\mathbf{y}$), together with covariates. (b) Heatmap of the estimated $\bm{\alpha}$ effects, highlighting a structured genetic-brain map. (c) Cortical surface visualization of the top mediating brain regions. (d) Mediation network connecting the strongest SNPs, cortical mediators, and outcomes, with edge thickness proportional to mediation strength. (e) Predictive performance comparison between baseline models and high-dimensional MMM mediation models, including scatter plots of observed versus predicted outcomes.
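
Figure 2 (c) and (d) summarize estimation accuracy by the NRMSE and the correlation with the ground truth. This summary does not state how the NRMSE is normalized. The Python sketch below assumes one common choice, the Frobenius norm of the error divided by that of the true matrix, and computes the Pearson correlation between the vectorized matrices. Both function names are hypothetical.

```python
import numpy as np

def nrmse(est, truth):
    """RMSE normalised by the RMS of the ground truth, i.e. ||est - truth||_F / ||truth||_F.
    This is one common normalisation (assumed); others divide by the range or std."""
    est, truth = np.asarray(est, float), np.asarray(truth, float)
    return np.linalg.norm(est - truth) / np.linalg.norm(truth)

def corr_with_truth(est, truth):
    """Pearson correlation between the vectorised estimate and the ground truth."""
    return np.corrcoef(np.ravel(est), np.ravel(truth))[0, 1]

truth = np.array([[1.0, 0.0], [0.0, 2.0]])
est = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nrmse(est, truth))            # 1/sqrt(5), about 0.447
print(corr_with_truth(est, truth))  # 1.5/sqrt(2.75), about 0.905
```

Applied to $\widehat{\bm{\alpha}}$, $\widehat{\bm{\beta}}$, or $\widehat{\bm{\alpha}}\widehat{\bm{\beta}}$ against their ground truths, these two summaries would be expected to move toward 0 and 1 respectively as $n$ grows.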

Theorems & Definitions (22)

  • Definition 3.1: Sign recovery property
  • Definition 3.2: Sign consistency
  • Lemma 3.3: Component-wise KKT conditions for $\bm{\beta}_k$
  • Lemma 3.4: Component-wise KKT conditions for $\bm{\alpha}_l$
  • Definition 3.5: Elastic Irrepresentable Condition (EIC)
  • Theorem 3.6: Sign consistency of $\bm{\beta}$
  • Theorem 3.7: Sign consistency of $\bm{\alpha}$
  • Proposition 3.8
  • Proposition 3.9: Error bound for $\bm{\beta}$
  • Theorem 3.10: Asymptotic normality of $\bm{\alpha}$ and $\bm{\beta}$
  • ...and 12 more