A Unified Framework for Nonlinear Mediation Analysis of Random Objects

Wenxi Tan, Bing Li, Lingzhou Xue

Abstract

Mediation analysis for complex, non-Euclidean data, such as probability distributions, compositions, images, and networks, presents significant methodological challenges due to the inherent nonlinearity and geometric constraints of such spaces. Existing approaches are often restricted to Euclidean settings or specific data types. We propose Random Object Mediation Analysis (ROMA), a unified framework that simultaneously accommodates object-valued exposures, mediators, and outcomes, enabling the analysis of nonlinear causal pathways in general metric spaces. ROMA leverages an additive Reproducing Kernel Hilbert Space (RKHS) operator model to rigorously disentangle direct and indirect causal pathways, a significant advance over existing single-predictor or purely predictive additive frameworks. Theoretically, we establish the nonparametric identification of causal effects and derive global asymptotic normality for our estimators. Crucially, this theoretical foundation enables the construction of simultaneous confidence bands and global test statistics without computationally intensive resampling. We demonstrate the practical utility of ROMA through simulations and real-world applications involving compositional mediators and distributional outcomes, extending the scope of mediation analysis.
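
The direct and indirect pathways named in the abstract can be written out for a binary exposure. The display below is a sketch of the standard natural-effect decomposition, not necessarily the paper's own definitions. It assumes exposure levels 0 and 1 (as in the Figure 3 caption), potential outcomes $Y(x, M(x'))$, and that object-valued outcomes are compared after the isometric embedding $\rho$ into a Hilbert space (as in the Figure 2 caption). The total effect $\operatorname{TE}$ is notation introduced here for illustration.

$$
\begin{aligned}
\operatorname{NDE} &= \mathbb{E}[\rho(Y(1, M(0)))] - \mathbb{E}[\rho(Y(0, M(0)))], \\
\operatorname{NIE} &= \mathbb{E}[\rho(Y(1, M(1)))] - \mathbb{E}[\rho(Y(1, M(0)))], \\
\operatorname{TE} &= \mathbb{E}[\rho(Y(1, M(1)))] - \mathbb{E}[\rho(Y(0, M(0)))] = \operatorname{NDE} + \operatorname{NIE}.
\end{aligned}
$$

The cross-world term $\mathbb{E}[\rho(Y(1, M(0)))]$ cancels in the sum, which is why the two effects add up to the total effect.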

Paper Structure

This paper contains 17 sections, 23 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Examples of non-Euclidean outcomes varying with exposures. Left: Compositional outcomes on the positive orthant of the unit sphere $\mathcal{S}^2 \subset \mathbb R^3$. Right: Fitted quantile functions representing age-at-death distributions. The color gradient (blue to yellow) indicates increasing exposure intensity (temperature).
  • Figure 2: Path diagram illustrating the causal structure of ROMA. Solid arrows denote structural causal relationships, while dashed arrows represent embeddings into Hilbert spaces. The terms $\tau_X$ and $\tau_M$ denote the Aronszajn feature maps defined by $x \mapsto \kappa_X(\cdot, x)$ and $m \mapsto \kappa_M(\cdot, m)$, respectively, while $\rho$ represents the isometric outcome embedding. The operators $(\Phi, \Psi, \boldsymbol \gamma)$ represent weak conditional mean operators.
  • Figure 3: The estimated quantile functions for $\mathbb{E}[Y(0,M(0))]$ (blue lines) and $\mathbb{E}[Y(1,M(1))]$ (red lines), together with the natural direct effect (NDE) and natural indirect effect (NIE), across 100 simulation runs for setting I.1. The true values are highlighted in bold.
  • Figure 4: The estimated causal effects and pointwise confidence intervals under settings I.1-I.4. The solid blue curve represents the estimated effect, the red dashed curve denotes the corresponding true effect, and the light blue band indicates the 95% confidence interval.
  • Figure 5: The empirical sizes and powers of our proposed model at nominal level 0.05 with $l=n=100$, based on 500 replications. The results are obtained by varying the magnitude of the true value of $\|\operatorname{NDE}\|$ or $\|\operatorname{NIE}\|$. The red dotted line indicates the nominal type-I error rate of 0.05.
  • ...and 2 more figures
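
The feature maps $x \mapsto \kappa_X(\cdot, x)$ mentioned in the Figure 2 caption can be illustrated numerically for distribution-valued data. The following is a minimal sketch under stated assumptions, not the authors' implementation: each one-dimensional distribution is represented by its empirical quantile function, the squared distance is the $L^2$ distance between quantile functions (which approximates the squared 2-Wasserstein distance in one dimension), and the kernel is a Gaussian-type kernel with a median-heuristic bandwidth. The function names `quantile_embedding` and `gram_matrix`, the kernel choice, and the bandwidth rule are all hypothetical choices made for illustration.

```python
import numpy as np


def quantile_embedding(samples, n_grid=100):
    """Empirical quantile function of a 1-D sample on a midpoint grid of (0, 1)."""
    # Midpoint grid, so that a plain mean over the grid approximates the integral on [0, 1].
    grid = (np.arange(n_grid) + 0.5) / n_grid
    return np.quantile(samples, grid)


def gram_matrix(Q, gamma=None):
    """Gaussian-type kernel matrix K[i, j] = exp(-gamma * d(x_i, x_j)^2).

    Q is an (n, n_grid) array whose rows are quantile functions; d^2 is the
    approximate squared L2 distance between them (an assumed metric).
    """
    diff = Q[:, None, :] - Q[None, :, :]
    d2 = np.mean(diff ** 2, axis=2)
    if gamma is None:
        # Median heuristic for the bandwidth (an assumption, not taken from the paper).
        off_diag = d2[np.triu_indices_from(d2, k=1)]
        gamma = 1.0 / np.median(off_diag)
    return np.exp(-gamma * d2)


rng = np.random.default_rng(0)
# Five hypothetical distributions N(mu, 1), each observed through 200 draws.
samples = [rng.normal(loc=mu, scale=1.0, size=200) for mu in np.linspace(0.0, 2.0, 5)]
Q = np.stack([quantile_embedding(s) for s in samples])
K = gram_matrix(Q)
```

Column `j` of `K` holds the values of the feature map $\kappa_X(\cdot, x_j)$ at the observed points. Because the quantile representation places the distributions in a Hilbert space, this Gaussian-type kernel is positive semidefinite, which is what an RKHS construction requires.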