Learning the Optimal Composite Mediator: Closed-Form Solution and Inference

Zihuai He

Abstract

Understanding how an exposure transmits its effect through high-dimensional intermediaries is a central problem in observational research. We study the problem of finding a composite mediator that maximises the indirect effect of an exposure on an outcome in a linear structural equation model. Although the objective is non-convex in the weight vector, a geometric argument yields a closed-form global solution: the optimal weight bisects the angle between two computable path vectors in a weighted inner product space, recovered via two linear solves. The resulting algorithm, MaxIE, runs at the same cost as ordinary least squares -- orders of magnitude lower than numerical optimisation -- with a dual formulation for settings where mediators outnumber observations. The same path vectors yield a test for the global null that no composite mediator exists, with a t(p-1) reference distribution in the classical regime and t(n-2) in the dual regime. Power is characterised analytically as a function of the population path angle; simulations confirm size control and the power characterisation. Applied to a UK Biobank proteomics dataset (n=38,383, p=2,916), the method rejects the global null (p-value = 6.4e-9) and identifies the optimal proteomic composite mediating age's effect on dementia.

Paper Structure

This paper contains 25 sections, 5 theorems, 26 equations, 4 figures, 4 tables.

Key Result

Proposition 1

Let $\varphi = \angle_{\mathbf{V}}(\mathbf{p},\mathbf{q})$ be the angle between the two path vectors. The alignment $\cos\angle_{\mathbf{V}}(\mathbf{w},\mathbf{p})\cdot\cos\angle_{\mathbf{V}}(\mathbf{w},\mathbf{q})$ is maximised by the direction $\mathbf{w}^*$ that bisects the angle between $\mathbf{p}$ and $\mathbf{q}$ ...
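
The bisector claim can be illustrated numerically. The following is a minimal sketch, not the MaxIE implementation: it assumes $\mathbf{V}$ is an arbitrary symmetric positive-definite matrix and uses randomly drawn stand-ins for $\mathbf{p}$ and $\mathbf{q}$ (the paper's actual construction of these quantities is not reproduced here). All function names are hypothetical. It forms the bisector as the sum of the $\mathbf{V}$-unit vectors along $\mathbf{p}$ and $\mathbf{q}$, and compares its alignment with that of many random directions and with the value $(1+\cos\varphi)/2$, which follows from the half-angle identity $\cos^2(\varphi/2) = (1+\cos\varphi)/2$.

```python
import numpy as np

def v_inner(a, b, V):
    """Weighted inner product <a, b>_V = a' V b."""
    return float(a @ V @ b)

def v_cos(a, b, V):
    """Cosine of the angle between a and b under the V inner product."""
    return v_inner(a, b, V) / np.sqrt(v_inner(a, a, V) * v_inner(b, b, V))

def bisector(p, q, V):
    """Sum of the V-unit vectors along p and q, which bisects their V-angle."""
    return p / np.sqrt(v_inner(p, p, V)) + q / np.sqrt(v_inner(q, q, V))

def alignment(w, p, q, V):
    """Product of the two V-cosines that the proposition maximises."""
    return v_cos(w, p, V) * v_cos(w, q, V)

rng = np.random.default_rng(0)
d = 6
A = rng.standard_normal((d, d))
V = A @ A.T + d * np.eye(d)      # assumption: a generic SPD weight matrix
p = rng.standard_normal(d)       # assumption: stand-in path vectors
q = rng.standard_normal(d)

w_star = bisector(p, q, V)
best = alignment(w_star, p, q, V)
cos_phi = v_cos(p, q, V)

# Alignment of 5,000 random directions, for comparison with the bisector.
random_best = max(alignment(rng.standard_normal(d), p, q, V) for _ in range(5000))

print("alignment at bisector:", best)
print("(1 + cos phi) / 2    :", (1.0 + cos_phi) / 2.0)
print("best random direction:", random_best)
```

In this sketch no random direction exceeds the bisector, and the bisector's alignment agrees with $(1+\cos\varphi)/2$ up to floating-point error. Note that $-\mathbf{w}^*$ attains the same alignment, since both cosines change sign together.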

Figures (4)

  • Figure 1: QQ plots of the cosine test statistic against $t$ theoretical quantiles under three null scenarios ($1{,}000$ replicates; DGP as in text). Null scenarios: both inactive ($\boldsymbol{\alpha}=\boldsymbol{\beta}=\mathbf{0}$, blue); $\beta$-path null (red); $\alpha$-path null (green). Left (primal, $n=1000$, $p=100$): $T$ vs $t(99)$, confirming Proposition 3. Right (dual, $n=100$, $p=1000$): $T$ vs $t(98)$, confirming Proposition 5(iii).
  • Figure 2: Empirical validation of Proposition 4 ($1{,}000$ simulations per point, $\alpha=0.05$; DGP as in text, $p=40$). Left: mean$(T)$ and $\pm1$ SD band for $\varphi_0\in\{55^{\circ},70^{\circ},84^{\circ}\}$ ($\delta\in\{3.09,1.61,0.46\}$); dotted horizontals mark theoretical $\delta$; band collapses as $n\to\infty$, confirming property (i). Centre: empirical power vs $\varphi_0$ for $n\in\{100,200,1000\}$; grey verticals mark the detection threshold, confirming property (ii). Right: empirical power vs $n$ for $\varphi_0=55^{\circ}$ (above threshold), $84^{\circ}$ (below), and $90^{\circ}$ (null), confirming property (iii).
  • Figure 3: Empirical validation of Proposition 5(iii) ($1{,}000$ replicates per point, $n=40$, $\alpha=0.05$; DGP as in text). Left: empirical density of $T$ at $\varphi_0=60^{\circ}$ for $p\in\{40,80,160,1000\}$; dashed curve is $t(38,\delta)$ with $\delta=\cot 60^{\circ}\cdot\sqrt{38}\approx 3.56$; density converges to the approximation as $p$ grows. Centre: empirical power vs $\varphi_0$ for $p\in\{40,80,160\}$, showing the U-shape; dashed curve is $\pi_\infty$ (Eq. \ref{eq:pi_inf}). Right: empirical power vs $p$ for $\varphi_0\in\{60^{\circ},70^{\circ},80^{\circ},90^{\circ}\}$; dotted horizontals mark $\pi_\infty\in\{0.93,0.59,0.19,0.05\}$, confirming saturation with no sharp detection threshold.
  • Figure 4: Validation of the MaxIE composite mediator on the UK Biobank test set ($n = 11{,}515$; 371 dementia cases). (a) MaxIE composite score $M$ versus chronological age, with loess smoothers for controls (blue) and dementia cases (red). The composite rises steeply with age and is systematically elevated in cases, consistent with accelerated biological aging. (b) Dementia prevalence by quintile of $M_\perp = \mathbf{Q}_A(\mathbf{X}\hat{\mathbf{w}}^*)$, the composite residualised on chronological age, with 95% Wilson confidence intervals. Prevalence rises monotonically from $1.4\%$ (Q1) to $7.6\%$ (Q5), a five-fold gradient independent of age. (c) Each method as a point in the space of age tracking ($r_{MA} = \operatorname{cor}(M, A)$, x-axis) and disease prediction (AUC for dementia, y-axis). Error bars are 95% bootstrap confidence intervals ($B = 1{,}000$). The upper-right quadrant (shaded) is the region of jointly high age tracking and disease prediction; MaxIE uniquely occupies this quadrant while each baseline excels on only one dimension.
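
The noncentrality value quoted in the Figure 3 caption, $\delta=\cot 60^{\circ}\cdot\sqrt{38}\approx 3.56$ with $n=40$, can be checked with a few lines of arithmetic. The sketch below does only that. Applying the same expression $\cot\varphi_0\cdot\sqrt{n-2}$ to the other angles in the right panel is an assumption made here for illustration (the caption states the formula for $60^{\circ}$ only), and the helper name `noncentrality` is hypothetical.

```python
import math

def noncentrality(phi0_deg, df):
    """cot(phi0) * sqrt(df), the form quoted in the Figure 3 caption."""
    phi = math.radians(phi0_deg)
    return math.cos(phi) / math.sin(phi) * math.sqrt(df)

n = 40
df = n - 2  # dual-regime degrees of freedom, t(n-2)

delta_60 = noncentrality(60.0, df)
print(round(delta_60, 2))  # 3.56, matching the caption

# Assumption: the same expression evaluated at the other right-panel angles.
for angle in (70.0, 80.0, 90.0):
    print(angle, noncentrality(angle, df))
```

Under this expression the noncentrality shrinks to zero as $\varphi_0$ approaches $90^{\circ}$, which is the null case in the caption.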

Theorems & Definitions (5)

  • Proposition 1: Bisector optimum
  • Proposition 2: Consistency of $\hat{\mathbf{w}}^*_+$
  • Proposition 3: Null distribution of $\cos\varphi$
  • Proposition 4: Power of the cosine test
  • Proposition 5: Dual implementation of MaxIE
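
The $t(p-1)$ reference distribution named in the abstract for the classical regime has a standard geometric counterpart: if a vector in $\mathbb{R}^p$ has independent standard normal entries, the cosine $c$ of its angle with any fixed direction satisfies $c\sqrt{p-1}/\sqrt{1-c^2}\sim t(p-1)$. The simulation below is a sketch of that textbook fact only. It assumes an identity weight matrix and isotropic Gaussian vectors, and it is not the paper's data-generating process, estimator, or proof of Proposition 3. Variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p_dim = 100      # dimension, chosen to match the primal panel of Figure 1
reps = 20000     # number of simulated vectors

# Cosine of the angle between each Gaussian vector and the first coordinate axis.
Z = rng.standard_normal((reps, p_dim))
cosines = Z[:, 0] / np.linalg.norm(Z, axis=1)

# Transform the cosine into a t-type statistic with p - 1 degrees of freedom.
df = p_dim - 1
T = cosines * np.sqrt(df) / np.sqrt(1.0 - cosines**2)

emp_var = T.var()
theory_var = df / (df - 2)            # variance of a t(df) variable
crit = 1.9842                          # approximate 97.5% quantile of t(99)
reject_rate = np.mean(np.abs(T) > crit)

print("empirical variance:", emp_var, "theoretical:", theory_var)
print("two-sided rejection rate at nominal 0.05:", reject_rate)
```

The empirical variance and rejection rate should sit close to their nominal values, which is the kind of size control the Figure 1 QQ plots assess for the actual test statistic.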