Addressing Instrument-Outcome Confounding in Mendelian Randomization through Representation Learning

Shimeng Huang; Matthew Robinson; Francesco Locatello

TL;DR

The paper proposes a representation learning framework that exploits cross-environment invariance to recover latent exogenous components of genetic instruments under various mixing mechanisms, and demonstrates its effectiveness through simulations and semi-synthetic experiments.

Abstract

Mendelian Randomization (MR) is a prominent observational epidemiological research method designed to address unobserved confounding when estimating causal effects. However, core assumptions -- particularly the independence between instruments and unobserved confounders -- are often violated due to population stratification or assortative mating. Leveraging the increasing availability of multi-environment data, we propose a representation learning framework that exploits cross-environment invariance to recover latent exogenous components of genetic instruments. We provide theoretical guarantees for identifying these latent instruments under various mixing mechanisms and demonstrate the effectiveness of our approach through simulations and semi-synthetic experiments using data from the All of Us Research Hub.
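To make the problem in the abstract concrete, the following is a minimal simulation sketch, not the authors' method. It assumes a linear, additive mixing $Z = W + V$ in which $V$ is correlated with an unobserved confounder $H$, and it uses the true $W$ as an oracle instead of learning it from multi-environment data. All variable names, coefficients, and the helper `wald_iv` are illustrative assumptions. The sketch only shows that a Wald-ratio IV estimate based on the confounded instrument $Z$ is biased, while the exogenous component $W$ yields an estimate close to the true effect.

```python
import numpy as np

def wald_iv(z, d, y):
    """Single-instrument IV (Wald ratio) estimate: cov(z, y) / cov(z, d)."""
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

rng = np.random.default_rng(0)
n = 200_000
beta = 1.0  # assumed true causal effect of exposure D on outcome Y

h = rng.normal(size=n)        # unobserved confounder (e.g. population structure)
w = rng.normal(size=n)        # exogenous (valid) component of the instrument
v = h + rng.normal(size=n)    # invalid component, correlated with the confounder
z = w + v                     # observed instrument: assumed linear mixing of W and V

d = w + v + h + rng.normal(size=n)            # exposure depends on Z and on H
y = beta * d + 2.0 * h + rng.normal(size=n)   # outcome depends on D and on H

est_z = wald_iv(z, d, y)  # instrument-outcome confounding biases this estimate
est_w = wald_iv(w, d, y)  # oracle estimate using the exogenous component only

print(f"IV estimate using Z: {est_z:.3f}")
print(f"IV estimate using W: {est_w:.3f}")
```

Under these assumed coefficients the population Wald ratio for $Z$ is $6/4 = 1.5$, while for $W$ it equals the true effect of $1$. The gap comes entirely from the correlation between $V$ and $H$.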

Paper Structure (24 sections, 20 theorems, 82 equations, 9 figures)

Introduction
Contributions and Overview
Other Related Works
Motivation and Problem Setup
Existing Issues in Mendelian Randomization
Problem Setup
Identifying Latent Components of Confounded Instruments
Identifying $W$ via Distributional Invariance
Identification of $V$
Identification in Practice
Good and Bad Use of Learned Representations
Experiments
Deconfounding Genetic Variants from All of Us
Theory Verification and Ablation Studies
With and without independence loss
...and 9 more sections

Key Result

Proposition 2.1

Given an arbitrary random variable $Z\in\mathcal{Z}$, if there does not exist a function $\ell$ such that $Z = \ell(A, B)$ where $A \perp\!\!\!\perp B$ and $A$ satisfies all conditions in Definition 2.1, then there is no function $\varphi$ such that $\varphi(Z)$ satisf...

Figures (9)

Figure 1: Left: DAG with disentangled latent variables that generate $Z$, where $U_2$ and $U_3$, if observed, are valid instruments for $D$ with respect to $Y$. Right: ADMG that does not consider the disentangled latent variables; $Z_1$ to $Z_3$ are all invalid instruments due to $Z_1$'s violation of exchangeability.
Figure 2: Illustration of our general setup where $Z$ is a complex, entangled instrument containing some valid information (represented by $W$) as IV for $D$ with respect to $Y$, and some invalid information (represented by $V$). $V$ and $W$ are not directly observed and are not necessarily subvectors of $Z$.
Figure 3: Bias of the estimated ACE for different estimators in semi-synthetic experiments using genetic variants from the AoU biobank.
Figure 4: Bias comparison under different mixing functions (polynomials of degree $1$ to $3$ and an invertible MLP). Our methods remain unbiased while all other methods exhibit large bias.
Figure 5: Estimation bias given misspecified latent dimensions ($\hat{p}$) when the true dimension is $p=2$. Our methods remain unbiased for under-specified and moderately over-specified dimensions of $\widehat{W}$.
...and 4 more figures

Theorems & Definitions (51)

Definition 2.1: Valid instruments
Example 2.1
Remark 2.1
Proposition 2.1
Proposition 2.2: Bijective transformations of a valid instrument are also valid
Corollary 2.1: Rank-preserving bijective transformations of an identifying IV
Definition 3.1: Identification of latent components
Lemma 3.1: Functional characterization of identification
Remark 3.1
Theorem 3.1: Identification of $W$ under polynomial mixing
...and 41 more
