Hidden yet quantifiable: A lower bound for confounding strength using randomized trials

Piersilvio De Bartolomeis; Javier Abad; Konstantin Donhauser; Fanny Yang

TL;DR

This paper tackles unobserved confounding in observational causal analysis by proposing a principled use of randomized trials to detect and quantify confounding strength. It introduces a marginal sensitivity framework with a confounding strength parameter $\Gamma$, a transportability-based setup, and two asymptotically valid tests that leverage either CATE bounds or ATE bounds to decide if confounding is above a user-specified threshold. A practical lower bound $\hat{\Gamma}_{\texttt{LB}}$ is derived, enabling researchers to conclude whether confounding is substantial enough to threaten conclusions. The approach is validated on synthetic, semi-synthetic (real trial supplemented with observational variants), and real WHI/HRT data, demonstrating that the method can distinguish meaningful from negligible confounding and guide corrective actions. Overall, the framework provides a quantitative, testable measure of unobserved confounding that can inform post-marketing surveillance and causal inferences in practice.

Abstract

In the era of fast-paced precision medicine, observational studies play a major role in properly evaluating new treatments in clinical practice. Yet, unobserved confounding can significantly compromise causal conclusions drawn from non-randomized data. We propose a novel strategy that leverages randomized trials to quantify unobserved confounding. First, we design a statistical test to detect unobserved confounding with strength above a given threshold. Then, we use the test to estimate an asymptotically valid lower bound on the unobserved confounding strength. We evaluate the power and validity of our statistical test on several synthetic and semi-synthetic datasets. Further, we show how our lower bound can correctly identify the absence and presence of unobserved confounding in a real-world setting.
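The step from a test to a lower bound can be illustrated by test inversion: given a test of the null hypothesis $H_0(\Gamma)$ (confounding strength at most $\Gamma$), report the smallest $\Gamma$ that the test fails to reject. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the candidate grid, and the stand-in oracle test (which rejects exactly when $\Gamma < \Gamma^{\star}$, with $\Gamma^{\star}=5$ as in the synthetic experiments) are hypothetical choices made for this example.

```python
from typing import Callable, Sequence


def lower_bound_from_test(
    rejects: Callable[[float], bool],
    gammas: Sequence[float],
) -> float:
    """Return the smallest candidate gamma whose null H0(gamma) is NOT rejected.

    `rejects(gamma)` is a hypothetical callable that returns True when the
    test rejects H0(gamma), i.e. when the data indicate confounding
    strength above gamma.
    """
    candidates = sorted(gammas)
    for gamma in candidates:
        if not rejects(gamma):
            return gamma
    # Every candidate was rejected: the strength exceeds the whole grid,
    # so the largest candidate is still a valid (conservative) lower bound.
    return candidates[-1]


def make_oracle_test(gamma_star: float) -> Callable[[float], bool]:
    """Stand-in oracle test that rejects H0(gamma) exactly when gamma < gamma_star."""
    return lambda gamma: gamma < gamma_star


# Assumed grid of candidate strengths: 1.0, 1.5, ..., 10.0
# (gamma = 1 corresponds to no unobserved confounding).
grid = [1.0 + 0.5 * k for k in range(19)]
lower_bound = lower_bound_from_test(make_oracle_test(5.0), grid)
print(lower_bound)
```

In practice the oracle would be replaced by a data-driven test, and the resolution of the grid determines how finely the lower bound can be resolved.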

Paper Structure (43 sections, 4 theorems, 56 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Related work
Setting and notation
Sensitivity analysis
Methodology
Statistical tests for $H_0 (\Gamma)$
Estimating the ATE
Estimating the sensitivity interval
Two statistical tests
Advantages of each test
A lower bound on unobserved confounding strength
Synthetic Experiments
Datasets
Synthetic distribution
Semi-synthetic datasets
...and 28 more sections

Key Result

Lemma 3.1

For any $\mathbb{P}_{\operatorname{full}}$ which satisfies transportability, i.e. $\mu(X,\mathbb{P}_{\operatorname{full}}) =\mu(X,\mathbb{P} ^{\operatorname{os}}_{\operatorname{full}}),$ and any $\mathbb{P}_X$ which satisfies support inclusion, i.e. ${\operatorname{supp}}(\mathbb{P}_X) \subseteq {\o

Figures (5)

Figure 1: An illustrative example of the drug regulatory process: our lower bound allows taking proactive measures to address the unobserved confounding problem.
Figure 2: Graphical model that captures the Neyman-Rubin potential outcome framework with unobserved confounder $U$. $\mathbb{P} _{\operatorname{inv}}$ is the causal mechanism that does not change between the randomized trial and the observational study, while $\mathbb{P}_{\operatorname{cnf}}$ changes across studies. For the randomized trial, we assume there is no arrow from the confounders ($X, U$) to the treatment indicator $T$ due to its internal validity. Observed variables are colored in shades of grey.
Figure 3: For all the plots: the significance level is $\alpha=0.05$, $\phi^\star$ denotes the oracle test which rejects for $\Gamma< \Gamma^{\star}$, $\hat{\Gamma}_{\texttt{LB}}^{\mathrm{rct}}$ and $\hat{\Gamma}_{\texttt{LB}}^{\widetilde{\mathrm{os}}}$ denote which test is used to compute $\hat{\Gamma}_{\texttt{LB}}$. First row with synthetic experiment choosing $\Gamma^{\star}=5$: Probability of rejection for different $\Gamma$ and average $\hat{\Gamma}_{\texttt{LB}}$ for the test for (a) small sample size: $n_{\operatorname{rct}}=2K,n_{\operatorname{os}}=2K$ and (b) large sample size: $n_{\operatorname{rct}}=20K,n_{\operatorname{os}}=20K$. $\hat{\Gamma}_{\texttt{LB}}$ for (c) increasing sample size of the observational study with $n_{\operatorname{rct}}=20K$ and (d) increasing correlation coefficient; $n_{\operatorname{rct}}=20K,n_{\operatorname{os}}=20K$. Second row with the semi-synthetic Hillstrom dataset choosing $\Gamma^{\star}=5$ and using "history" as unobserved confounder (except in (h)): Probability of rejection for different $\Gamma$ and average $\hat{\Gamma}_{\texttt{LB}}$ for (e) small sample size: $n_{\operatorname{rct}}=2300$, $n_{\operatorname{os}}=6150$ and (f) large sample size: $n_{\operatorname{rct}}=7680$, $n_{\operatorname{os}}=20500$. $\hat{\Gamma}_{\texttt{LB}}$ for (g) increasing $n_{\operatorname{os}}$ with $n_{\operatorname{rct}}=7680$ and (h) increasing correlation coefficient.
Figure 4: Probability of rejection for different choices of $\Gamma$ for the test for the VOTE dataset. For all the plots, the significance level is $\alpha=0.05$ and $\Gamma^{\star}=9$. (a)-(b) Weak confounder: "age". (a) small sample size: $n_{\operatorname{rct}}=3.2K,n_{\operatorname{os}}=11K$ and (b) large sample size: $n_{\operatorname{rct}}=10.6K,n_{\operatorname{os}}=36.8K$. (c)-(d) Strong confounder: outcome $Y$. (c) small sample size: $n_{\operatorname{rct}}=3.2K,n_{\operatorname{os}}=11K$ and (d) large sample size: $n_{\operatorname{rct}}=10.6K,n_{\operatorname{os}}=36.8K$.
Figure 5: Probability of rejection for different choices of $\Gamma$ for the test for the STAR Project. For all the plots, the significance level is $\alpha=0.05$ and $\Gamma^{\star}=5$. We use the original sample sizes $n_{\operatorname{rct}}=600,n_{\operatorname{os}}=1.8K$. (a) weak confounder: "free lunch" (b) strong confounder: outcome $Y$.

Theorems & Definitions (8)

Definition 2.1: Marginal sensitivity set
Definition 2.2: Sensitivity bounds
Lemma 3.1
proof
Lemma 3.2
Proposition 3.1: Validity of the test
Proposition 3.2
proof
