DoAtlas-1: A Causal Compilation Paradigm for Clinical AI

Yulong Li, Jianxu Chen, Xiwei Liu, Chuanyue Suo, Rong Xia, Zhixiang Lu, Yichen Li, Xinlin Zhuang, Niranjana Arun Menon, Yutong Xie, Eran Segal, Imran Razzak

TL;DR

This work proposes causal compilation, a paradigm that transforms medical evidence from narrative text into executable code, and instantiates this paradigm in DoAtlas-1, compiling 1,445 effect kernels from 754 studies through effect standardization, conflict-aware graph construction, and real-world validation.

Abstract

Medical foundation models generate narrative explanations but cannot quantify intervention effects, detect evidence conflicts, or validate literature claims, limiting clinical auditability. We propose causal compilation, a paradigm that transforms medical evidence from narrative text into executable code. The paradigm standardizes heterogeneous research evidence into structured estimand objects, each explicitly specifying intervention contrast, effect scale, time horizon, and target population, supporting six executable causal queries: do-calculus, counterfactual reasoning, temporal trajectories, heterogeneous effects, mechanistic decomposition, and joint interventions. We instantiate this paradigm in DoAtlas-1, compiling 1,445 effect kernels from 754 studies through effect standardization, conflict-aware graph construction, and real-world validation (Human Phenotype Project, 10,000 participants). The system achieves 98.5% canonicalization accuracy and 80.5% query executability. This paradigm shifts medical AI from text generation to executable, auditable, and verifiable causal reasoning.
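As a minimal sketch of what a structured estimand object could look like, the Python below pairs the four attributes named in the abstract (intervention contrast, effect scale, time horizon, target population) with a point estimate and confidence interval. All class names, field names, and values here are hypothetical illustrations and are not taken from the DoAtlas-1 schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Estimand:
    """Hypothetical estimand object; field names are illustrative only."""
    intervention: str      # treatment arm of the contrast
    comparator: str        # reference arm of the contrast
    outcome: str
    effect_scale: str      # e.g. "hazard_ratio" or "risk_difference"
    horizon_months: float  # time horizon over which the effect is defined
    population: str        # target population descriptor


@dataclass(frozen=True)
class EffectKernel:
    """An estimand paired with a point estimate and a confidence interval."""
    estimand: Estimand
    estimate: float
    ci: Tuple[float, float]
    source_id: str         # identifier of the originating study (made up here)

    def __post_init__(self):
        lo, hi = self.ci
        if not lo <= self.estimate <= hi:
            raise ValueError("point estimate must lie inside its confidence interval")


def describe(kernel: EffectKernel) -> str:
    """Render the kernel as one auditable line of text."""
    e = kernel.estimand
    return (f"{e.effect_scale} of {e.outcome} for {e.intervention} vs {e.comparator} "
            f"at {e.horizon_months:g} months in {e.population}: "
            f"{kernel.estimate:g} [{kernel.ci[0]:g}, {kernel.ci[1]:g}]")


# Entirely made-up example values, for illustration only.
example = EffectKernel(
    estimand=Estimand("drug_A", "placebo", "all_cause_mortality",
                      "hazard_ratio", 12, "adults_with_condition_X"),
    estimate=0.8, ci=(0.7, 0.9), source_id="EV-0001",
)
print(describe(example))
```

Making every attribute an explicit field means two effects can be compared field by field before they are pooled or flagged as conflicting, which free-text summaries do not allow.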

Paper Structure (153 sections, 15 theorems, 67 equations, 4 figures, 9 tables)

Key Result

Theorem 4.4

Let the canonicalization operator $N$ be as in Definition 4.2, mapping an input pair $(\varepsilon,c)$ to $(\varepsilon_{\mathrm{canon}},c_{\mathrm{canon}},\alpha)$, where $\varepsilon=(\Pi,\iota,o,\tau,\mu)$ and $c=(\theta,\mathrm{CI})$. Then $N$ satisfies:

Figures (4)

  • Figure 1: Query Gallery. DoAtlas supports six executable query types: Qdo (interventional effect), Qcf (counterfactual effect under an alternative exposure), Qtraj (time-indexed effect trajectory), QCATE (heterogeneous effects across populations/strata), Qmed (direct/indirect effects via a mediator), and Qjoint (joint intervention effect). Each answer is computed from a witness subset of evidence objects (EV-*); when a query is not executable, the system returns diagnostic flags instead of a numeric answer.
  • Figure 2: Illustration of the HPP study data. (a) Multi-modal and multi-omic data architecture of HPP. Molecular tests include genetics, microbiome metagenomics, metabolomics, proteomics, and single-cell RNA sequencing. Clinical tests capture deep physiological states via retinal imaging, liver and carotid ultrasound, DXA body composition, and continuous monitoring through CGM and multi-night sleep monitoring. Biobanking involves the systematic preservation of blood and stool samples for longitudinal discovery. (b) Timing of sleep monitoring with respect to all other phenotypes. The body system characteristics were measured within ±6 months of the visit, and sleep monitoring was performed on three nights within the 2 weeks following that visit. ABI, ankle–brachial index; CGM, continuous glucose monitoring; DXA, dual-energy X-ray absorptiometry; ECG, electrocardiogram; HMO, health maintenance organization; IMT, intima-media thickness; MS, mass spectrometry; PWV, pulse wave velocity; RDS, recent depressive symptoms.
  • Figure 3: Overview of DoAtlas. DoAtlas compiles heterogeneous clinical evidence into standardized interventional estimand objects with explicit contrasts, effect scales, time horizons, and target populations. It organizes comparable claims into conflict-aware causal graphs and supports executable causal queries, including do-calculus estimation, mechanistic pathway decomposition, counterfactual reasoning, individualized treatment effect estimation, combination intervention modeling, and dynamic prognostic trajectory modeling. External validation signals from HPP are used to assess reliability, adjudicate evidential conflicts, and support auditable clinical decision support.
  • Figure 4: Examples of evidence card files for each category.
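The behavior described in the Figure 1 caption, where a query is answered from a witness subset of evidence objects or returns diagnostic flags when it is not executable, can be sketched roughly as follows. This is a toy illustration under stated assumptions: the `Evidence` and `QueryResult` types, the flag names, the direction-conflict rule, and the unweighted geometric mean used as a placeholder aggregate are all hypothetical and are not the paper's method.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Evidence:
    ev_id: str          # e.g. "EV-0001"; the exact ID format is assumed
    intervention: str
    outcome: str
    ratio: float        # effect on a ratio scale, must be > 0 (1.0 means no effect)


@dataclass
class QueryResult:
    value: Optional[float]                              # None when not executable
    witnesses: List[str] = field(default_factory=list)  # evidence IDs consulted
    flags: List[str] = field(default_factory=list)      # diagnostics, if any


def q_do(evidence: List[Evidence], intervention: str, outcome: str) -> QueryResult:
    """Toy interventional query: answer from matching evidence, or explain why not."""
    matched = [e for e in evidence
               if e.intervention == intervention and e.outcome == outcome]
    ids = [e.ev_id for e in matched]
    if not matched:
        return QueryResult(None, [], ["NO_MATCHING_EVIDENCE"])
    # Assumed rule: disagreement in effect direction counts as a conflict.
    if any(e.ratio > 1 for e in matched) and any(e.ratio < 1 for e in matched):
        return QueryResult(None, ids, ["DIRECTION_CONFLICT"])
    # Placeholder aggregate: unweighted geometric mean of the matched ratios.
    log_mean = sum(math.log(e.ratio) for e in matched) / len(matched)
    return QueryResult(math.exp(log_mean), ids, [])


# Made-up evidence objects, for illustration only.
atlas = [
    Evidence("EV-0001", "drug_A", "stroke", 0.5),
    Evidence("EV-0002", "drug_A", "stroke", 0.5),
    Evidence("EV-0003", "drug_B", "stroke", 0.8),
    Evidence("EV-0004", "drug_B", "stroke", 1.2),
]
```

Returning the witness IDs alongside either the numeric value or the flags keeps each answer traceable to the evidence it was computed from.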

Theorems & Definitions (38)

  • Definition 4.1: Estimand Intermediate Representation
  • Definition 4.2: Canonicalization
  • Definition 4.3: Comparability
  • Theorem 4.4: Canonicalization Guarantees
  • Theorem 4.5: Conflict Detection Completeness (w.r.t. $\mathcal{F}$)
  • Proposition 4.6: Evidence Selection Consistency
  • Theorem 2.1: Equivalence Relation
  • Proof: Proof sketch
  • Definition 2.2: Poolability
  • Proposition 2.3: Determinism and Stability of Alignment
  • ...and 28 more