Integrating Genomics into Multimodal EHR Foundation Models
Jonathan Amar, Edward Liu, Alessandra Breschi, Liangliang Zhang, Pouya Kheradpour, Sylvia Li, Lisa Soleymani Lehmann, Alessandro Giulianelli, Matt Edwards, Yugang Jia, David Nola, Raghav Mani, Pankaj Vats, Jesse Tetreault, T. J. Chen, Cory Y. McLean
TL;DR
The paper presents a multimodal EHR foundation model that integrates 3,481 polygenic risk scores (PRS) with traditional EHR data and is trained end-to-end on the All of Us cohort to learn rich, cross-modal health trajectories. Using cross-attention or adapter-based embeddings, the model jointly reasons over static genomic features and dynamic clinical events. This improves discrimination for diseases such as Type 2 Diabetes (AUROC +0.025; AUPRC +0.041) and enables novel risk scoring based on computed path probabilities. Across analyses, PRS integration yields modest but significant gains in disease prediction and aligns with known PRS signals, while transfer-learning experiments indicate efficient adaptation to downstream tasks such as stroke and COPD. The approach advances personalized, equitable real-world evidence by enabling richer health representations, dynamic risk assessment, and potential digital-twin-like simulations, while acknowledging biases, PRS limitations, and the need for calibration before clinical deployment.
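One of the two integration strategies named above, adapter-based embeddings, can be illustrated with a minimal pure-Python sketch. It assumes a single linear adapter that maps the static PRS vector to one extra token prepended to the time-ordered EHR event embeddings, so that ordinary attention can weigh genomic and clinical context together. The function names, dimensions, and toy values are hypothetical and not taken from the paper, whose model uses 3,481 PRS and learned weights.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Dense layer: returns weight @ x + bias (weight has shape d x p)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def prs_adapter_token(prs: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical adapter: map a static PRS vector (length p) to one d-dim token."""
    return linear(prs, weight, bias)


def build_sequence(prs: Vector, events: Matrix, weight: Matrix, bias: Vector) -> Matrix:
    """Prepend the PRS token to the time-ordered EHR event embeddings."""
    return [prs_adapter_token(prs, weight, bias)] + [list(e) for e in events]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Matrix) -> Vector:
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


if __name__ == "__main__":
    prs = [0.8, -1.2, 0.3]                 # toy PRS z-scores (p = 3)
    weight = [[0.5, 0.0, 0.1],
              [0.0, -0.5, 0.2]]            # made-up adapter weights (d = 2, p = 3)
    bias = [0.0, 0.0]
    events = [[1.0, 0.0], [0.2, 0.9]]      # toy embeddings of two EHR events
    seq = build_sequence(prs, events, weight, bias)
    print([round(x, 3) for x in seq[0]])   # → [0.43, 0.66]
    # The most recent event attends over the PRS token and both events.
    print([round(w, 3) for w in attend(seq[-1], seq)])
```

Prepending a single token keeps the sequence model unchanged. The cross-attention alternative would instead let event positions query a separate set of genomic keys and values.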
Abstract
This paper introduces an Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data of the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends recent advances in generative AI to EHR foundation models, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model's predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. The work also explores transfer learning for custom classification tasks, showcasing the architecture's versatility and efficiency. This approach can unlock new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies, laying the groundwork for more personalized, equitable, and actionable real-world evidence generation in healthcare.
