Integrate Any Omics: Towards genome-wide data integration for patient stratification
Shihao Ma, Andy G. X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E. Dick, Bo Wang
TL;DR
IntegrAO tackles the critical problem of integrating incomplete multi-omics data for cancer patient stratification by constructing and fusing partially overlapping patient graphs across modalities, then learning unified embeddings with omics-specific Graph Neural Networks. The framework supports both transductive integration and inductive prediction, enabling robust classification of new patients with partial omics data. Across simulated data, AML case studies, and pan-cancer benchmarks, IntegrAO demonstrates superior robustness to missing data, refined subtyping with biological and clinical relevance, and reliable subtype prediction for unseen patients. This modality-agnostic approach has significant implications for precision oncology, offering a scalable, data-efficient path to holistic patient characterization and decision support.
Abstract
Advances in high-throughput omics profiling have greatly enhanced cancer patient stratification. However, incomplete data remain a significant challenge for multi-omics integration, because traditional remedies such as sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. To address these issues, we introduce IntegrAO (Integrate Any Omics), an unsupervised framework for integrating incomplete multi-omics data and classifying new samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources, then uses graph neural networks to produce unified patient embeddings. Our systematic evaluation across five cancer cohorts involving six omics modalities demonstrates IntegrAO's robustness to missing data and its accuracy in classifying new samples with partial profiles. An acute myeloid leukemia case study further validates its capability to uncover biological and clinical heterogeneity in incomplete datasets. IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology, offering a holistic approach to patient characterization.
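To make the idea of combining partially overlapping patient graphs concrete, here is a minimal sketch in Python. It is not IntegrAO's actual fusion procedure, which the abstract does not specify. As a simple stand-in, it builds a k-nearest-neighbour similarity graph per modality and averages edge weights over the modalities in which both patients were profiled. The function names (`similarity_graph`, `fuse_partial_graphs`), the Gaussian kernel, the value of `k`, and the toy data are all assumptions made for illustration, and the graph neural network embedding step is omitted.

```python
import numpy as np


def similarity_graph(X, k=3):
    """Symmetric kNN similarity graph over the rows (patients) of X.

    Assumption: a Gaussian kernel on squared Euclidean distance, with the
    bandwidth set to the median positive squared distance.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(d2[d2 > 0])
    W = np.exp(-d2 / sigma2)
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Keep an edge if either endpoint ranks the other among its top-k neighbours.
    top_k = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(W.shape[0])[:, None], top_k] = True
    return np.where(mask | mask.T, W, 0.0)


def fuse_partial_graphs(modalities, k=3):
    """Average per-modality graphs over the union of patients.

    modalities: dict mapping a modality name to (patient_ids, feature_matrix),
    where patient_ids are unique within each modality.
    A pair of patients that is never profiled together gets weight 0.
    """
    all_ids = sorted({p for ids, _ in modalities.values() for p in ids})
    pos = {p: i for i, p in enumerate(all_ids)}
    n = len(all_ids)
    total = np.zeros((n, n))
    count = np.zeros((n, n))

    for ids, X in modalities.values():
        W = similarity_graph(np.asarray(X, dtype=float), k=k)
        rows = np.array([pos[p] for p in ids])
        block = np.ix_(rows, rows)
        total[block] += W   # place this modality's graph into the union index
        count[block] += 1   # track how many modalities cover each pair

    fused = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return all_ids, fused


# Toy example: two hypothetical modalities sharing only patients p3 and p4.
rng = np.random.default_rng(0)
modalities = {
    "mRNA": (["p1", "p2", "p3", "p4"], rng.normal(size=(4, 5))),
    "methylation": (["p3", "p4", "p5", "p6"], rng.normal(size=(4, 8))),
}
ids, fused = fuse_partial_graphs(modalities, k=2)
```

In this sketch no patient is dropped and no feature is imputed: a patient missing a modality simply contributes no edges from it. Patients such as p1 and p5, who share no modality, are connected only indirectly through the overlapping patients, which is the kind of structure a downstream embedding model would have to exploit.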
