GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

TL;DR

This work tackles omni-modal biomedical representation learning under modality gaps by introducing GTP-4o, a modality-prompted heterogeneous graph framework. It unifies four clinical modalities (genomics, pathology images, cell graphs, and text) into a heterogeneous graph, completes missing modalities with a graph prompting mechanism that hallucinates informative nodes, and fuses cross-modal information through knowledge-guided global meta-paths and local multi-relational attention. The approach is validated on TCGA glioma and kidney cancer benchmarks, achieving state-of-the-art performance on glioma grading and survival prediction while handling incomplete data robustly. By explicitly modeling modality-specific features and cross-modal relations within a graph and leveraging domain knowledge, GTP-4o advances practical omni-modal biomedical analysis.

Abstract

Recent advances in multi-modal representation learning have proven successful in biomedical domains. While established techniques can handle multi-modal information, challenges arise when they are extended to diverse clinical modalities and to the practical modality-missing setting, owing to the inherent modality gaps. To tackle these, we propose a Modality-prompted Heterogeneous Graph for Omni-modal Learning (GTP-4o), which embeds the numerous disparate clinical modalities into a unified representation, completes the deficient embedding of a missing modality, and reformulates cross-modal learning as a graph-based aggregation. Specifically, we establish a heterogeneous graph embedding to explicitly capture the diverse semantic properties of both the modality-specific features (nodes) and the cross-modal relations (edges). Then, we design a modality-prompted completion that completes the inadequate graph representation of a missing modality through a graph prompting mechanism, which generates hallucinated graph topologies to steer the missing embedding towards the intact representation. On the completed graph, we develop a knowledge-guided hierarchical cross-modal aggregation consisting of a global meta-path neighbouring module, which uncovers potential heterogeneous neighbours along pathways driven by domain knowledge, and a local multi-relation aggregation module for comprehensive cross-modal interaction across the various heterogeneous relations. We assess the efficacy of our methodology in rigorous benchmarking experiments against prior state-of-the-art methods. In a nutshell, GTP-4o presents an initial foray into embedding, relating and perceiving the heterogeneous patterns from various clinical modalities holistically via graph theory. Project page: https://gtp-4-o.github.io/.
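
To make the graph vocabulary of the abstract concrete, the following is a minimal sketch of a heterogeneous graph with typed nodes (one type per modality) and relation-labelled edges, plus a meta-path neighbour lookup. It is not the authors' implementation: the class name HeteroGraph, the method names, the node identifiers and the relation labels are all hypothetical, and learned embeddings, prompting and attention are omitted entirely.

```python
from collections import defaultdict


class HeteroGraph:
    """Toy heterogeneous graph: typed nodes and relation-labelled edges.

    Illustrative only; all names here are assumptions, not taken from the paper.
    """

    def __init__(self):
        self.node_type = {}            # node id -> modality name
        self.adj = defaultdict(list)   # node id -> [(relation, neighbour id)]

    def add_node(self, node_id, modality):
        self.node_type[node_id] = modality

    def add_edge(self, src, dst, relation):
        # Store both directions so a meta-path can be walked either way.
        self.adj[src].append((relation, dst))
        self.adj[dst].append((relation, src))

    def metapath_neighbours(self, start, metapath):
        """Return nodes reached from `start` by following the node types in `metapath`."""
        if self.node_type[start] != metapath[0]:
            return set()
        frontier = {start}
        for wanted in metapath[1:]:
            frontier = {
                nbr
                for node in frontier
                for _, nbr in self.adj[node]
                if self.node_type[nbr] == wanted
            }
        return frontier


# Hypothetical four-modality example (identifiers are placeholders).
g = HeteroGraph()
g.add_node("gene:0", "gene")
g.add_node("patch:0", "image")
g.add_node("cell:0", "cell")
g.add_node("text:0", "text")
g.add_edge("gene:0", "patch:0", "gene-image")
g.add_edge("patch:0", "cell:0", "image-cell")
g.add_edge("cell:0", "text:0", "cell-text")

# Follow the meta-path gene -> image -> cell from the gene node.
reached = g.metapath_neighbours("gene:0", ("gene", "image", "cell"))
```

In this toy setting a meta-path is just a sequence of node types; the knowledge-guided variant described in the abstract would additionally choose which pathways to follow using domain knowledge, which this sketch does not model.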

Paper Structure (13 sections, 9 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Methodology Comparison. Unlike (a) prior methods, (b) our framework learns a unified omni-modal representation from various clinical modalities, even when some modalities are missing, and explicitly captures the cross-modal relations through the established heterogeneous graph representation.
  • Figure 2: Pipeline Overview of GTP-4o. We instantiate the omni-modal biomedical features (Sec. \ref{method1}) and embed them into (a) the heterogeneous graph space (Sec. \ref{method2}). Then, we introduce (b) the modality-prompted completion, which uses graph prompting to complete the missing embedding (Sec. \ref{method3}). After that, we design (c) the knowledge-guided hierarchical aggregation, consisting of a global meta-path neighbouring step that uncovers the heterogeneous neighbourhoods and a local multi-relation aggregation that exchanges features across the various heterogeneous relations (Sec. \ref{method4}).
  • Figure 3: (a) Analysis of Modality Usage. We report the results of GTP-4o using Genes, Images, Cell graphs, Texts, or their combinations, on the survival prediction (C-Index) and glioma grading (AUC) benchmarks. (b) Analysis of Modality-prompted Completion. We compare the relation pattern (similarity) of the original graph with that of a graph from which specific instances are first removed and then completed.
  • Figure 4: Analysis of Modality Missing. We report the results of (a) glioma grading and (b) survival prediction under various missing ratios of Images and Genes, comparing our full framework (Ours) with a version without our completion (baseline).
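
The evaluation setting behind Figures 3 and 4 can be illustrated with a short sketch: masking one modality for a fraction of samples, and scoring survival predictions with a concordance index. This is a sketch under stated assumptions, not the paper's evaluation code. It assumes Harrell's definition of the C-Index, and the function names mask_modality and concordance_index, the dictionary-based sample layout and the toy numbers are all hypothetical.

```python
import random


def mask_modality(samples, modality, ratio, seed=0):
    """Return copies of `samples` with `modality` set to None for a fraction `ratio`.

    Hypothetical helper: each sample is a dict mapping modality name -> features.
    """
    rng = random.Random(seed)
    n_drop = int(ratio * len(samples))
    drop = set(rng.sample(range(len(samples)), n_drop))
    return [
        {**s, modality: None} if i in drop else dict(s)
        for i, s in enumerate(samples)
    ]


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when sample i had an observed event and
    times[i] < times[j]; it is concordant when risks[i] > risks[j].
    Tied risks count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored samples cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not from the paper).
samples = [{"gene": 1, "image": 2} for _ in range(4)]
masked = mask_modality(samples, "image", ratio=0.5)

times = [2, 4, 6, 8]            # follow-up times
events = [1, 1, 0, 1]           # 1 = event observed, 0 = censored
risks = [0.9, 0.7, 0.8, 0.1]    # higher = predicted to fail sooner
c_index = concordance_index(times, events, risks)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking; in a missing-modality study one would evaluate the model on the masked samples at each ratio and compare the resulting scores.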