Continually Evolved Multimodal Foundation Models for Cancer Prognosis

Jie Peng, Shuang Zhou, Longwei Yang, Yiran Song, Mohan Zhang, Kaixiong Zhou, Feng Xie, Mingquan Lin, Rui Zhang, Tianlong Chen

TL;DR

This work tackles cancer prognosis by enabling robust, adaptive fusion of heterogeneous data sources via a continually evolving multimodal foundation model. It introduces two core components: a Pseudo Target Generation Module (PTGM), which mitigates catastrophic forgetting across tasks within a modality, and Instruction-based Knowledge Distillation (IKD), which preserves generative abilities when new modalities are added. The model uses a Multimodal Q-Former with Modality-specific Low-Rank Adaptations (MM-LoRA) and Self-Gated Multimodal Query Fusion (SMQF) to efficiently fuse clinical text, images, and genomics. It achieves state-of-the-art concordance index (c-index) performance on TCGA across several cancer types and demonstrates effective continual learning. These results suggest strong potential for real-world cancer prognosis systems that must adapt to new data sources and clinical settings while maintaining cross-modal coherence.
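For readers unfamiliar with the c-index used to report these results, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not taken from the paper: the function name `concordance_index`, its argument names, and the toy data are assumptions made for illustration, and the paper's evaluation code may handle ties and censoring differently. A pair of patients is treated as comparable when the one with the shorter follow-up time had an observed event; the pair is concordant when that patient was also assigned the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index (illustrative sketch, not the paper's code).

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # patient i failed before patient j's time
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier failure got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one censored.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk, which is why the metric is commonly used to compare survival models.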

Abstract

Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. To enhance prediction accuracy, previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. However, existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals, which leads to suboptimal generalizability and limited utility in real-world applications. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities. To address these limitations, we propose a continually evolving multimodal foundation model. Extensive experiments on the TCGA dataset demonstrate the effectiveness of our approach, highlighting its potential to advance cancer prognosis by enabling robust and adaptive multimodal integration.

Paper Structure

This paper contains 8 sections, 4 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Performance of Continual Learning.