MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems

Tiago Mota, M. Rita Verdelho, Alceu Bissoto, Carlos Santiago, Catarina Barata

TL;DR

MMIST-ccRCC presents a real-world, public multi-modal dataset for clear cell renal cell carcinoma, combining radiology (CT/MRI), histopathology (WSIs), genomics, and clinical data for 618 patients. The authors develop a comprehensive benchmark pipeline featuring per-modality feature extractors, MIL-based imaging selection, a latent missing-modality reconstruction module, and both early and late fusion strategies for 12-month survival prediction. They demonstrate that early fusion with a mean-based aggregator delivers the strongest performance (surpassing the best single-modality ClinGen baseline) and that latent reconstruction of missing modalities further boosts accuracy, highlighting cross-modal complementarities. The work provides valuable baselines, demonstrates practical handling of missing data, and outlines directions to expand the dataset (e.g., proteomics, more genomics) and broaden the tasks.

Abstract

The acquisition of different data modalities can enhance our knowledge and understanding of various diseases, paving the way for more personalized healthcare. Thus, medicine is progressively moving towards the generation of massive amounts of multi-modal data (e.g., molecular, radiology, and histopathology). While this may seem like an ideal environment in which to capitalize on data-centric machine learning approaches, most methods still focus on a single modality or a pair of modalities, for a variety of reasons: i) lack of ready-to-use curated datasets; ii) difficulty in identifying the best multi-modal fusion strategy; and iii) missing modalities across patients. In this paper we introduce a real-world multi-modal dataset called MMIST-ccRCC that comprises 2 radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC). We provide single-modality and multi-modal (early and late fusion) benchmarks for the task of 12-month survival prediction in the challenging scenario of one or more missing modalities for each patient, with missing rates that range from 26% for genomics data to more than 90% for MRI. We show that even with such severe missing rates, the fusion of modalities leads to improvements in survival forecasting. Additionally, incorporating a strategy to generate the latent representations of the missing modalities given the available ones further improves performance, highlighting a potential complementarity across modalities. Our dataset and code are available here: https://multi-modal-ist.github.io/datasets/ccRCC
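The difference between the early and late fusion benchmarks mentioned in the abstract can be illustrated with a minimal sketch. It assumes each modality has already been encoded into a fixed-size latent vector and that a missing modality is marked as `None`; the logistic classifier, the function names, and the choice to average only over available modalities are illustrative assumptions, not the authors' implementation.

```python
import math

def mean_aggregate(latents):
    """Element-wise mean over the latent vectors that are present (not None)."""
    present = [v for v in latents.values() if v is not None]
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(present[0])
    return [sum(v[i] for v in present) / len(present) for i in range(dim)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_classifier(x, weights, bias):
    """Hypothetical logistic classifier returning a 12-month survival probability."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def early_fusion(latents, weights, bias):
    """Early fusion: merge the latent vectors first, then classify once."""
    fused = mean_aggregate(latents)
    return linear_classifier(fused, weights, bias)

def late_fusion(latents, heads):
    """Late fusion: classify each available modality, then average the probabilities."""
    probs = [
        linear_classifier(v, *heads[name])
        for name, v in latents.items()
        if v is not None
    ]
    if not probs:
        raise ValueError("at least one modality is required")
    return sum(probs) / len(probs)

# Toy patient with a missing MRI latent vector (values are made up).
patient = {"CT": [1.0, 3.0], "MRI": None, "WSI": [3.0, 5.0]}
fused = mean_aggregate(patient)
```

In this sketch the early-fusion path sees a single merged vector, so cross-modal interactions can be learned by one classifier, while the late-fusion path only combines per-modality decisions.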

Paper Structure (19 sections, 1 equation, 3 figures, 2 tables)


Figures (3)

  • Figure 1: Overview of the proposed benchmarks. The vertical colored bars are the latent representation for each modality. In the example, MRI is the missing modality and is recovered in the reconstruction block.
  • Figure 2: Reconstruction block. Each modality is processed by a specific encoder, where the missing modality vector is replaced by zeros. The concatenated ($\bigoplus$) outputs are merged in a cross-modal layer, whose output (yellow vector) is fed to modality-specific decoders for reconstruction. Modality color scheme matches Fig. 1.
  • Figure 3: UMAP projection for all modalities for different sets of latent representations: ground truth (blue), reconstructions (orange), and newly generated feature vectors for missing modalities (green).
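The data flow described in the Figure 2 caption (modality-specific encoders, zero-filling of the missing modality, concatenation, a cross-modal layer, and modality-specific decoders) can be sketched as a forward pass. This is a minimal, untrained illustration in plain Python: the single linear layer per stage, the layer sizes, the ReLU activations, the random weights, and all names (`ReconstructionBlock`, `fill_missing`) are assumptions made for the example, not details taken from the paper.

```python
import random

def make_linear(n_in, n_out, rng):
    """Randomly initialised linear layer stored as (weight matrix, bias vector)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def linear(x, layer):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

class ReconstructionBlock:
    """Hypothetical sketch: per-modality encoders -> cross-modal layer -> per-modality decoders."""

    def __init__(self, dims, hidden=4, shared=6, seed=0):
        rng = random.Random(seed)
        self.dims = dims  # modality name -> latent dimension
        self.encoders = {m: make_linear(d, hidden, rng) for m, d in dims.items()}
        self.cross = make_linear(hidden * len(dims), shared, rng)
        self.decoders = {m: make_linear(shared, d, rng) for m, d in dims.items()}

    def forward(self, latents):
        encoded = []
        for name, dim in self.dims.items():
            x = latents.get(name)
            if x is None:
                x = [0.0] * dim  # missing modality is replaced by zeros
            encoded.extend(relu(linear(x, self.encoders[name])))  # concatenation
        shared = relu(linear(encoded, self.cross))  # cross-modal layer
        return {m: linear(shared, self.decoders[m]) for m in self.dims}

def fill_missing(block, latents):
    """Keep observed latent vectors; use the decoder output only where one is missing."""
    recon = block.forward(latents)
    return {m: latents[m] if latents.get(m) is not None else recon[m] for m in block.dims}

dims = {"CT": 3, "MRI": 2, "WSI": 4}
block = ReconstructionBlock(dims)
patient = {"CT": [1.0, 2.0, 3.0], "MRI": None, "WSI": [0.5, 0.5, 0.5, 0.5]}
completed = fill_missing(block, patient)
```

In practice such a block would be trained, for example with a reconstruction loss on the modalities that are actually observed, before its outputs are used to stand in for missing ones; the training objective is not specified in the text above, so it is left out of the sketch.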