Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis

Minjoo Lim; Bogyeong Kang; Tae-Eui Kam

TL;DR

This work tackles the problem of missing MRI modalities in brain tumor imaging. Team PLAVE proposes a missing-modality generator built on a multi-modal translation network, trained with multi-modal contrastive learning and entropy-based feature selection (QS-Attn), and complemented by a segmentation decoder and self-representation losses. The generator objective $L_{G}$ combines an adversarial loss with $L_{con}$, $L_{seg}$, $L_{SR\_decoder}$, $L_{SMR}$, and $L_{MMR}$ to enforce target-specific information in the synthesized images. On BraSyn-2024, the model achieves high image quality and tumor-region fidelity, with an average SSIM of about 0.918 and per-modality SSIMs up to 0.9327, indicating effective missing-modality synthesis and potential benefits for downstream segmentation in clinical MRI workflows. The approach advances multi-source MRI translation by combining entropy-guided feature selection with segmentation-driven supervision to better preserve tumor structures across synthesized modalities.
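As a reading aid, the generator objective can be written as a weighted sum. This is only a sketch: the summary names the loss terms but not how they are weighted, so the coefficients $\lambda_{\cdot}$ below are hypothetical placeholders and $L_{adv}$ denotes the adversarial loss.

$$
L_{G} = L_{adv} + \lambda_{con} L_{con} + \lambda_{seg} L_{seg} + \lambda_{SR} L_{SR\_decoder} + \lambda_{SMR} L_{SMR} + \lambda_{MMR} L_{MMR}
$$

Under this assumed form, setting a coefficient to zero would correspond to ablating that term.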

Abstract

Multi-modal magnetic resonance imaging (MRI) is essential for providing complementary information about brain anatomy and pathology, leading to more accurate diagnoses. However, obtaining high-quality multi-modal MRI in a clinical setting is difficult due to factors such as time constraints, high costs, and patient movement artifacts. To overcome this difficulty, there is increasing interest in developing generative models that can synthesize missing target modality images from the available source ones. Therefore, our team, PLAVE, designs a generative model for missing MRI modalities that integrates multi-modal contrastive learning with a focus on critical tumor regions. Specifically, we integrate multi-modal contrastive learning, tailored for multiple source modalities, and enhance its effectiveness by selecting features based on entropy during the contrastive learning process. Additionally, our network not only generates the missing target modality images but also simultaneously predicts segmentation outputs. This approach improves the generator's capability to precisely generate tumor regions, ultimately improving performance in downstream segmentation tasks. By leveraging a combination of contrastive, segmentation, and additional self-representation losses, our model effectively reflects target-specific information and generates high-quality target images. Consequently, our results in the Brain MR Image Synthesis challenge demonstrate that the proposed model excelled in generating the missing modality.
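The entropy-based feature selection mentioned above can be illustrated with a small sketch. It assumes the selection works the way query-selected attention is usually described: build a self-attention matrix over spatial features, compute the entropy of each row, and keep the rows with the lowest entropy as contrastive queries. The function names, the dot-product similarity, and the toy features are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def row_entropy(probs):
    """Shannon entropy (natural log) of one attention row."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_low_entropy_queries(features, k):
    """Return indices of the k features whose attention rows are most peaked.

    features: list of feature vectors, one per spatial location (assumed layout).
    """
    # Dot-product similarity between every pair of locations.
    sims = [[sum(a * b for a, b in zip(q, key)) for key in features]
            for q in features]
    attn = [softmax(r) for r in sims]
    entropies = [row_entropy(r) for r in attn]
    # Low entropy means the location attends to few others, i.e. it is distinctive.
    order = sorted(range(len(features)), key=lambda i: entropies[i])
    return order[:k], entropies


# Toy example: two distinctive locations (0 and 2) and two near-flat ones.
toy_features = [[5.0, 0.0], [0.1, 0.1], [0.0, 5.0], [0.1, 0.0]]
selected, entropies = select_low_entropy_queries(toy_features, k=2)
```

In this toy input the two large-magnitude features produce sharply peaked attention rows, so they are the ones kept; in a real network the features would come from encoder layers and the selected rows would feed the contrastive loss.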
