MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment

Tianyi Liu, Zhaorui Tan, Muyin Chen, Xi Yang, Haochuan Jiang, Kaizhu Huang

TL;DR

MedMAP introduces a latent-space alignment paradigm for missing-modality brain tumor segmentation that anchors modality-specific features to a predefined distribution $P_{mix}$, with two anchor strategies, $P^{k}_{mix}$ and $P^{*}_{mix}$. The approach is theoretically supported by a tighter ELBO when each modality is aligned individually to $P_{mix}$. It is empirically validated on BraTS2018 and BraTS2020 across three families of backbones (KD, SLS, DA), where it reduces modality gaps and improves Dice scores under missing modalities. The improvements are consistent across segmentation targets (WT, TC, ET) and datasets, with adaptive anchoring ($P^{*}_{mix}$) outperforming fixed anchors and standard normal baselines. Overall, MedMAP provides a general, backbone-agnostic method for learning invariant cross-modality representations, enabling more reliable brain tumor segmentation when MRI modalities are incomplete.
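To make the idea of anchoring each modality to a shared distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes (this summary does not say so) that the anchor is a diagonal Gaussian and that each modality's latent features are summarized by their empirical mean and variance; the names `kl_diag_gaussians` and `alignment_loss` are hypothetical. The loss is the average KL divergence from each modality's latent distribution to the anchor, computed per modality rather than jointly.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    mu_q, var_q, mu_p, var_p = (
        np.asarray(a, dtype=float) for a in (mu_q, var_q, mu_p, var_p)
    )
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def alignment_loss(latents, anchor_mu, anchor_var, eps=1e-6):
    """Average KL from each modality's latent statistics to the anchor.

    latents: dict mapping a modality name to an array of shape (n_samples, dim).
    The anchor (anchor_mu, anchor_var) stands in for P_mix; its Gaussian
    form is an assumption made for this illustration only.
    """
    losses = []
    for z in latents.values():
        mu = z.mean(axis=0)
        var = z.var(axis=0) + eps  # eps keeps the log well-defined
        losses.append(kl_diag_gaussians(mu, var, anchor_mu, anchor_var))
    return float(np.mean(losses))

# Toy usage: one modality near the anchor, one shifted away from it.
rng = np.random.default_rng(0)
anchor_mu, anchor_var = np.zeros(4), np.ones(4)
aligned = {"modality_a": rng.normal(0.0, 1.0, size=(512, 4))}
shifted = {"modality_b": rng.normal(2.0, 1.0, size=(512, 4))}
loss_aligned = alignment_loss(aligned, anchor_mu, anchor_var)
loss_shifted = alignment_loss(shifted, anchor_mu, anchor_var)
```

In this toy setting the shifted modality incurs a much larger loss than the aligned one, which is the signal an alignment term would use to pull every modality toward the same anchor.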

Abstract

Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI) modalities. However, in clinical practice, certain MRI modalities may be missing, which presents a more difficult scenario. To cope with this challenge, Knowledge Distillation, Domain Adaptation, and Shared Latent Space have emerged as common and promising strategies. However, recent efforts typically overlook the modality gaps and thus fail to learn important invariant feature representations across different modalities. This drawback consequently limits the performance of missing-modality models. To ameliorate these problems, pre-trained models are used in natural visual segmentation tasks to minimize the gaps. However, promising pre-trained models are often unavailable in medical image segmentation tasks. Along this line, in this paper, we propose a novel paradigm that aligns the latent features of the involved modalities to a well-defined distribution anchor as a substitute for the pre-trained model. As a major contribution, we prove that our novel training paradigm ensures a tight evidence lower bound, thus theoretically certifying its effectiveness. Extensive experiments on different backbones validate that the proposed paradigm can enable invariant feature representations and produce models with narrowed modality gaps. Models with our alignment paradigm show superior performance on both the BraTS2018 and BraTS2020 datasets.

Paper Structure

This paper contains 32 sections, 1 theorem, 17 equations, 6 figures, 4 tables.

Key Result

Proposition 1

For training a multi-modal teacher model, it is assumed that the presence of one modality ${\mathbf{Z}}_{i}$ is independent of that of any other modality ${\mathbf{Z}}_{j}$, where $i, j \in \{1, \dots, J\}$ and $i \neq j$. In other words, if one modality is missing or corrupted, other modalities

Figures (6)

  • Figure 1: Images of the four modalities in the brain tumor dataset with the ground-truth (GT) segmentation label. Different colors represent different tumor sub-regions: Orange: NCR/NET, Yellow: ED, and Blue: ET.
  • Figure 2: t-SNE and segmentation visualization of different strategies with (w.) and without (w.o.) alignment. Different colors in the t-SNE plots represent different modalities. GT denotes the ground-truth label.
  • Figure 3: Baseline architectures: a) KD, b) DA, c) SLS. B denotes the baseline model structure, which is a 3D U-Net in these baselines. E, fusion, and D are the encoder, fusion, and decoder modules that compose the SLS baseline architecture.
  • Figure 4: Qualitative segmentation results of mmFormer with $P_{mix}$ and without $P_{mix}$ on BraTS2018 under all missing scenarios. The Dice scores below each sub-figure are, from left to right, for WT, TC, and ET. Different colors represent different tumor sub-regions: Blue: NCR/NET, Orange: ED, and Yellow: ET. Captions in the upper-left corners indicate the present modalities. The bottom-right corner shows the ground-truth labels.
  • Figure 5: Qualitative segmentation results of mmFormer with $P_{N}$, $P^{k}_{mix}$ and $P^{*}_{mix}$ on BraTS2018 where three modalities are missing.
  • ...and 1 more figures
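
The captions above report Dice for WT, TC, and ET. As a rough illustration of how such per-region scores are typically computed, here is a minimal sketch. It assumes the common BraTS label convention (1 = NCR/NET, 2 = ED, 4 = ET, with WT = {1, 2, 4}, TC = {1, 4}, ET = {4}), which this summary does not itself state; the names `REGIONS`, `dice`, and `region_dice` are hypothetical.

```python
import numpy as np

# Assumed BraTS label convention: 1 = NCR/NET, 2 = ED, 4 = ET.
REGIONS = {"WT": (1, 2, 4), "TC": (1, 4), "ET": (4,)}

def dice(pred_mask, gt_mask):
    """Dice overlap between two binary masks; 1.0 if both are empty."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    denom = pred_mask.sum() + gt_mask.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred_mask, gt_mask).sum() / denom)

def region_dice(pred_labels, gt_labels):
    """Dice for each nested tumor region, given integer label maps."""
    return {
        name: dice(np.isin(pred_labels, labels), np.isin(gt_labels, labels))
        for name, labels in REGIONS.items()
    }

# Toy usage on a four-voxel label map.
gt = np.array([0, 1, 2, 4])
pred = np.array([0, 1, 2, 2])  # the ET voxel is mislabeled as ED
scores = region_dice(pred, gt)
```

Because the three regions are nested, a single mislabeled voxel can leave WT untouched while lowering TC and ET, which is why the three scores are reported separately.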

Theorems & Definitions (2)

  • Proposition 1
  • proof