Modality-Specific Enhancement and Complementary Fusion for Semi-Supervised Multi-Modal Brain Tumor Segmentation

Tien-Dat Chung, Ba-Thinh Lam, Thanh-Huy Nguyen, Thien Nguyen, Nguyen Lan Vi Vu, Hoang-Loc Cao, Phat Kim Huynh, Min Xu

TL;DR

This work tackles semi-supervised multi-modal brain tumor segmentation by addressing cross-modality misalignment with two novel components: the Modality-specific Enhancing Module (MEM), which strengthens modality-specific cues via channel-wise attention, and the Complementary Information Fusion (CIF), which adaptively fuses information across modalities. The model is trained with a hybrid objective that combines supervised segmentation losses on labeled data and a cross-modal consistency loss on unlabeled data, enabling effective information exchange between modalities. Experiments on BraTS 2019 (HGG subset) show consistent improvements over strong SSL and multimodal baselines across 1%, 5%, and 10% labeled data, with ablation studies confirming that MEM and CIF are complementary and robust under scarce supervision. The results demonstrate the practical potential of explicit modality enhancement and adaptive fusion for reliable multi-modal brain tumor segmentation in data-scarce clinical settings.

Abstract

Semi-supervised learning (SSL) has become a promising direction for medical image segmentation, enabling models to learn from limited labeled data alongside abundant unlabeled samples. However, existing SSL approaches for multi-modal medical imaging often struggle to exploit the complementary information between modalities due to semantic discrepancies and misalignment across MRI sequences. To address this, we propose a novel semi-supervised multi-modal framework that explicitly enhances modality-specific representations and facilitates adaptive cross-modal information fusion. Specifically, we introduce a Modality-specific Enhancing Module (MEM) to strengthen semantic cues unique to each modality via channel-wise attention, and a learnable Complementary Information Fusion (CIF) module to adaptively exchange complementary knowledge between modalities. The overall framework is optimized using a hybrid objective combining supervised segmentation loss and cross-modal consistency regularization on unlabeled data. Extensive experiments on the BraTS 2019 (HGG subset) demonstrate that our method consistently outperforms strong semi-supervised and multi-modal baselines under 1%, 5%, and 10% labeled data settings, achieving significant improvements in both Dice and Sensitivity scores. Ablation studies further confirm the complementary effects of our proposed MEM and CIF in bridging cross-modality discrepancies and improving segmentation robustness under scarce supervision.
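To make the hybrid objective concrete, the following is a minimal sketch of how a supervised term on labeled data and a cross-modal consistency term on unlabeled data might be combined. The abstract does not give the exact loss forms, so the soft Dice loss, the mean-squared consistency term, the weight `lam`, and all function names here are assumptions for illustration, not the authors' implementation.

```python
def dice_loss(probs, labels, eps=1e-6):
    """Soft Dice loss over flat foreground probabilities and binary labels.
    (Assumed supervised term; the paper's exact segmentation loss may differ.)"""
    inter = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def consistency_loss(probs_a, probs_b):
    """Mean squared difference between the two modality branches' predictions.
    (Assumed form of the cross-modal consistency regularizer.)"""
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


def hybrid_loss(lab_a, lab_b, labels, unl_a, unl_b, lam=0.1):
    """Supervised loss on labeled voxels for both branches, plus a
    lam-weighted consistency loss on unlabeled voxels.
    lam is a hypothetical weighting hyperparameter."""
    supervised = dice_loss(lab_a, labels) + dice_loss(lab_b, labels)
    consistency = consistency_loss(unl_a, unl_b)
    return supervised + lam * consistency


if __name__ == "__main__":
    # Branch A and branch B predictions on two labeled and two unlabeled voxels.
    loss = hybrid_loss(
        lab_a=[0.9, 0.1], lab_b=[0.8, 0.2], labels=[1, 0],
        unl_a=[0.7, 0.3], unl_b=[0.6, 0.4],
    )
    print(f"hybrid loss: {loss:.4f}")
```

In this sketch the unlabeled samples contribute only through the agreement between the two branches, which is the sense in which consistency regularization lets the model learn from scans without annotations.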

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Illustration of our semi-supervised cross-modality data setting. Raw MRI sequences are obtained from the same patient and then partly annotated by radiologists, producing a multi-modal semi-supervised dataset.
  • Figure 2: Overview of the proposed framework, which consists of two branches, one for each of the two input modalities. Both branches use the same architecture and process: a U-Net encoder extracts distinctive features, which a Modality-specific Enhancing Module processes with channel-wise attention to enhance modality-specific knowledge. The enhanced features then enter the feature fusion stage, where a Complementary Information Fusion layer produces a comprehensive feature representation. This representation is concatenated with the enhanced features of each branch and passed to a U-Net decoder. Training is jointly optimized with a semi-supervised strategy composed of a supervised loss and a consistency loss.
  • Figure 3: Visual comparison of our method and CML, with both models trained on the T2-T1CE and T1-FLAIR modality combinations using 10% labeled data. The figure includes eight samples across four modalities, with each row showing two samples of one modality. In the comparison between our model and CML, blue square boxes indicate a prediction with higher recall and red square boxes a prediction with higher precision.
  • Figure 4: Visualization of the ablation study. The results come from a model trained on the T2-T1CE modality combination with 10% labeled data.
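The data flow described in the Figure 2 caption (channel-wise attention per branch, fusion, then concatenation before the decoder) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the gate uses a per-channel affine map instead of a learned MLP, the fusion is a per-channel learnable convex combination, and the names `channel_attention`, `gated_fusion`, and `decoder_input` are hypothetical. The actual MEM and CIF designs are not specified in this summary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, weights, biases):
    """features: list of C channels, each a flat list of voxel values.
    Squeeze each channel to its global average, turn it into a gate in (0, 1),
    and rescale the channel by that gate (assumed stand-in for MEM)."""
    squeezed = [sum(ch) / len(ch) for ch in features]
    gates = [sigmoid(w * s + b) for w, s, b in zip(weights, squeezed, biases)]
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


def gated_fusion(feat_a, feat_b, alpha_logits):
    """Per-channel convex combination of two modality features, with
    alpha = sigmoid(logit) as the learnable mixing weight (assumed stand-in for CIF)."""
    fused = []
    for ch_a, ch_b, logit in zip(feat_a, feat_b, alpha_logits):
        alpha = sigmoid(logit)
        fused.append([alpha * a + (1.0 - alpha) * b for a, b in zip(ch_a, ch_b)])
    return fused


def decoder_input(enhanced, fused):
    """Concatenate a branch's enhanced features with the fused representation
    along the channel axis, as the caption describes."""
    return enhanced + fused


if __name__ == "__main__":
    # Two channels, two voxels per channel, for each modality branch.
    enc_a = [[1.0, 3.0], [0.5, 0.5]]
    enc_b = [[2.0, 2.0], [1.0, 0.0]]
    enh_a = channel_attention(enc_a, weights=[1.0, 1.0], biases=[0.0, 0.0])
    enh_b = channel_attention(enc_b, weights=[1.0, 1.0], biases=[0.0, 0.0])
    fused = gated_fusion(enh_a, enh_b, alpha_logits=[0.0, 0.0])
    print(len(decoder_input(enh_a, fused)), "channels go to decoder A")
```

Concatenating rather than replacing keeps each branch's own enhanced features available to its decoder, so the fused representation adds complementary information without overwriting modality-specific cues.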