Modality-Specific Enhancement and Complementary Fusion for Semi-Supervised Multi-Modal Brain Tumor Segmentation
Tien-Dat Chung, Ba-Thinh Lam, Thanh-Huy Nguyen, Thien Nguyen, Nguyen Lan Vi Vu, Hoang-Loc Cao, Phat Kim Huynh, Min Xu
TL;DR
This work tackles semi-supervised multi-modal brain tumor segmentation by addressing cross-modality misalignment with two novel components: the Modality-specific Enhancing Module (MEM), which strengthens modality-specific cues via channel-wise attention, and the Complementary Information Fusion (CIF) module, which adaptively fuses information across modalities. The model is trained with a hybrid objective that combines a supervised segmentation loss on labeled data with a cross-modal consistency loss on unlabeled data, enabling effective information exchange between modalities. Experiments on the BraTS 2019 dataset (HGG subset) show consistent improvements over strong semi-supervised and multi-modal baselines when 1%, 5%, and 10% of the data are labeled, and ablation studies confirm that MEM and CIF are complementary and robust under scarce supervision. The results suggest that explicit modality enhancement and adaptive fusion are practical tools for reliable multi-modal brain tumor segmentation in data-scarce clinical settings.
Abstract
Semi-supervised learning (SSL) has become a promising direction for medical image segmentation, enabling models to learn from limited labeled data alongside abundant unlabeled samples. However, existing SSL approaches for multi-modal medical imaging often struggle to exploit the complementary information between modalities due to semantic discrepancies and misalignment across MRI sequences. To address this, we propose a novel semi-supervised multi-modal framework that explicitly enhances modality-specific representations and facilitates adaptive cross-modal information fusion. Specifically, we introduce a Modality-specific Enhancing Module (MEM) to strengthen semantic cues unique to each modality via channel-wise attention, and a learnable Complementary Information Fusion (CIF) module to adaptively exchange complementary knowledge between modalities. The overall framework is optimized using a hybrid objective combining a supervised segmentation loss on labeled data and cross-modal consistency regularization on unlabeled data. Extensive experiments on the BraTS 2019 dataset (HGG subset) demonstrate that our method consistently outperforms strong semi-supervised and multi-modal baselines under 1%, 5%, and 10% labeled data settings, achieving significant improvements in both Dice and Sensitivity scores. Ablation studies further confirm the complementary effects of the proposed MEM and CIF modules in bridging cross-modality discrepancies and improving segmentation robustness under scarce supervision.
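The hybrid objective described above can be illustrated with a minimal sketch. The abstract does not specify the exact loss forms, so everything concrete here is an assumption made for illustration: a soft Dice loss as the supervised term, a mean squared difference between two modality branches as the consistency term, and a hypothetical weight `lam`. Predictions are flat lists of foreground probabilities rather than 3D volumes.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for flat lists of foreground probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def consistency_loss(pred_a, pred_b):
    """Mean squared difference between the predictions of two modality branches."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


def hybrid_loss(labeled, unlabeled, lam=0.1):
    """Supervised loss on labeled data plus weighted consistency on unlabeled data.

    labeled:   (pred_a, pred_b, target) for one labeled sample
    unlabeled: (pred_a, pred_b) for one unlabeled sample
    lam:       hypothetical consistency weight (not taken from the paper)
    """
    pred_a, pred_b, target = labeled
    supervised = dice_loss(pred_a, target) + dice_loss(pred_b, target)
    unlabeled_a, unlabeled_b = unlabeled
    return supervised + lam * consistency_loss(unlabeled_a, unlabeled_b)
```

In this sketch the unlabeled term needs no ground truth: it only penalizes disagreement between the two branches, which is one common way to realize cross-modal consistency regularization.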
