Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation
Kaixiang Yang, Wenqi Shan, Xudong Li, Xuan Wang, Xikai Yang, Xi Wang, Pheng-Ann Heng, Qiang Li, Zhiwei Wang
TL;DR
This work tackles the challenge of incomplete multimodal brain tumor segmentation by decoupling modality representations into Self-feature and Mutual-features, reducing the learning burden of modality adaptation. It introduces DeMoSeg, which employs two $3\times 3\times 3$ convolutions per modality to map into four sub-spaces, a Channel-wised Sparse Self-Attention (CSSA) for sparse cross-guidance, and Radiologist-mimic Cross-modality Expression Relationships (RCR) to construct pseudo full-modality features when some modalities are missing. Training combines a segmentation loss with a knowledge-distillation term that aligns Mutual-features to Self-features, enabling robust performance across missing-modality scenarios; experiments on BraTS2020/2018/2015 show consistent Dice improvements over state-of-the-art methods, validating the approach's practicality. The results highlight the value of decoupling ego and other modalities and incorporating clinical priors to enhance robustness in incomplete multimodal brain tumor segmentation, with potential clinical impact for workflows with variable modality availability.
Abstract
Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, and incomplete modalities significantly degrade performance. Existing solutions employ explicit or implicit modality adaptation, either aligning features across modalities or learning a fused feature that is robust to modality incompleteness. They share a common goal of encouraging each modality to express both itself and the others. However, the two expression abilities are entangled in a single seamless feature space, resulting in a prohibitive learning burden. In this paper, we propose DeMoSeg to enhance modality adaptation by Decoupling the task of representing the ego and other Modalities for robust incomplete multi-modal Segmentation. The decoupling is lightweight, simply using two convolutions to map each modality onto four feature sub-spaces. The first sub-space expresses the modality itself (Self-feature), while the remaining sub-spaces substitute for the other modalities (Mutual-features). The Self- and Mutual-features interactively guide each other through a carefully designed Channel-wised Sparse Self-Attention (CSSA). After that, Radiologist-mimic Cross-modality expression Relationships (RCR) are introduced so that available modalities provide their Self-features and also 'lend' their Mutual-features to compensate for the absent ones by exploiting clinical prior knowledge. Benchmark results on BraTS2020, BraTS2018 and BraTS2015 verify DeMoSeg's superiority, which we attribute to the alleviated modality adaptation difficulty. Concretely, on BraTS2020, DeMoSeg increases Dice by at least 0.92%, 2.95% and 4.95% on the whole tumor, tumor core and enhanced tumor regions, respectively, compared to other state-of-the-art methods. Code is available at https://github.com/kk42yy/DeMoSeg
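The 'lending' idea can be illustrated with a minimal NumPy sketch. It is not the released DeMoSeg code: it assumes each available modality already has a decoupled feature map with 4*C channels (the output of the two convolutions), with the first C channels as the Self-feature and the remaining three chunks as Mutual-features for the other modalities. The names `LEND_PRIORITY`, `split_subspaces` and `assemble_pseudo_full` are hypothetical, the priority table is a placeholder rather than the clinical prior used by RCR, and CSSA is omitted.

```python
import numpy as np

MODALITIES = ("FLAIR", "T1", "T1ce", "T2")

# Hypothetical lending order: for a missing modality, which available
# modality's Mutual-feature to borrow first. Placeholder values only;
# the actual RCR relationships come from clinical priors in the paper.
LEND_PRIORITY = {
    "FLAIR": ("T2", "T1ce", "T1"),
    "T1": ("T1ce", "FLAIR", "T2"),
    "T1ce": ("T1", "T2", "FLAIR"),
    "T2": ("FLAIR", "T1ce", "T1"),
}


def split_subspaces(feat, modality):
    """Split a (4*C, D, H, W) map into {target modality: (C, D, H, W) chunk}.

    Assumed layout: chunk 0 is the Self-feature; chunks 1..3 are the
    Mutual-features for the other modalities, in MODALITIES order.
    """
    chunks = np.split(feat, 4, axis=0)
    others = [m for m in MODALITIES if m != modality]
    out = {modality: chunks[0]}
    out.update(zip(others, chunks[1:]))
    return out


def assemble_pseudo_full(features):
    """Build pseudo full-modality features from the available modalities.

    features: dict mapping each available modality to its (4*C, D, H, W) map.
    Returns a (4*C, D, H, W) array with one C-channel slot per modality.
    """
    if not features:
        raise ValueError("at least one modality must be available")
    decoupled = {m: split_subspaces(f, m) for m, f in features.items()}
    slots = []
    for target in MODALITIES:
        if target in decoupled:
            # Available modality: use its own Self-feature.
            slots.append(decoupled[target][target])
        else:
            # Missing modality: borrow the matching Mutual-feature
            # from the highest-priority available donor.
            donor = next(d for d in LEND_PRIORITY[target] if d in decoupled)
            slots.append(decoupled[donor][target])
    return np.concatenate(slots, axis=0)
```

For example, with only T1ce available, every missing slot is filled by one of T1ce's Mutual-features, so the downstream network always receives a tensor of the same shape regardless of which modalities are present.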
