Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation

Kaixiang Yang; Wenqi Shan; Xudong Li; Xuan Wang; Xikai Yang; Xi Wang; Pheng-Ann Heng; Qiang Li; Zhiwei Wang

TL;DR

This work tackles the challenge of incomplete multimodal brain tumor segmentation by decoupling modality representations into Self-feature and Mutual-features, reducing the learning burden of modality adaptation. It introduces DeMoSeg, which employs two $3\times 3\times 3$ convolutions per modality to map into four sub-spaces, a Channel-wised Sparse Self-Attention (CSSA) for sparse cross-guidance, and Radiologist-mimic Cross-modality Expression Relationships (RCR) to construct pseudo full-modality features when some modalities are missing. Training combines a segmentation loss with a knowledge-distillation term that aligns Mutual-features to Self-features, enabling robust performance across missing-modality scenarios; experiments on BraTS2020/2018/2015 show consistent Dice improvements over state-of-the-art methods, validating the approach's practicality. The results highlight the value of decoupling ego and other modalities and incorporating clinical priors to enhance robustness in incomplete multimodal brain tumor segmentation, with potential clinical impact for workflows with variable modality availability.

Abstract

Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, and performance degrades significantly when some of them are missing. Existing solutions employ explicit or implicit modality adaptation, aligning features across modalities or learning a fused feature robust to modality incompleteness. They share a common goal of encouraging each modality to express both itself and the others. However, the two expression abilities are entangled as a whole in a seamless feature space, resulting in prohibitive learning burdens. In this paper, we propose DeMoSeg to enhance the modality adaptation by Decoupling the task of representing the ego and other Modalities for robust incomplete multi-modal Segmentation. The decoupling is lightweight: it simply uses two convolutions to map each modality onto four feature sub-spaces. The first sub-space expresses the modality itself (Self-feature), while the remaining sub-spaces substitute for the other modalities (Mutual-features). The Self- and Mutual-features interactively guide each other through a carefully designed Channel-wised Sparse Self-Attention (CSSA). After that, Radiologist-mimic Cross-modality expression Relationships (RCR) are introduced so that available modalities provide their Self-features and also 'lend' their Mutual-features to compensate for the absent ones, exploiting clinical prior knowledge. The benchmark results on BraTS2020, BraTS2018 and BraTS2015 verify DeMoSeg's superiority, which stems from the alleviated modality adaptation difficulty. Concretely, on BraTS2020, DeMoSeg increases Dice by at least 0.92%, 2.95% and 4.95% on the whole tumor, tumor core and enhanced tumor regions, respectively, compared to other state-of-the-art methods. Code is available at https://github.com/kk42yy/DeMoSeg
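The compensation step described in the abstract can be sketched as a simple selection rule: each available modality fills its own slot with its Self-feature, and each missing slot borrows a Mutual-feature from the highest-priority available modality. The sketch below is illustrative only. The `PRIORITY` table and the function name `compensate` are hypothetical placeholders, not the clinical priorities or code used in DeMoSeg, and the features themselves are represented by labels rather than tensors.

```python
from typing import Dict, List, Tuple

MODALITIES = ["t1", "tc", "t2", "fl"]

# HYPOTHETICAL donor ordering per target modality (most preferred first).
# The paper derives its ordering from clinical prior knowledge; these
# values are placeholders chosen only to make the sketch runnable.
PRIORITY: Dict[str, List[str]] = {
    "t1": ["tc", "t2", "fl"],
    "tc": ["t1", "t2", "fl"],
    "t2": ["fl", "t1", "tc"],
    "fl": ["t2", "t1", "tc"],
}


def compensate(delta: Dict[str, int]) -> Dict[str, Tuple[str, str]]:
    """Decide which feature fills each modality slot.

    delta[m] is 1 if modality m is available and 0 if it is missing.
    ("self", m)   means the slot uses the Self-feature s_m.
    ("mutual", m) means the slot uses the Mutual-feature u_{m -> slot}.
    """
    if not any(delta.values()):
        raise ValueError("at least one modality must be available")
    plan: Dict[str, Tuple[str, str]] = {}
    for slot in MODALITIES:
        if delta[slot]:
            plan[slot] = ("self", slot)
        else:
            # First available donor in this slot's priority list.
            donor = next(m for m in PRIORITY[slot] if delta[m])
            plan[slot] = ("mutual", donor)
    return plan


# Only t2 is available, so every other slot borrows from t2.
plan = compensate({"t1": 0, "tc": 0, "t2": 1, "fl": 0})
```

With a single available modality, all three missing slots are filled by that modality's Mutual-features, which is why each modality needs one Self-feature sub-space plus three Mutual-feature sub-spaces.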

Paper Structure (12 sections, 7 equations, 4 figures, 7 tables)

This paper contains 12 sections, 7 equations, 4 figures, 7 tables.

Introduction
Methodology
Feature Decoupling of Self and Mutual Expression
Feature Partition
Channel-wised Sparse Self-Attention
Feature Compensation based on Clinical Knowledge
Tumor Segmentation and Overall Training
Experiments
Materials and Details
Comparison with the state-of-the-arts
Ablation Study
Conclusion

Figures (4)

Figure 1: The overall framework of DeMoSeg, which consists of three main parts: (1) Feature Decoupling, (2) Feature Compensation and (3) Tumor Segmentation. $s_m$ represents the Self-feature of modality $m$, and $u_{m \rightarrow l}$ denotes the Mutual-feature of modality $m$ that represents modality $l$. The alignment constraint $\mathcal{L}_{kd}$ and the segmentation network loss $\mathcal{L}_{seg}$ are formulated as Eq. \ref{klloss} and Eq. \ref{segloss}, respectively. In Feature Compensation, the color intensity of the bidirectional arrows corresponds to the priority level, with darker colors indicating higher priority. The modality indicator $\delta_m \in \{0,1\}$ indicates whether modality $m$ is missing (0) or available (1); for instance, $\delta=[0,0,1,\times]$ means $t1$ and $tc$ are missing, $t2$ is available, and $fl$ can be either.
Figure 2: Examples of radiologist-mimic cross-modality feature compensation strategy.
Figure 3: Visual comparison with SOTA methods under different missing-modality scenarios. Green, Blue and Red represent WT, ET and TC, respectively. Left: the four input modality images for each missing-modality scenario. Right: predictions of RFNet, GSS and DeMoSeg, and the corresponding ground truth.
Figure 4: The segmentation results of DeMoSeg under different missing-modality scenarios. Green, Blue and Red represent WT, ET and TC, respectively. Left: input images of the four modalities. Right: the segmentation results under different missing-modality scenarios and the corresponding ground truth. $\delta = [\delta_{t1}, \delta_{tc}, \delta_{t2}, \delta_{fl}]$ indicates which input modalities are present and which are absent.
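The modality indicator used in the captions above can be made concrete with a short enumeration. The sketch below lists every indicator vector with at least one modality present and expands a pattern containing the wildcard $\times$ (written here as `None`). The function names `input_scenarios` and `expand` are hypothetical helpers introduced for illustration, not part of the DeMoSeg code.

```python
from itertools import product

# Order follows the captions: delta = [t1, tc, t2, fl].
MODALITIES = ("t1", "tc", "t2", "fl")


def input_scenarios():
    """All indicator vectors with at least one available modality."""
    return [d for d in product((0, 1), repeat=len(MODALITIES)) if any(d)]


def expand(pattern):
    """Expand a pattern where None stands for the wildcard 'either'."""
    choices = [(0, 1) if p is None else (p,) for p in pattern]
    return list(product(*choices))


scenarios = input_scenarios()
# 2**4 - 1 = 15 vectors: 14 incomplete cases plus the full-modality case.
wildcard = expand([0, 0, 1, None])
# The caption's example [0, 0, 1, x] covers (0, 0, 1, 0) and (0, 0, 1, 1).
```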
