Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading
Dunyuan Xu, Xi Wang, Jinyue Cai, Pheng-Ann Heng
TL;DR
This work addresses automated brain tumor grading from multi-modal MRI by mitigating modality-imbalanced information and fusion noise through a cross-modality guidance mechanism and dual attention. It employs a lightweight ResNet Mix Convolution backbone and a two-stage training protocol where the primary modality guides secondary modalities during feature extraction, preserving essential information while leveraging complementary cues. Extensive experiments on BraTS2018 and BraTS2019 show superior performance over uni-modal baselines and several multi-modal fusion methods, achieving the highest AUC, accuracy, sensitivity, and specificity while avoiding ROI-based pre-processing. The approach is annotation-light and scalable, offering practical potential for clinical deployment and extension to other multi-modal diagnostic tasks.
Abstract
Brain tumors are among the most fatal cancers worldwide and are especially common in children and the elderly. Accurate identification of the tumor type and grade at an early stage plays an important role in choosing a precise treatment plan. Magnetic Resonance Imaging (MRI) protocols with different sequences provide clinicians with important complementary information for identifying tumor regions. However, manual assessment is time-consuming and error-prone because of the large volume of data and the diversity of brain tumor types. Hence, there is an unmet need for automated MRI-based brain tumor diagnosis. We observe that the predictive capability of uni-modal models is limited and that their performance varies widely across modalities, while commonly used modality fusion methods can introduce noise that results in significant performance degradation. To overcome these challenges, we propose a novel cross-modality guidance-aided multi-modal learning framework with dual attention for MRI brain tumor grading. To balance the tradeoff between model efficiency and efficacy, we employ ResNet Mix Convolution as the backbone network for feature extraction. In addition, dual attention is applied to capture semantic interdependencies in the spatial and slice dimensions, respectively. To facilitate information interaction among modalities, we design a cross-modality guidance-aided module in which the primary modality guides the secondary modalities during training. This design effectively leverages the complementary information of different MRI modalities while alleviating the impact of possible noise.
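To make the idea of attention over the spatial and slice dimensions concrete, the following is a minimal NumPy sketch. It is not the authors' module: the abstract does not specify the attention formulation, so plain scaled dot-product self-attention is assumed here, and the function names (`slice_attention`, `spatial_attention`, `dual_attention`), the feature layout `(C, D, H, W)`, and the additive combination of the two branches are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # tokens: (N, F). Plain scaled dot-product self-attention with no
    # learned projections (an assumption made to keep the sketch small).
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    weights = softmax(scores, axis=-1)  # (N, N), each row sums to 1
    return weights @ tokens, weights


def slice_attention(feat):
    # feat: (C, D, H, W). Each of the D slices becomes one token, so the
    # attention weights describe slice-to-slice dependencies.
    C, D, H, W = feat.shape
    tokens = feat.transpose(1, 0, 2, 3).reshape(D, C * H * W)
    out, _ = self_attention(tokens)
    return out.reshape(D, C, H, W).transpose(1, 0, 2, 3)


def spatial_attention(feat):
    # feat: (C, D, H, W). Each of the H*W in-plane positions becomes one
    # token, so the weights describe position-to-position dependencies.
    C, D, H, W = feat.shape
    tokens = feat.transpose(2, 3, 0, 1).reshape(H * W, C * D)
    out, _ = self_attention(tokens)
    return out.reshape(H, W, C, D).transpose(2, 3, 0, 1)


def dual_attention(feat):
    # Hypothetical combination: residual sum of the two attention branches.
    return feat + slice_attention(feat) + spatial_attention(feat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 6, 8, 8))  # toy (C, D, H, W) feature map
    print(dual_attention(feat).shape)
```

In a trained network each branch would normally include learned query, key, and value projections; they are omitted here so that the two token layouts, one per slice and one per spatial position, stay in focus.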
