Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading

Dunyuan Xu; Xi Wang; Jinyue Cai; Pheng-Ann Heng

TL;DR

This work addresses automated brain tumor grading from multi-modal MRI by mitigating modality-imbalanced information and fusion noise through a cross-modality guidance mechanism and dual attention. It employs a lightweight ResNet Mixed Convolution backbone and a two-stage training protocol where the primary modality guides secondary modalities during feature extraction, preserving essential information while leveraging complementary cues. Extensive experiments on BraTS2018 and BraTS2019 show superior performance over uni-modal baselines and several multi-modal fusion methods, achieving the highest AUC, accuracy, sensitivity, and specificity while avoiding ROI-based pre-processing. The approach is annotation-light and scalable, offering practical potential for clinical deployment and extension to other multi-modal diagnostic tasks.
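The four reported metrics can be computed from binary grading predictions as in the minimal sketch below. It assumes, for illustration only, that high-grade cases are labelled 1 (positive) and low-grade cases 0; the function names `grading_metrics` and `auc` are hypothetical and do not come from the paper.

```python
def grading_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary grade labels.

    Assumption: 1 = high-grade (positive), 0 = low-grade (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of positive cases that are detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: fraction of negative cases that are correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a dataset of a few hundred subjects; a rank-based formulation would be preferable at larger scale.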

Abstract

Brain tumors are among the most fatal cancers worldwide and are very common in children and the elderly. Accurate identification of the tumor type and grade at an early stage plays an important role in choosing a precise treatment plan. Magnetic Resonance Imaging (MRI) protocols with different sequences provide clinicians with important complementary information for identifying tumor regions. However, manual assessment is time-consuming and error-prone because of the large amount of data and the diversity of brain tumor types. Hence, there is an unmet need for automated MRI-based brain tumor diagnosis. We observe that the predictive capability of uni-modality models is limited and that their performance varies widely across modalities, while commonly used modality fusion methods can introduce noise that results in significant performance degradation. To overcome these challenges, we propose a novel cross-modality guidance-aided multi-modal learning framework with dual attention for MRI brain tumor grading. To balance the tradeoff between model efficiency and efficacy, we employ ResNet Mixed Convolution as the backbone network for feature extraction. In addition, dual attention is applied to capture the semantic interdependencies in the spatial and slice dimensions, respectively. To facilitate information interaction among modalities, we design a cross-modality guidance-aided module in which the primary modality guides the secondary modalities during training, which effectively leverages the complementary information of different MRI modalities while alleviating the impact of possible noise.
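The guidance step can be illustrated with a small NumPy sketch. Following the Figure 2 caption, the primary modality's high-level feature is upsampled, concatenated with the secondary modality's low-level feature, and passed through a 1×1 convolution that restores the channel count. The tensor layout (channels, slices, height, width), the nearest-neighbour upsampling, and all function names are assumptions made for illustration; the source does not specify the interpolation mode or exact shapes.

```python
import numpy as np


def upsample_nearest(feat, size):
    """Nearest-neighbour upsampling of a (C, D, H, W) feature to `size` = (D, H, W).

    Assumption: the paper's interpolation mode is not stated in the source.
    """
    _, d, h, w = feat.shape
    out_d, out_h, out_w = size
    di = np.arange(out_d) * d // out_d   # source slice index for each output slice
    hi = np.arange(out_h) * h // out_h
    wi = np.arange(out_w) * w // out_w
    return feat[:, di][:, :, hi][:, :, :, wi]


def conv1x1(feat, weight):
    """A 1x1 convolution is a per-voxel linear map over channels.

    weight has shape (C_out, C_in); bias is omitted for brevity.
    """
    return np.einsum("oc,cdhw->odhw", weight, feat)


def guide_secondary(primary_high, secondary_low, weight):
    """Fuse upsampled primary guidance into the secondary low-level feature."""
    guidance = upsample_nearest(primary_high, secondary_low.shape[1:])
    fused = np.concatenate([secondary_low, guidance], axis=0)  # stack channels
    return conv1x1(fused, weight)  # back to the channel count the next stage expects


# Hypothetical shapes: 8-channel primary guidance, 4-channel secondary feature.
rng = np.random.default_rng(0)
primary_high = rng.normal(size=(8, 2, 4, 4))
secondary_low = rng.normal(size=(4, 4, 8, 8))
weight = rng.normal(size=(4, 12))
guided = guide_secondary(primary_high, secondary_low, weight)
```

In this sketch the 1×1 convolution is the only learned component of the fusion, so it is also the place where the network could down-weight guidance channels that carry noise.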

Paper Structure (21 sections, 6 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Related Work
MRI pre-processing
Single-modality learning
Multi-modality learning
Method
ResNet mixed convolution
Dual attention
Guidance-aided feature extraction module
Experiments and Results
Dataset
Evaluation metrics
Implementation details
Experiments on the uni-modality models
Ablation study
...and 6 more sections

Figures (5)

Figure 1: Examples of low-grade and high-grade cases in the T1ce, T1, T2, and Flair MRI images. These two cases are selected from BraTS2019. The whole tumor is enclosed in the red polygons.
Figure 2: The primary high-level feature is extracted by a Semantic Feature Extractor (SFE) followed by the dual attention module and is upsampled to guide the secondary feature extraction process. In secondary modality feature extraction, the low-level features extracted by a Low-level Feature Extractor (LFE) are concatenated with the upsampled guidance features to achieve cross-modality guidance. The secondary high-level features are then extracted by a High-level Feature Extractor (HFE) and merged with the cross-modality feature from the previous step. We calculate the cross-entropy loss on the prediction for every high-level feature combination. A 1×1 convolution is applied after each feature concatenation to make the number of features identical and compatible with the setting of the classifier.
Figure 3: The feature extraction part of the ResNet Mixed Convolution (RMC) model, which makes up the Semantic Feature Extractor (SFE) and is further divided into a Low-level Feature Extractor (LFE) and a High-level Feature Extractor (HFE).
Figure 4: Dual attention structure. Attention is calculated separately for the spatial (channel$\times$height$\times$width) and slice (slice$\times$slice) dimensions, and the two results are then combined.
Figure 5: Feature maps from an MRI case in BraTS2019 that the RMC model predicts wrongly while our model predicts correctly. We present the original images together with the feature maps created by the SFE, LFE and HFE; there exists an obvious semantic gap between low-level features and high-level features.
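The dual attention described in the Figure 4 caption can be sketched roughly as follows. Only the attention shapes come from the caption (a slice×slice map and a channel×height×width map). Everything else is an assumption made for illustration and may differ from the paper's design: the scaled dot-product affinity, the sigmoid gate built from the slice mean, and the element-wise sum used to combine the two branches.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slice_attention(feat):
    """Re-weight slices of a (C, S, H, W) feature with an S x S affinity map."""
    c, s, h, w = feat.shape
    x = feat.transpose(1, 0, 2, 3).reshape(s, -1)        # one row per slice
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))         # (S, S), rows sum to 1
    out = (attn @ x).reshape(s, c, h, w).transpose(1, 0, 2, 3)
    return out, attn


def spatial_attention(feat):
    """Apply a (C, H, W) gate shared across all slices.

    Assumption: the gate is a sigmoid of the mean over the slice axis.
    """
    gate = 1.0 / (1.0 + np.exp(-feat.mean(axis=1)))       # (C, H, W)
    return feat * gate[:, None, :, :], gate


def dual_attention(feat):
    """Combine both branches; the element-wise sum is an assumed choice."""
    slice_out, _ = slice_attention(feat)
    spatial_out, _ = spatial_attention(feat)
    return slice_out + spatial_out


# Hypothetical feature: 3 channels, 5 slices, 4 x 4 spatial grid.
feat = np.random.default_rng(0).normal(size=(3, 5, 4, 4))
attended = dual_attention(feat)
```

The slice branch lets each slice borrow context from similar slices elsewhere in the volume, while the spatial branch highlights in-plane locations consistently across slices.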
