Completed Feature Disentanglement Learning for Multimodal MRIs Analysis

Tianling Liu, Hongying Liu, Fanhua Shang, Lequan Yu, Tong Han, Liang Wan

TL;DR

This work tackles the information loss that occurs in feature disentanglement for multimodal MRI analysis when the input has more than two modalities. It introduces a Completed Feature Disentanglement (CFD) strategy that decouples modality-shared, modality-specific, and, crucially, modality-partial-shared features, along with a Dynamic Mixture-of-Experts Fusion (DMF) module that learns local-global feature relationships via LinG_GN. The approach is validated on three MRI datasets (MRNet, MEN, BraTS 2021) and extended to a four-modal BraTS case, consistently outperforming nine state-of-the-art methods across multiple metrics and offering interpretability through gating weights and Grad-CAM visualization. The results suggest that the overall framework, CFDL, provides robust, interpretable, and scalable improvements for multimodal MRI classification, with potential applications to broader medical imaging tasks such as segmentation.

Abstract

Multimodal MRIs play a crucial role in clinical diagnosis and treatment. Feature disentanglement (FD)-based methods, which aim to learn superior feature representations for multimodal data analysis, have achieved significant success in multimodal learning (MML). Typically, existing FD-based methods separate multimodal data into modality-shared and modality-specific features, and employ concatenation or attention mechanisms to integrate these features. However, our preliminary experiments indicate that these methods can lose the information shared among subsets of modalities when the inputs contain more than two modalities, and such information is critical for prediction accuracy. Furthermore, these methods do not adequately interpret the relationships between the decoupled features at the fusion stage. To address these limitations, we propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the information lost during feature decoupling. Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples the features shared among subsets of the multimodal inputs, termed modality-partial-shared features. We further introduce a new Dynamic Mixture-of-Experts Fusion (DMF) module that dynamically integrates these decoupled features by explicitly learning the local-global relationships among them. The effectiveness of our approach is validated through classification tasks on three multimodal MRI datasets. Extensive experimental results demonstrate that our approach outperforms other state-of-the-art MML methods by clear margins.
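
To make the three feature types concrete, the following minimal sketch enumerates which subsets of modalities each type of feature would be tied to. It only illustrates the combinatorics described in the abstract and is not the authors' implementation: the function name `feature_groups` and the dictionary keys are hypothetical, and the modality names are borrowed from the MEN example in Figure 2.

```python
from itertools import combinations


def feature_groups(modalities):
    """Group every non-empty subset of modalities by the feature type it maps to.

    Hypothetical helper for illustration only:
      - a single modality       -> modality-specific
      - all modalities together -> modality-shared
      - any subset in between   -> modality-partial-shared
    """
    n = len(modalities)
    groups = {"specific": [], "partial_shared": [], "shared": []}
    for size in range(1, n + 1):
        for subset in combinations(modalities, size):
            if size == 1:
                groups["specific"].append(subset)
            elif size == n:
                groups["shared"].append(subset)
            else:
                groups["partial_shared"].append(subset)
    return groups


# Three-modal case, using the MEN modality names from Figure 2.
for kind, subsets in feature_groups(["T1C", "FLAIR-C", "ADC"]).items():
    print(kind, subsets)
```

With three modalities this yields three modality-specific groups, three pair-wise modality-partial-shared groups, and one modality-shared group. With only two modalities the partial-shared list is empty, which matches the observation that the information loss appears only when there are more than two modalities. For four modalities the same enumeration gives ten partial-shared subsets (six pairs and four triples); whether the paper's four-modal extension uses all of them is not stated here.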

Paper Structure (29 sections, 13 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: A concept map illustrating the incomplete feature representations of existing FD methods in the three-modal case.
  • Figure 2: An example from the MEN dataset illustrating information loss between pair-wise modalities. The region of interest (ROI), which includes the tumor, the edema, and their surrounding area, is marked with a white box and used as the input of the model. Tumor and edema areas are marked with red and green boxes, respectively, and the yellow box highlights regions of high cell density in the tumor. Tumor characteristics are shared between T1C and FLAIR-C, edema characteristics between FLAIR-C and ADC, and cell density information between T1C and ADC.
  • Figure 3: Overview of the proposed CFDL framework in the three-modal case. (a) For each modality, we adopt the same type of backbone for feature extraction. (b) The Completed Feature Disentanglement (CFD) strategy decouples each extracted feature into modality-shared features, modality-specific features, and modality-partial-shared features between pair-wise modalities. (c) The Dynamic MoE Fusion (DMF) module fuses the decoupled features dynamically, on a per-sample basis, with the help of LinG_GN, which captures the complex interrelationships between these features. In the diagram, ⓒ denotes the concatenation operation, $FC$ a fully-connected layer, and $ClS\ head$ a classification head.
  • Figure 4: Example cases from the three multimodal MRI datasets.
  • Figure 5: Visualization results of the proposed framework on the MRNet and MEN datasets. Each sub-figure includes a t-SNE visualization of the decoupled features (left) and a heatmap of the cosine similarities between each pair of decoupled features (right). In the heatmap, yellow indicates higher similarity, while blue indicates lower similarity.
  • ...and 6 more figures