A Bi-Pyramid Multimodal Fusion Method for the Diagnosis of Bipolar Disorders

Guoxin Wang, Sheng Shi, Shan An, Fengmei Fan, Wenshu Ge, Qi Wang, Feng Yu, Zhiren Wang

TL;DR

This study tackles the challenge of objectively diagnosing bipolar disorder by leveraging multimodal MRI data. It introduces the bi-Pyramid Multimodal Fusion (BPM-Fusion) framework, combining a Patch Pyramid Feature Extraction Module (P2FEM) for sMRI and a Spatio-temporal Feature Aggregation Module (SFAM) for rs-fMRI, with a fusion classifier to output BD probabilities. Across the collected BD dataset and the OpenfMRI public dataset, BPM-Fusion achieves state-of-the-art balanced accuracy, notably improving performance when using both modalities versus single modalities, demonstrating the practical value of efficient multimodal integration for clinical neurodiagnostics. The work highlights the feasibility and effectiveness of end-to-end multimodal fusion in BD diagnosis and sets the stage for exploring alternative fusion strategies to further enhance diagnostic accuracy.

Abstract

Previous research on the diagnosis of bipolar disorder has mainly focused on resting-state functional magnetic resonance imaging (rs-fMRI). However, the accuracy of these methods does not meet the requirements of clinical diagnosis. Efficient multimodal fusion strategies have great potential for multimodal data and can further improve the performance of medical diagnosis models. In this work, we use both structural MRI (sMRI) and fMRI data and propose a novel multimodal diagnosis model for bipolar disorder. The proposed Patch Pyramid Feature Extraction Module extracts sMRI features, and a spatio-temporal pyramid structure extracts fMRI features. Finally, a fusion module combines the two feature sets, and a classifier outputs the diagnosis. Extensive experiments show that the proposed method raises balanced accuracy from 0.657 to 0.732 on the OpenfMRI dataset relative to competing methods, achieving state-of-the-art performance.
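
The abstract reports results as balanced accuracy. As a minimal sketch, assuming the paper uses the standard binary definition (the mean of sensitivity and specificity), the metric can be computed as below. The labels are made-up toy data, not results from the paper, and the function name is only illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = BD, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # patients correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # patients missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy, made-up labels: 2 patients and 8 controls, with one patient missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

On imbalanced data such as this toy set, plain accuracy looks high even though half of the patients are missed, which is why balanced accuracy is a common choice for diagnostic classifiers.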

Paper Structure (12 sections, 4 equations, 2 figures, 4 tables)

This paper contains 12 sections, 4 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: An overview of the proposed method. Rs-fMRI and T1w images are fed into separate encoder branches for feature extraction. The Patch Pyramid Feature Extraction Module consists of four consecutive convolutional layers, while the Spatio-temporal Feature Aggregation Module comprises concatenated spatial and temporal feature extraction modules. After dimensionality reduction, the extracted features are concatenated and fed into a classifier, which outputs the prediction.
  • Figure 2: The structure of P2FEM and SFAM. (a) The structure of P2FEM. (b) The structure of SFAM.
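
The fusion step described in the Figure 1 caption (reduce each branch's features, concatenate them, classify) can be sketched as follows. This is a hedged illustration and not the authors' implementation: the two encoders (P2FEM and SFAM) are replaced by placeholder feature vectors, the weights are random and untrained, and every name and dimension here (`fuse_and_classify`, `reduce_smri`, 16, 32, 4) is a hypothetical choice made for the example.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Small random weights and zero bias (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(smri_feat, fmri_feat, params):
    # Dimensionality reduction: project each branch to a small vector.
    s = linear(smri_feat, *params["reduce_smri"])
    f = linear(fmri_feat, *params["reduce_fmri"])
    # Fusion by concatenation of the two reduced vectors.
    fused = s + f
    # Classifier: a single logit squashed to a probability with a sigmoid.
    logit = linear(fused, *params["classifier"])[0]
    return fused, 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
smri_feat = [rng.random() for _ in range(16)]  # stand-in for P2FEM output
fmri_feat = [rng.random() for _ in range(32)]  # stand-in for SFAM output
params = {
    "reduce_smri": random_layer(rng, 4, 16),
    "reduce_fmri": random_layer(rng, 4, 32),
    "classifier": random_layer(rng, 1, 8),
}

fused, prob = fuse_and_classify(smri_feat, fmri_feat, params)
print(len(fused), prob)
```

Concatenation keeps the two modalities' features side by side and leaves it to the classifier to weigh them; other fusion strategies would replace only the `fused = s + f` line.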