
Multi-modal Cross-domain Self-supervised Pre-training for fMRI and EEG Fusion

Xinxu Wei, Kanhao Zhao, Yong Jiao, Nancy B. Carlisle, Hua Xie, Gregory A. Fonzo, Yu Zhang

TL;DR

This work integrates fMRI and EEG data across the spatial, temporal, and spectral domains to improve brain disorder classification. It introduces MCSP, a self-supervised framework that combines cross-domain losses ($L^{M}_{\text{CD-SSL}}$), cross-modal losses ($L^{D}_{\text{CM-SSL}}$), and cross-model distillation, enabling universal pre-training across modalities, datasets, tasks, and sites. The authors assemble a large unified pre-training corpus from ADHD-200, ABIDE, EMBARC, and HBN, and report superior performance on ADHD-, ASD-, and MDD-related tasks, along with interpretable biomarker insights. The results suggest that domain-specific data augmentation and distillation yield robust, transferable multimodal neuroimaging representations with practical value for psychiatric research and clinical translation.

Abstract

Neuroimaging techniques, including functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), have shown promise in detecting functional abnormalities in various brain disorders. However, existing studies often focus on a single domain or modality, neglecting the complementary information offered by multiple domains of both fMRI and EEG, which is crucial for a comprehensive representation of disorder pathology. This limitation makes it difficult to exploit the synergistic information these modalities provide. To address this, we propose a Multi-modal Cross-domain Self-supervised Pre-training Model (MCSP), a novel approach that leverages self-supervised learning to synergize multi-modal information across the spatial, temporal, and spectral domains. Our model employs a cross-domain self-supervised loss that bridges domain differences through domain-specific data augmentation and a contrastive loss, enhancing feature discrimination. Furthermore, MCSP introduces a cross-modal self-supervised loss to capitalize on the complementary information of fMRI and EEG, facilitating knowledge distillation within domains and maximizing cross-modal feature convergence. We constructed a large-scale pre-training dataset and pre-trained the MCSP model with the proposed self-supervised paradigms to fully harness multimodal neuroimaging data. Through comprehensive experiments, we demonstrate the superior performance and generalizability of our model on multiple classification tasks. Our study advances the fusion of fMRI and EEG through a novel integration of cross-domain features, enriching the existing landscape of neuroimaging research, particularly in the context of mental disorder studies.
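The abstract mentions a contrastive loss that pulls together features of the same subject across domains. The paper's exact formulation is not given here, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, under the assumption that each subject has one embedding per domain. The names `info_nce`, `l2_normalize`, `spatial`, `temporal_aligned`, `temporal_swapped`, and the `temperature` value are hypothetical and chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss: row i of view_a should match row i of view_b.

    view_a, view_b: lists of N embedding vectors for the same N subjects,
    e.g. one list per domain (an assumption made for this sketch).
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of subject i (view A) to every subject in view B.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable negative log-softmax of the positive pair (index i).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings: two subjects, two dimensions (illustrative values only).
spatial = [[1.0, 0.0], [0.0, 1.0]]
temporal_aligned = [[0.9, 0.1], [0.2, 0.8]]
temporal_swapped = [[0.2, 0.8], [0.9, 0.1]]

loss_aligned = info_nce(spatial, temporal_aligned)
loss_swapped = info_nce(spatial, temporal_swapped)
print(loss_aligned < loss_swapped)
```

In this sketch the loss is small when each subject's embeddings agree across the two views and large when they are paired with the wrong subject, which is the behavior a cross-domain consistency term is meant to reward.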

Paper Structure (33 sections, 10 equations, 10 figures, 8 tables)

This paper contains 33 sections, 10 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: The network architecture of the proposed MCSP. The model comprises multimodal inputs, construction of data in the different domains, domain-specific feature encoding and extraction, projection heads for each modality in each domain, and cross-modal and cross-domain self-supervised constraints and fusion. A task-agnostic MLP serves as the classifier.
  • Figure 2: The temporal and frequency projectors for aligning fMRI and EEG sequences in the embedding space. For EEG's time and frequency sequences, we concatenate all feature segments output by the encoder and feed them into the projector. $N$ denotes the number of subjects, and '125' is the number of equally sized segments obtained from each subject's EEG time and frequency sequences. Based on the experimental results, setting the length to 200 strikes an optimal balance between performance and computational efficiency.
  • Figure 3: The proposed Cross-domain Self-supervised loss function, which consists of an Intra-domain Cross-view Consistency Loss $L^{M}_{ID}$ and a Cross-domain Consistency Loss $L^{M}_{CD}$.
  • Figure 4: The proposed cross-modal self-supervised loss function, which consists of an Intra-domain Cross-modal Distillation Loss $L^{D}_{IM}$ and a Cross-modal Consistency Loss $L^{D}_{CM}$.
  • Figure 5: The diagram illustrates the concept of cross-model distillation pre-training across domains.
  • ...and 5 more figures
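
The Figure 4 caption names an Intra-domain Cross-modal Distillation Loss $L^{D}_{IM}$ but does not state its form. As a rough illustration only, the sketch below uses a standard knowledge-distillation objective (KL divergence between temperature-softened distributions), which may differ from the authors' loss. The function names, the `temperature` value, and the assignment of teacher and student roles to particular modalities are assumptions for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the generic distillation objective, not necessarily the
    paper's intra-domain cross-modal distillation loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical per-subject outputs from two modality branches in one domain.
teacher_branch = [2.0, 0.5, -1.0]
student_close = [1.8, 0.6, -0.9]
student_far = [-1.0, 0.5, 2.0]

loss_close = distillation_loss(teacher_branch, student_close)
loss_far = distillation_loss(teacher_branch, student_far)
print(loss_close < loss_far)
```

The loss is zero when both branches produce the same distribution and grows as the student's distribution departs from the teacher's, which is the general mechanism by which one modality's features can guide the other's.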