
Towards a general-purpose foundation model for fMRI analysis

Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Wanyi Fu, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, Tianming Liu, Xue-Jun Kong, Quanzheng Li, Daniel S. Barron, Anqi Qiu, Randy Hirschtick, Byung-Hoon Kim, Hongbin Han, Xiang Li, Yixuan Yuan

Abstract

Functional MRI (fMRI) is crucial for studying brain function and diagnosing neurological disorders. However, existing analysis methods face reproducibility and transferability challenges owing to complex preprocessing pipelines and task-specific model designs. In this work, we introduce NeuroSTORM (Neuroimaging Foundation Model with Spatial-Temporal Optimized Representation Modeling), a model that learns generalizable representations directly from 4D fMRI volumes and transfers efficiently to diverse downstream applications. Specifically, NeuroSTORM is pre-trained on 28.65 million fMRI frames from over 50,000 subjects, spanning multiple centers and ages 5 to 100. It combines an efficient spatiotemporal modeling design with lightweight task adaptation, enabling scalable pre-training and fast transfer. Here we show that NeuroSTORM consistently outperforms existing methods across five downstream tasks: demographic prediction, phenotype prediction, disease diagnosis, re-identification, and state classification. On two multi-hospital clinical cohorts covering 17 diagnoses, NeuroSTORM achieves the best diagnostic performance while remaining predictive of psychological and cognitive phenotypes. These results suggest that NeuroSTORM could become a standardized foundation model for reproducible and transferable fMRI analysis.

Paper Structure

This paper contains 16 sections, 5 equations, and 6 figures.

Figures (6)

  • Figure 1: Overview of our proposed NeuroSTORM framework. (a) Data corpus and preprocessing: The model is pre-trained on a collection of publicly available datasets, including over 50,000 rsfMRI and 16,000 tfMRI sequences. All data are aligned to 2mm MNI152 space to create standardized 4D volumes. Because complete metadata were not collected for the DMT-HAR-MED dataset, it is excluded from the dataset statistics. (b) NeuroSTORM pre-training: The model uses a masked autoencoder paradigm with a STRD module to enhance the learning of long-range spatiotemporal relationships. (c) Downstream tasks with fine-tuning: For evaluation, NeuroSTORM employs an SWM backbone and the TPT technique for efficient fine-tuning to various downstream tasks. The benchmark includes age and gender prediction, phenotype prediction, disease diagnosis, fMRI re-identification, and fMRI state classification. (d) Comprehensive performance evaluation: We systematically benchmark NeuroSTORM against previous ROI-based and volume-based state-of-the-art models across a diverse set of downstream tasks. The radar chart illustrates NeuroSTORM's consistent performance improvements.
  • Figure 2: Evaluation of NeuroSTORM's performance in (reported) gender classification and age regression tasks. (a) Gender classification performance: NeuroSTORM consistently outperformed competing ROI-based and volume-based methods across datasets, including HCP-YA, HCP-A, HCP-D, UKB and ABCD. (b) Age regression error: NeuroSTORM achieved the lowest errors, surpassing competing methods across multiple datasets. (c) Label efficiency results: NeuroSTORM adapted well when trained with limited data. Even with only 10%-50% of the training data, it maintained competitive performance in both age and gender prediction. Its performance improved steadily as the training fraction increased, with the best results obtained on the full training set. Variances were estimated from five technical replicates. Pairwise significance markers were computed using a two-sided paired t-test (ttest_rel) without multiple-comparison correction, and the corresponding P-value is annotated.
  • Figure 3: Performance evaluation of NeuroSTORM in the phenotype prediction task. (a) HCP-YA dataset: NeuroSTORM achieves higher Pearson correlation coefficients (PCC) than competing methods across diverse phenotype scores, including MMSE Score (P01), Social Task Performance (P02), Cognitive Total Score (Age Adjusted) (P03), Emotion Task Accuracy (P04), Language Task Accuracy (P05), and Strength Score (Age Adjusted) (P06). Even in data-scarce scenarios (10%-50%), NeuroSTORM maintains competitive PCC performance. (b) TCP dataset: NeuroSTORM is evaluated on its ability to predict clinically relevant scores in subjects with psychiatric disorders, including Anxiety Sensitivity (P07), CGI Severity Score (P08), DASS Anxiety Score (P09), DASS Stress Score (P10), PANSS General Score (P11), PANSS Negative Symptoms (P12), PANSS Positive Symptoms (P13), NEO Agreeableness Score (P14), TCI Harm Avoidance Score (P15), and TCI Cautiousness Score (P16). Variances were estimated from five technical replicates. Pairwise significance markers were computed using a two-sided paired t-test (ttest_rel) without multiple-comparison correction, and the corresponding P-value is annotated.
  • Figure 4: Evaluation of NeuroSTORM's performance on disease diagnosis task. (a) Classification accuracy on ADHD200, COBRE, HCP-EP, MND and UCLA shows that NeuroSTORM consistently outperforms all ROI-based and volume-based baselines, highlighting its strong generalisability across neurological and psychiatric disorders. (b) Label efficiency analysis demonstrates that NeuroSTORM maintains robust performance even when trained with only a fraction of the fine-tuning data, underscoring its suitability for data-scarce scenarios. Variances were estimated from five technical replicates. Pairwise significance markers were computed using a two-sided paired t-test (ttest_rel) without multiple-comparison correction, and the corresponding P-value is annotated.
  • Figure 5: Evaluation of fMRI re-identification on HCP-YA with the gallery size fixed at 100. Under a closed-set protocol, each 4D sequence is embedded once into an $\ell_{2}$-normalized feature vector and matched via exhaustive nearest-neighbor search in the gallery, with retrieval quality reported by Rank-1 Accuracy and mAP.
  • ...and 1 more figure
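The re-identification protocol described for Figure 5 (ℓ2-normalized embeddings, exhaustive nearest-neighbor search over a closed-set gallery, Rank-1 Accuracy and mAP) can be illustrated with a short NumPy sketch. This is not the authors' evaluation code; it assumes the sequence embeddings have already been produced by an encoder (synthetic random vectors stand in for them below), that similarity is the cosine (the dot product of unit vectors), and that the helper names `l2_normalize`, `average_precision`, and `reid_metrics` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (eps guards against zero vectors)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def average_precision(matches: np.ndarray) -> float:
    """AP for one query, given a boolean relevance vector in ranked order."""
    hits = np.flatnonzero(matches)  # 0-based ranks of the correct matches
    if hits.size == 0:
        return 0.0
    # Precision measured at each rank where a correct match appears.
    precision_at_hits = np.arange(1, hits.size + 1) / (hits + 1)
    return float(precision_at_hits.mean())


def reid_metrics(query_emb, query_ids, gallery_emb, gallery_ids):
    """Rank-1 accuracy and mAP via exhaustive nearest-neighbor search."""
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    g = l2_normalize(np.asarray(gallery_emb, dtype=float))
    sim = q @ g.T  # cosine similarity, shape (n_queries, n_gallery)
    order = np.argsort(-sim, axis=1, kind="stable")  # most similar first
    ranked_ids = np.asarray(gallery_ids)[order]
    matches = ranked_ids == np.asarray(query_ids)[:, None]
    rank1 = float(matches[:, 0].mean())  # top-ranked gallery item is correct
    mean_ap = float(np.mean([average_precision(m) for m in matches]))
    return rank1, mean_ap


# Toy closed-set example: 100 subjects, one gallery sequence each (synthetic data).
rng = np.random.default_rng(0)
n_subjects, dim = 100, 64
gallery = rng.normal(size=(n_subjects, dim))
queries = gallery + 0.1 * rng.normal(size=(n_subjects, dim))  # noisy second sessions
ids = np.arange(n_subjects)
rank1, mean_ap = reid_metrics(queries, ids, gallery, ids)
print(f"Rank-1: {rank1:.3f}, mAP: {mean_ap:.3f}")
```

Because each subject contributes exactly one gallery sequence in this toy setup, every query has a single correct match, so its average precision reduces to the reciprocal of that match's rank.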
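The figure captions state that pairwise significance was assessed with a two-sided paired t-test (`ttest_rel`) over five technical replicates. A minimal sketch of such a comparison with SciPy follows; the replicate scores are made-up placeholders rather than results from the paper, and the helper `paired_comparison` is hypothetical. The t statistic is also computed by hand to show what the test measures: the mean per-replicate difference divided by its standard error.

```python
import numpy as np
from scipy import stats


def paired_comparison(scores_a, scores_b):
    """Two-sided paired t-test between two models scored on the same replicates."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b
    # Paired t statistic: mean difference over its standard error (sample std, ddof=1).
    t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))
    result = stats.ttest_rel(a, b)  # two-sided alternative by default
    return t_manual, result.statistic, result.pvalue


# Hypothetical accuracies from five replicates (placeholders, not paper results).
model_a = [0.82, 0.84, 0.83, 0.85, 0.81]
model_b = [0.79, 0.80, 0.81, 0.80, 0.78]
t_manual, t_scipy, p = paired_comparison(model_a, model_b)
print(f"t = {t_scipy:.3f} (manual {t_manual:.3f}), two-sided P = {p:.4f}")
```

Pairing matters here because both models are evaluated on the same replicates, so the test works on per-replicate differences rather than treating the two score sets as independent samples.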