BrainSymphony: A parameter-efficient multimodal foundation model for brain dynamics with limited data
Moein Khajehnejad, Forough Habibollahi, Devon Stoliker, Adeel Razi
TL;DR
BrainSymphony introduces a parameter-efficient multimodal foundation model that unifies fMRI dynamics with diffusion-derived structural connectivity through a modular architecture. By coupling a Spatio–Temporal fMRI encoder (Spatial and Temporal Transformers plus a 1D context path) with a Signed Graph Transformer for SC and an adaptive fusion gate, it delivers state-of-the-art performance with only 5.6M parameters, substantially reducing data and compute requirements. The model achieves high-fidelity reconstructions, recovers canonical networks in an unsupervised manner, and generalizes to an external psychedelic dataset, where attention maps provide mechanistic interpretations of drug- and state-dependent brain reorganization. The work highlights how architecture-informed multimodal models can surpass larger baselines while offering interpretability and potential clinical applicability, paving the way for accessible AI in neuroscience.
Abstract
Foundation models are transforming neuroscience but are often prohibitively large, data-hungry, and difficult to deploy. Here, we introduce BrainSymphony, a lightweight and parameter-efficient foundation model with plug-and-play integration of fMRI time series and diffusion-derived structural connectivity, allowing unimodal or multimodal training and deployment without architectural changes while requiring substantially less data than state-of-the-art models. The model processes fMRI time series through parallel spatial and temporal transformer streams, distilled into compact embeddings by a Perceiver module, while a novel signed graph transformer encodes anatomical connectivity from diffusion MRI. These complementary representations are then combined through an adaptive fusion mechanism. Despite its compact design, BrainSymphony consistently outperforms larger models on benchmarks spanning prediction, classification, and unsupervised network discovery. Demonstrating the model's generalizability and interpretability, its attention maps reveal drug-induced, context-dependent reorganization of cortical hierarchies in an independent psilocybin neuroimaging dataset. BrainSymphony delivers accessible, interpretable, and clinically meaningful results and demonstrates that architecturally informed, multimodal models can surpass much larger counterparts and advance applications of AI in neuroscience.
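As an illustration of what an adaptive fusion mechanism with plug-and-play modalities could look like, the following is a minimal sketch and not the authors' implementation. It assumes a per-dimension sigmoid gate computed from the concatenated fMRI and structural-connectivity embeddings, which is a common gating scheme; the abstract does not specify the actual form. The names `gated_fusion`, `h_fmri`, `h_sc`, `weights`, and `bias` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    h_fmri: Optional[Sequence[float]],
    h_sc: Optional[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fuse two d-dimensional embeddings with a learned per-dimension gate.

    Assumed (hypothetical) form: g = sigmoid(W [h_fmri; h_sc] + b),
    fused = g * h_fmri + (1 - g) * h_sc.
    `weights` has d rows of length 2d; `bias` has length d.
    If one modality is missing, the other is passed through unchanged,
    mimicking unimodal use without architectural changes.
    """
    if h_fmri is None and h_sc is None:
        raise ValueError("at least one modality embedding is required")
    if h_sc is None:
        return list(h_fmri)
    if h_fmri is None:
        return list(h_sc)
    if len(h_fmri) != len(h_sc):
        raise ValueError("embeddings must have the same dimension")

    concat = list(h_fmri) + list(h_sc)
    fused = []
    for i, (f, s) in enumerate(zip(h_fmri, h_sc)):
        # Gate logit for dimension i depends on both modalities.
        logit = bias[i] + sum(w * x for w, x in zip(weights[i], concat))
        g = sigmoid(logit)
        # Convex combination: g near 1 favors fMRI, g near 0 favors SC.
        fused.append(g * f + (1.0 - g) * s)
    return fused


# Toy example: zero weights and bias give a gate of 0.5 (plain averaging).
h_fmri = [1.0, 2.0, 3.0]
h_sc = [3.0, 2.0, -1.0]
zero_w = [[0.0] * 6 for _ in range(3)]
fused = gated_fusion(h_fmri, h_sc, zero_w, [0.0] * 3)
print(fused)  # [2.0, 2.0, 1.0]
```

In this sketch the gate is the only learned component of the fusion step, so the two encoders can be trained or deployed separately and combined only when both modalities are available.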
