fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding
Yuxiang Wei, Yanteng Zhang, Xi Xiao, Chengxuan Qian, Tianyang Wang, Vince D. Calhoun
TL;DR
Problem: bridging fMRI with language to build universal brain representations. Approach: a three-stage framework (fMRI tokenizer, LLM alignment, multi-task instruction tuning) trained on resting-state data from UK Biobank (UKB) and the Adolescent Brain Cognitive Development (ABCD) study, together with a synthetic fMRI-text descriptor corpus that grounds low-level brain organization in language. Contributions: (i) a large descriptive corpus translating imaging features into textual descriptors; (ii) a text-aligned fMRI tokenizer mapping fMRI to discrete tokens in a language space; (iii) LLM fine-tuning with temporal modeling and multi-task instruction tuning; (iv) strong zero-shot and few-shot generalization, with parameter-efficient adaptation via LoRA. Impact: enables scalable, language-grounded brain modeling across datasets and tasks.
Abstract
Recent advances in multimodal large language models (LLMs) have enabled unified reasoning across images, audio, and video, but extending this capability to brain imaging remains largely unexplored. Bridging this gap is essential for linking neural activity with semantic cognition and for developing cross-modal brain representations. To this end, we present fMRI-LM, a foundation model that bridges functional MRI (fMRI) and language through a three-stage framework. In Stage 1, we learn a neural tokenizer that maps fMRI into discrete tokens embedded in a language-consistent space. In Stage 2, a pretrained LLM is adapted to jointly model fMRI tokens and text, treating brain activity as a sequence that can be temporally predicted and linguistically described. To overcome the lack of natural fMRI-text pairs, we construct a large descriptive corpus that translates diverse imaging-based features into structured textual descriptors, capturing the low-level organization of fMRI signals. In Stage 3, we perform multi-task, multi-paradigm instruction tuning to endow fMRI-LM with high-level semantic understanding, supporting diverse downstream applications. Across various benchmarks, fMRI-LM achieves strong zero-shot and few-shot performance and adapts efficiently through low-rank adaptation (LoRA), establishing a scalable pathway toward a language-aligned, universal model for structural and semantic understanding of fMRI.
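To make the Stage 1 idea of mapping fMRI into discrete tokens concrete, the following is a minimal sketch, not the fMRI-LM tokenizer itself. It assumes a vector-quantization style lookup in which each per-timepoint feature vector is replaced by the index of its nearest entry in a codebook. The abstract does not specify the quantization scheme, so the function name `quantize`, the toy codebook, and the two-dimensional features are all hypothetical choices made for illustration.

```python
from math import dist


def quantize(frames, codebook):
    """Return, for each feature vector, the index of its nearest codebook entry.

    Assumption: nearest-neighbor lookup under Euclidean distance. This is an
    illustrative stand-in for a learned neural tokenizer, not the paper's method.
    """
    tokens = []
    for frame in frames:
        distances = [dist(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical codebook with three entries in a 2-D feature space.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]

# Hypothetical per-timepoint features (e.g., after an encoder has
# compressed the regional signals at each time step).
frames = [(0.1, -0.2), (0.9, 1.2), (-0.8, 0.7), (0.2, 0.1)]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 0]
```

The resulting integer sequence is the kind of object a language model can consume alongside text tokens in Stage 2. In a learned tokenizer the codebook would be trained, and in a text-aligned variant its entries would be tied to the LLM's embedding space rather than fixed by hand as here.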
