UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue
TL;DR
UMBRAE tackles the challenge of decoding brain signals across subjects by introducing a universal brain encoder that aligns neural activity with pretrained image features, enabling multimodal decoding through frozen multimodal language models. A cross-subject training strategy maps diverse subjects into a common space, supporting data-efficient adaptation to new subjects. The paper also presents BrainHub, a comprehensive NSD-extended benchmark that pairs fMRI with semantic and spatial annotations to evaluate brain captioning, grounding, retrieval, and visual decoding. Across tasks, UMBRAE achieves superior or competitive performance with improved efficiency, demonstrating robust cross-subject generalization and practical subject adaptation, with code and BrainHub made publicly available.
Abstract
We address prevailing challenges in brain-powered research, starting from the observation that existing work rarely recovers accurate spatial information and requires subject-specific models. To address these challenges, we propose UMBRAE, a method for unified multimodal decoding of brain signals. First, to extract instance-level conceptual and spatial details from neural signals, we introduce an efficient universal brain encoder for multimodal-brain alignment and recover object descriptions at multiple levels of granularity from a subsequent multimodal large language model (MLLM). Second, we introduce a cross-subject training strategy that maps subject-specific features to a common feature space. This allows a model to be trained on multiple subjects without extra resources, and even yields superior results compared to subject-specific models. Further, we demonstrate that this supports weakly-supervised adaptation to new subjects using only a fraction of the total training data. Experiments demonstrate that UMBRAE not only achieves superior results in the newly introduced tasks but also outperforms existing methods in well-established tasks. To assess our method, we construct and share with the community a comprehensive brain understanding benchmark, BrainHub. Our code and benchmark are available at https://weihaox.github.io/UMBRAE.
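The cross-subject idea described above can be illustrated with a toy sketch. This is not the UMBRAE implementation: the class name `ToyCrossSubjectEncoder`, the layer sizes, the ReLU, and the use of a mean-squared-error alignment loss are all assumptions made for illustration. It only shows the general pattern of giving each subject its own input layer (because voxel counts differ between subjects) that maps into a common space, followed by a shared layer whose output is compared against a pretrained image feature.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a random (out_dim x in_dim) weight matrix as nested lists."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weight, x):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class ToyCrossSubjectEncoder:
    """Hypothetical toy model: per-subject input layers, one shared layer."""

    def __init__(self, voxels_per_subject, common_dim, feature_dim, seed=0):
        rng = random.Random(seed)
        # One input layer per subject, since voxel counts differ (assumption).
        self.subject_layers = {
            subject: make_linear(n_voxels, common_dim, rng)
            for subject, n_voxels in voxels_per_subject.items()
        }
        # A single layer shared by all subjects, acting on the common space.
        self.shared_layer = make_linear(common_dim, feature_dim, rng)

    def encode(self, subject, voxels):
        common = apply_linear(self.subject_layers[subject], voxels)
        common = [max(0.0, c) for c in common]  # ReLU nonlinearity
        return apply_linear(self.shared_layer, common)


if __name__ == "__main__":
    encoder = ToyCrossSubjectEncoder(
        {"subj01": 12, "subj02": 9}, common_dim=6, feature_dim=4
    )
    rng = random.Random(1)
    image_feature = [rng.random() for _ in range(4)]  # stand-in target
    for subject, n_voxels in (("subj01", 12), ("subj02", 9)):
        voxels = [rng.random() for _ in range(n_voxels)]
        predicted = encoder.encode(subject, voxels)
        print(subject, "alignment loss:", mse(predicted, image_feature))
```

In this sketch, adapting to a new subject would amount to adding one more entry to `subject_layers` while reusing `shared_layer`, which gives an intuition for why adaptation could need less data than training a full model from scratch.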
