NeuroBind: Towards Unified Multimodal Representations for Neural Signals

Fengyu Yang; Chao Feng; Daniel Wang; Tianye Wang; Ziyao Zeng; Zhiyang Xu; Hyoungseob Park; Pengliang Ji; Hanbin Zhao; Yuanning Li; Alex Wong

TL;DR

NeuroBind addresses the fragmentation of neural-signal analysis by unifying EEG, fMRI, calcium imaging, and spiking data into a shared embedding aligned with pre-trained vision-language models. It achieves this alignment using a frozen image encoder and neural encoders trained with a symmetric InfoNCE objective, enabling zero-shot and cross-modal tasks without retraining the visual component. The method enables cross-modal retrieval, zero-shot classification, zero-shot image reconstruction, and integration with Neuro-LLM, demonstrated across four diverse datasets with notable performance gains over baselines. This unified representation has the potential to advance neuroscience research, facilitate neuroprosthetics, and enhance AI systems by leveraging high-resource modalities. Overall, NeuroBind provides a scalable, modality-agnostic framework for interpreting and leveraging complex brain signals through vision-language priors.
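The symmetric InfoNCE objective mentioned above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the temperature value of 0.07, and the use of plain Python lists are all assumptions made for this example. The image embeddings are treated as fixed inputs, standing in for the output of the frozen image encoder, so only the neural encoder would receive gradients in a real training loop.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_info_nce(neural, image, temperature=0.07):
    """Average of the neural-to-image and image-to-neural InfoNCE losses.

    neural[k] and image[k] are assumed to be embeddings of a matched pair.
    The temperature is a hypothetical default, not a value from the paper.
    """
    z_n = [l2_normalize(v) for v in neural]
    z_i = [l2_normalize(v) for v in image]
    n = len(z_n)
    # logits[a][b]: scaled cosine similarity between neural sample a and image b.
    logits = [
        [sum(p * q for p, q in zip(z_n[a], z_i[b])) / temperature for b in range(n)]
        for a in range(n)
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # The matched pair sits on the diagonal in both directions.
    loss_n2i = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    loss_i2n = -sum(log_softmax(logits_t[k])[k] for k in range(n)) / n
    return 0.5 * (loss_n2i + loss_i2n)


if __name__ == "__main__":
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_info_nce(matched, matched))  # small: pairs are aligned
    print(symmetric_info_nce(matched, swapped))  # large: pairs are mismatched
```

Averaging the two directions means each neural sample must pick out its own image among the batch and each image must pick out its own neural sample, which is what makes the objective symmetric.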

Abstract

Understanding neural activity and information representation is crucial for advancing knowledge of brain function and cognition. Neural activity, measured through techniques such as electrophysiology and neuroimaging, reflects various aspects of information processing. Recent advances in deep neural networks offer new approaches to analyzing these signals using pre-trained models. However, challenges arise from discrepancies between neural signal modalities and the limited scale of high-quality neural data. To address these challenges, we present NeuroBind, a general representation that unifies multiple brain signal types, including EEG, fMRI, calcium imaging, and spiking data. To achieve this, we align the neural signals in image-paired neural datasets to pre-trained vision-language embeddings. NeuroBind is the first model to study different neural modalities jointly, and it can leverage models from high-resource modalities for various neuroscience tasks. We also show that combining information from different neural signal modalities, mapped to the same space, improves downstream performance, demonstrating the complementary strengths of these modalities. This approach holds significant potential for advancing neuroscience research, improving AI systems, and developing neuroprosthetics and brain-computer interfaces.
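Because the neural embeddings live in the same space as pre-trained vision-language embeddings, a neural signal can in principle be classified without task-specific training by comparing it against text embeddings of candidate labels. The sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, and the toy two-dimensional vectors stand in for embeddings that would come from a neural encoder and from the text encoder of the vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(neural_embedding, class_text_embeddings):
    """Return the label whose text embedding is closest to the neural embedding.

    class_text_embeddings maps each label to an embedding of a text prompt
    for that label (hypothetical inputs for this illustration).
    """
    return max(
        class_text_embeddings,
        key=lambda label: cosine(neural_embedding, class_text_embeddings[label]),
    )


if __name__ == "__main__":
    # Toy vectors only; real embeddings would be high-dimensional.
    text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    print(zero_shot_classify([0.9, 0.1], text_embeddings))
```

The same nearest-neighbor comparison, applied between embeddings of two neural modalities instead of between a neural embedding and text, gives a simple form of cross-modal retrieval.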

Paper Structure (27 sections, 2 equations, 5 figures, 5 tables)

Introduction
Related Work
DNN-based visual neural encoding models.
Visual neural signal decoding.
Visual decoding with generative methods.
Representation learning for neural signals.
Method
Binding brain signals with vision and language
Applications
Cross-modal retrieval across neural modalities.
Zero-shot brain signal classification.
Zero-shot image reconstruction from neural signals.
Neuro-LLM.
Experiments
Implementation Details
...and 12 more sections

Figures (5)

Figure 1: Method Overview. (left) We align embeddings from neural signals with a pre-trained visual embedding trained on large-scale vision-language datasets. (right) Our NeuroBind embeddings can be used with pre-trained LLMs and text-based diffusion models for neuroscience tasks without re-training.
Figure 2: fMRI to image reconstruction. We present generated images from pretrained, frozen text-to-image stable diffusion models [ramesh2022hierarchical] conditioned on our fMRI embedding. For comparison, we also present the images that MindDiffuser [Lu2023MindDiffuserCI] generates from the same fMRI signal.
Figure 3: EEG to image reconstruction. We show sampled images from pretrained, frozen text-to-image stable diffusion models [ramesh2022hierarchical] conditioned on our EEG embedding. For comparison, we also present the images that DreamDiffusion [bai2023dreamdiffusion] generates from the same EEG signal.
Figure 4: Calcium Imaging (CI) to image reconstruction. We present representative images from pretrained, frozen text-to-image stable diffusion models [ramesh2022hierarchical] conditioned on our CI embedding. We are the first to perform this reconstruction.
Figure 5: Neuro-LLM. Our Neuro-LLM can interpret various kinds of neural signals and describe the scene of the visual stimuli. We also show the reference RGB images for comparison.
