BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings

Dongyang Li, Haoyang Qin, Mingyang Wu, Chen Wei, Quanying Liu

TL;DR

BrainFLORA tackles the challenge of integrating heterogeneous neural signals to uncover brain-based visual concept representations. It introduces a multimodal encoding framework with modality-specific encoders, a soft-routing Mixture-of-Experts universal projection, and a three-stage training objective that blends high-level contrastive alignment with diffusion-based low-level reconstruction. Across EEG, MEG, and fMRI on THINGS-derived data, BrainFLORA achieves state-of-the-art joint-subject retrieval, robust cross-modal reconstruction, and competitive captioning, while providing insights into the geometric organization of concept representations across modalities. The work advances cognitive neuroscience and brain-computer interfaces by offering a scalable, unified model that reveals how brain activity across modalities maps to real-world object concepts, with potential for future expansion to larger datasets and more unified architectures.

Abstract

Understanding how the brain represents visual information is a fundamental challenge in neuroscience and artificial intelligence. While AI-driven decoding of neural data has provided insights into the human visual system, integrating multimodal neuroimaging signals, such as EEG, MEG, and fMRI, remains a critical hurdle due to their inherent spatiotemporal misalignment. Current approaches often analyze these modalities in isolation, limiting a holistic view of neural representation. In this study, we introduce BrainFLORA, a unified framework for integrating cross-modal neuroimaging data to construct a shared neural representation. Our approach leverages multimodal large language models (MLLMs) augmented with modality-specific adapters and task decoders, achieving state-of-the-art performance on the joint-subject visual retrieval task, with the potential to extend to multitask settings. Combining neuroimaging analysis methods, we further reveal how visual concept representations align across neural modalities and with real-world object perception. We demonstrate that the brain's structured visual concept representations exhibit an implicit mapping to physical-world stimuli, bridging neuroscience and machine learning across different neuroimaging modalities. Beyond methodological advancements, BrainFLORA offers novel implications for cognitive neuroscience and brain-computer interfaces (BCIs). Our code is available at https://github.com/ncclab-sustech/BrainFLORA.
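
The TL;DR mentions a soft-routing Mixture-of-Experts universal projection. As a rough illustration of what soft routing generally means, the sketch below lets every expert contribute to the output, weighted by a softmax gate, instead of selecting a single expert. It is not the authors' implementation: the class name `SoftMoEProjection`, the dimensions, the random weights, and the toy token are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SoftMoEProjection:
    """Hypothetical soft-routing MoE: a gate-weighted sum of linear experts."""

    def __init__(self, in_dim, out_dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.gate = random_matrix(num_experts, in_dim, rng)
        self.experts = [random_matrix(out_dim, in_dim, rng)
                        for _ in range(num_experts)]

    def route(self, token):
        """Soft routing weights: one positive weight per expert, summing to 1."""
        return softmax(matvec(self.gate, token))

    def __call__(self, token):
        weights = self.route(token)
        outputs = [matvec(expert, token) for expert in self.experts]
        out_dim = len(outputs[0])
        # Blend all expert outputs instead of picking only the top expert.
        return [sum(w * out[d] for w, out in zip(weights, outputs))
                for d in range(out_dim)]


# Toy token standing in for the output of a modality-specific encoder.
moe = SoftMoEProjection(in_dim=4, out_dim=3, num_experts=2)
token = [0.2, -0.1, 0.4, 0.0]
weights = moe.route(token)
shared = moe(token)
print("routing weights:", weights)
print("shared-space vector:", shared)
```

Because the gate is a softmax, the routing stays differentiable and every expert receives gradient, which is the usual motivation for soft over hard (top-k) routing.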

Paper Structure

This paper contains 43 sections, 14 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Overview of BrainFLORA and the comparison between concept spaces of EEG, MEG, and fMRI. The framework links conceptual Representation Similarity Matrices (RSMs) that capture visual relationships among objects to neural responses elicited by diverse object images from the THINGS database.
  • Figure 2: Overall Framework of BrainFLORA. BrainFLORA comprises neural modality encoders, a universal projection, and a Mixture of Experts (MoE) projection module, with separate outputs for each modality. Left: The modality-specific encoders transform the input signal into semantic tokens. Middle: The MoE projection module projects and aligns diverse neural modalities into a unified semantic representation space. Right: Various task heads facilitate different downstream tasks such as retrieval, captioning, and reconstruction. All modules are jointly trained during the same stage, optimizing computational efficiency.
  • Figure 3: Architecture of the neural feature extraction module. The original neural sequences from multiple variates are simultaneously embedded into tokens. Multi-granularity attention is applied to the embedded tokens of correlated variables, enhancing electrode-level correlations. The representations for each token are then assigned through the router layers. Subsequently, Temporal-Spatial convolution is employed to mitigate overfitting while improving the model’s capacity for temporal-spatial representation learning.
  • Figure 4: Similarity analysis of the concept representations of EEG, MEG, and fMRI across all subjects. a-c: We calculate the representation similarity matrix (RSM) between samples. RSMs for the test set consisting of all objects, computed from the projected BrainFLORA embeddings (left). Correlation between the predicted and measured similarity over all object pairs (middle). Distributions of the predicted and measured concept embeddings (right).
  • Figure 5: The forward and backward retrieval performance between EEG, MEG and fMRI.
  • ...and 9 more figures
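
The Figure 4 caption describes building a representation similarity matrix (RSM) and correlating predicted with measured similarity over all object pairs. A minimal sketch of that kind of analysis follows. It assumes cosine similarity between embeddings and a Pearson correlation over the off-diagonal upper triangle; the summary does not state which measures the paper uses, and the embeddings are made-up toy values rather than data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rsm(embeddings):
    """Representation similarity matrix: pairwise cosine similarity."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries: one value per object pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy embeddings for four objects (illustrative values only).
predicted = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.2], [0.1, 0.1, 1.0]]
measured = [[0.8, 0.0, 0.1], [1.0, 0.3, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]

rsm_pred = rsm(predicted)
rsm_meas = rsm(measured)

# Compare the two RSMs pair by pair, skipping the trivial diagonal.
agreement = pearson(upper_triangle(rsm_pred), upper_triangle(rsm_meas))
print(f"RSM agreement (Pearson r over object pairs): {agreement:.3f}")
```

Only the upper triangle is compared because an RSM is symmetric with a diagonal of ones, so including the remaining entries would double-count pairs and inflate the correlation.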