BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings
Dongyang Li, Haoyang Qin, Mingyang Wu, Chen Wei, Quanying Liu
TL;DR
BrainFLORA tackles the challenge of integrating heterogeneous neural signals to uncover brain-based visual concept representations. It introduces a multimodal encoding framework with modality-specific encoders, a soft-routing Mixture-of-Experts (MoE) universal projection, and a three-stage training objective that blends high-level contrastive alignment with diffusion-based low-level reconstruction. Across EEG, MEG, and fMRI on THINGS-derived data, BrainFLORA achieves state-of-the-art joint-subject retrieval, robust cross-modal reconstruction, and competitive captioning, while providing insights into the geometric organization of concept representations across modalities. The work advances cognitive neuroscience and brain-computer interfaces by offering a scalable, unified model that reveals how brain activity across modalities maps to real-world object concepts, with potential for future expansion to larger datasets and grand unified architectures.
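To make the idea of a soft-routing Mixture-of-Experts projection concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes linear experts and a linear softmax gate; the class name `SoftMoEProjection`, the dimensions, and the random initialization are all hypothetical choices for illustration. "Soft" routing here means every expert contributes to the output, weighted by the gate, rather than only the top-scoring experts being selected.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SoftMoEProjection:
    """Hypothetical soft-routing MoE layer (illustrative, not from the paper).

    Each input is projected by every expert; the outputs are mixed with
    per-sample gate weights that sum to 1.
    """

    def __init__(self, in_dim, out_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed: a linear gate and linear experts, randomly initialized.
        self.gate_w = rng.normal(0.0, 0.02, size=(in_dim, num_experts))
        self.expert_w = rng.normal(0.0, 0.02, size=(num_experts, in_dim, out_dim))

    def __call__(self, x):
        # x: (batch, in_dim) features from a modality-specific encoder.
        gates = softmax(x @ self.gate_w, axis=-1)                 # (batch, experts)
        expert_out = np.einsum("bi,eio->beo", x, self.expert_w)   # (batch, experts, out_dim)
        mixed = np.einsum("be,beo->bo", gates, expert_out)        # (batch, out_dim)
        return mixed, gates


# Usage: project a batch of 5 encoder outputs into a shared 8-dimensional space.
moe = SoftMoEProjection(in_dim=16, out_dim=8, num_experts=4)
features = np.random.default_rng(1).normal(size=(5, 16))
shared, gates = moe(features)
```

In a setup like this, the same projection could be shared by the EEG, MEG, and fMRI encoders, with the gate deciding how much each expert contributes for a given input; whether BrainFLORA's gate is conditioned this way is not stated in this summary.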
Abstract
Understanding how the brain represents visual information is a fundamental challenge in neuroscience and artificial intelligence. While AI-driven decoding of neural data has provided insights into the human visual system, integrating multimodal neuroimaging signals, such as EEG, MEG, and fMRI, remains a critical hurdle due to their inherent spatiotemporal misalignment. Current approaches often analyze these modalities in isolation, limiting a holistic view of neural representation. In this study, we introduce BrainFLORA, a unified framework for integrating cross-modal neuroimaging data to construct a shared neural representation. Our approach leverages multimodal large language models (MLLMs) augmented with modality-specific adapters and task decoders, achieving state-of-the-art performance on the joint-subject visual retrieval task, with the potential to extend to multitask settings. Using neuroimaging analysis methods, we further reveal how visual concept representations align across neural modalities and with real-world object perception. We demonstrate that the brain's structured visual concept representations exhibit an implicit mapping to physical-world stimuli, bridging neuroscience and machine learning across neural imaging modalities. Beyond methodological advancements, BrainFLORA offers novel implications for cognitive neuroscience and brain-computer interfaces (BCIs). Our code is available at https://github.com/ncclab-sustech/BrainFLORA.
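As an illustration of how contrastive alignment and visual retrieval can work in a shared embedding space, here is a small NumPy sketch. It is an assumption-laden example rather than the paper's objective: it uses a symmetric InfoNCE (CLIP-style) loss, cosine-similarity top-k retrieval, a temperature of 0.07, and synthetic embeddings. The function names `symmetric_info_nce` and `top_k_retrieval` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(neural, image, temperature=0.07):
    """Assumed contrastive loss: row i of `neural` is paired with row i of `image`."""
    logits = l2_normalize(neural) @ l2_normalize(image).T / temperature
    idx = np.arange(len(logits))
    loss_neural_to_image = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_image_to_neural = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_neural_to_image + loss_image_to_neural)


def top_k_retrieval(neural, image, k=1):
    """Return, for each neural embedding, the indices of the k most similar images."""
    sims = l2_normalize(neural) @ l2_normalize(image).T
    return np.argsort(-sims, axis=1)[:, :k]


# Synthetic stand-ins: neural embeddings are noisy copies of image embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(6, 32))
neural_emb = image_emb + 0.01 * rng.normal(size=(6, 32))

loss = symmetric_info_nce(neural_emb, image_emb)
top1 = top_k_retrieval(neural_emb, image_emb, k=1)[:, 0]
```

With well-aligned embeddings, each neural embedding retrieves its own paired image and the loss is small; in practice the neural embeddings would come from EEG, MEG, or fMRI encoders and the image embeddings from a pretrained vision model.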
