RealMind: Advancing Visual Decoding and Language Interaction via EEG Signals
Dongyang Li, Haoyang Qin, Mingyang Wu, Jiahua Tang, Yuang Cao, Chen Wei, Quanying Liu
TL;DR
RealMind tackles the challenge of decoding visual experiences from EEG by learning multimodal-aligned representations with semantic and geometric consistency losses. It uses a Transformer-based EEG encoder to map multi-channel EEG to latent spaces aligned with CLIP and large language models, enabling retrieval, reconstruction, and the first zero-shot EEG captioning. On the THINGS-EEG dataset, RealMind achieves Top-1 accuracy of 27.58% and Top-5 accuracy of 58.42% in 200-class zero-shot retrieval, and a BLEU-1 score of 26.59% in 200-class zero-shot captioning, demonstrating strong multitask performance and cross-modal alignment. This work advances practical EEG-based visual decoding by enabling captioning and by providing a scalable, interpretable architecture for BCI applications.
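To make the Top-1 and Top-5 retrieval figures concrete, the following is a minimal sketch of how top-k zero-shot retrieval accuracy is commonly computed from aligned embeddings. It is not the authors' evaluation code: the function names, the toy 3-dimensional vectors, and the use of cosine similarity as the ranking score are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k_accuracy(eeg_embeddings, image_embeddings, k):
    """Fraction of EEG trials whose matching image (same index) ranks in the top k."""
    hits = 0
    for i, eeg in enumerate(eeg_embeddings):
        scores = [cosine(eeg, img) for img in image_embeddings]
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(eeg_embeddings)

# Hypothetical toy data: 3 candidate images and 3 EEG trials, where
# EEG trial i is assumed to correspond to image i.
image_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eeg_embeddings = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.0], [0.0, 0.2, 0.8]]

top1 = top_k_accuracy(eeg_embeddings, image_embeddings, k=1)
top2 = top_k_accuracy(eeg_embeddings, image_embeddings, k=2)
print(f"Top-1: {top1:.2%}, Top-2: {top2:.2%}")
```

In the 200-class setting described above, the candidate set would hold 200 image embeddings rather than 3, and k would be 1 or 5.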
Abstract
Decoding visual stimuli from neural recordings is a critical challenge in the development of brain-computer interfaces (BCIs). Although recent EEG-based decoding approaches have made progress in tasks such as visual classification, retrieval, and reconstruction, they remain constrained by unstable representation learning and a lack of interpretability. This gap highlights the need for more efficient representation learning and for effective language interaction, both to improve understanding and to make visual decoding more usable in practice. To address these limitations, we introduce RealMind, a novel EEG-based framework designed to handle a diverse range of downstream tasks. Specifically, RealMind leverages both semantic and geometric consistency learning to enhance feature representation and improve alignment across tasks. Notably, beyond excelling in traditional tasks, our framework marks the first attempt at visual captioning from EEG data using a vision-language model (VLM). It achieves a Top-1 decoding accuracy of 27.58% in a 200-class zero-shot retrieval task and a BLEU-1 score of 26.59% in a 200-class zero-shot captioning task. Overall, RealMind provides a comprehensive multitask EEG decoding framework, establishing a foundational approach for EEG-based visual decoding in real-world applications.
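For readers unfamiliar with the captioning metric, BLEU-1 is the clipped unigram precision of a generated caption against a reference, scaled by a brevity penalty. The sketch below illustrates that definition only. The abstract does not specify the tokenization or the number of reference captions used, so the whitespace tokenizer, the single reference, the function name `bleu1`, and the example sentences are assumptions.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """BLEU-1 for one candidate and one reference (assumed whitespace tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate unigram count by its count in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: candidates no longer than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# Hypothetical captions, not taken from the paper.
score = bleu1("a dog on the grass", "a brown dog runs on the grass")
print(f"BLEU-1: {score:.4f}")
```

Here every candidate word appears in the reference, so the unigram precision is 1 and the score is determined entirely by the brevity penalty for the shorter caption.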
