Visual Neural Decoding via Improved Visual-EEG Semantic Consistency
Hongzhou Chen, Lianghua He, Yihang Liu, Longzhen Yang
TL;DR
This paper tackles the problem of decoding visual experiences from EEG by addressing the semantic misalignment that arises when EEG features are mapped directly into a fixed multimodal embedding space. It introduces VE-SDN, a Visual-EEG Semantic Decouple Framework that projects both image and EEG features into a shared joint semantic space and explicitly decouples semantic-related information from domain-specific information. The decoupling combines three components: mutual information minimization, cross-modal mutual information maximization with InfoNCE, and cyclic reconstruction for inter-modality consistency. A neuroscience-inspired intra-class geometric consistency constraint enforces stable, class-consistent distances between visual samples and EEG prototypes, which are stored in a memory bank and updated by an exponential moving average (EMA). Experiments on the ThingsEEG dataset show state-of-the-art zero-shot decoding performance, and higher mutual information between visual and EEG features correlates with better generalization, supporting the value of semantic alignment and robust cross-modal representations for neural decoding.
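The memory bank of EMA-updated EEG prototypes and the intra-class geometric consistency constraint might be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `EEGPrototypeBank`, the function `geometric_consistency_loss`, the momentum value, the Euclidean distance, and the variance-based penalty are all assumptions, since the summary does not specify the exact form of the constraint.

```python
import numpy as np


class EEGPrototypeBank:
    """Hypothetical memory bank holding one running EEG prototype per class."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.momentum = momentum  # assumed EMA momentum, not from the paper
        self.prototypes = np.zeros((num_classes, dim))
        self.initialized = np.zeros(num_classes, dtype=bool)

    def update(self, eeg_features, labels):
        """EMA-update the prototype of every class present in the batch."""
        for c in np.unique(labels):
            batch_mean = eeg_features[labels == c].mean(axis=0)
            if self.initialized[c]:
                self.prototypes[c] = (
                    self.momentum * self.prototypes[c]
                    + (1.0 - self.momentum) * batch_mean
                )
            else:
                # First time a class is seen: start from the batch mean.
                self.prototypes[c] = batch_mean
                self.initialized[c] = True


def geometric_consistency_loss(image_features, labels, bank):
    """Assumed penalty: variance of image-to-prototype distances per class.

    The loss is zero when all images of a class lie at the same distance
    from that class's EEG prototype.
    """
    dists = np.linalg.norm(image_features - bank.prototypes[labels], axis=1)
    per_class = [dists[labels == c].var() for c in np.unique(labels)]
    return float(np.mean(per_class))
```

In this sketch, `labels` is an integer NumPy array of class indices, `update` would be called once per training batch with the EEG features, and the loss would be added to the alignment objective with some weighting that the summary does not specify.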
Abstract
Visual neural decoding refers to the process of extracting and interpreting original visual experiences from human brain activity. Recent advances in metric learning-based EEG visual decoding methods have delivered promising results and demonstrated the feasibility of decoding novel visual categories from brain activity. However, methods that directly map EEG features to the CLIP embedding space may introduce mapping bias and cause semantic inconsistency among features, thereby degrading alignment and impairing decoding performance. To further explore the semantic consistency between visual and neural signals, we construct a joint semantic space and propose a Visual-EEG Semantic Decouple Framework that explicitly extracts the semantic-related features of the two modalities to facilitate optimal alignment. Specifically, a cross-modal information decoupling module is introduced to guide the extraction of semantic-related information from each modality. Then, by quantifying the mutual information between image and EEG features, we observe a strong positive correlation between decoding performance and the magnitude of mutual information. Furthermore, inspired by the mechanisms of visual object understanding from neuroscience, we propose an intra-class geometric consistency approach for the alignment process. This strategy maps visual samples within the same class to consistent neural patterns, which further enhances the robustness and performance of EEG visual decoding. Experiments on a large Image-EEG dataset show that our method achieves state-of-the-art results in zero-shot neural decoding tasks.
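One common way to both align paired embeddings and obtain a rough mutual information estimate is the InfoNCE objective, whose loss L over a batch of N pairs gives the standard lower bound I >= log N - L. The sketch below illustrates that general technique only. The function names `info_nce` and `mi_lower_bound`, the temperature of 0.07, the symmetric averaging, and the cosine similarity are assumptions, and the abstract does not state that the authors quantify mutual information in exactly this way.

```python
import numpy as np


def info_nce(image_emb, eeg_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/EEG embeddings.

    Row i of image_emb and row i of eeg_emb are assumed to be a matching
    pair; all other rows in the batch act as negatives.
    """
    # L2-normalize so that the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature  # shape (N, N)

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The correct "class" for row i is column i.
        return -np.mean(np.diag(log_probs))

    # Average the EEG-to-image and image-to-EEG directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def mi_lower_bound(loss, batch_size):
    """InfoNCE lower bound on mutual information, in nats: log N - loss."""
    return np.log(batch_size) - loss
```

Because the loss is non-negative, the bound saturates at log N, so estimates from different models are only comparable when computed with the same batch size.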
