Visual Neural Decoding via Improved Visual-EEG Semantic Consistency

Hongzhou Chen, Lianghua He, Yihang Liu, Longzhen Yang

TL;DR

This paper tackles the problem of decoding visual experiences from EEG by addressing semantic misalignment that arises when EEG features are mapped directly into fixed multimodal embedding spaces. It introduces VE-SDN, a Visual-EEG Semantic Decouple Framework that projects both image and EEG features into a shared joint semantic space, and explicitly decouples semantic-related from domain-specific information using mutual information minimization, cross-modal mutual information maximization with InfoNCE, and cyclic reconstruction for inter-modality consistency. A neuroscience-inspired intra-class geometric consistency constraint enforces stable, class-consistent distances between visual samples and EEG prototypes, implemented via a memory-bank of EEG prototypes updated by EMA. Experiments on the ThingsEEG dataset show state-of-the-art zero-shot decoding performance, with higher mutual information between visual and EEG features correlating with better generalization, demonstrating the effectiveness of semantic alignment and robust cross-modal representations for neural decoding.
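The memory bank of EMA-updated EEG prototypes mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `PrototypeBank`, the `momentum` value, and in particular the form of `geometric_consistency` (variance of image-to-prototype distances within each class) are assumptions chosen only to make the idea concrete.

```python
import numpy as np


class PrototypeBank:
    """Hypothetical memory bank holding one EEG prototype per class."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.protos = np.zeros((num_classes, dim))
        self.seen = np.zeros(num_classes, dtype=bool)
        self.momentum = momentum  # assumed value, not taken from the paper

    def update(self, eeg_feats, labels):
        """EMA-update the prototype of every class present in the batch."""
        for c in np.unique(labels):
            batch_mean = eeg_feats[labels == c].mean(axis=0)
            if self.seen[c]:
                m = self.momentum
                self.protos[c] = m * self.protos[c] + (1.0 - m) * batch_mean
            else:
                # First time a class appears: initialise with the batch mean.
                self.protos[c] = batch_mean
                self.seen[c] = True


def geometric_consistency(img_feats, labels, bank):
    """Assumed penalty: within-class variance of image-to-prototype distances.

    It is zero when all images of a class lie at the same distance from
    that class's EEG prototype.
    """
    dists = np.linalg.norm(img_feats - bank.protos[labels], axis=1)
    per_class = [dists[labels == c].var() for c in np.unique(labels)]
    return float(np.mean(per_class))
```

In a training loop, `update` would be called on each batch of EEG features before the consistency term is evaluated on the matching image features; how the paper weights this term against the other losses is not stated in this section.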

Abstract

Visual neural decoding refers to the process of extracting and interpreting original visual experiences from human brain activity. Recent advances in metric learning-based EEG visual decoding methods have delivered promising results and demonstrated the feasibility of decoding novel visual categories from brain activity. However, methods that directly map EEG features to the CLIP embedding space may introduce mapping bias and cause semantic inconsistency among features, thereby degrading alignment and impairing decoding performance. To further explore the semantic consistency between visual and neural signals, in this work we construct a joint semantic space and propose a Visual-EEG Semantic Decouple Framework that explicitly extracts the semantic-related features of these two modalities to facilitate optimal alignment. Specifically, a cross-modal information decoupling module is introduced to guide the extraction of semantic-related information from both modalities. Then, by quantifying the mutual information between image and EEG features, we observe a strong positive correlation between decoding performance and the magnitude of mutual information. Furthermore, inspired by the mechanisms of visual object understanding from neuroscience, we propose an intra-class geometric consistency approach during the alignment process. This strategy maps visual samples within the same class to consistent neural patterns, which further enhances the robustness and performance of EEG visual decoding. Experiments on a large Image-EEG dataset show that our method achieves state-of-the-art results in zero-shot neural decoding tasks.
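As a rough illustration of how contrastive alignment relates to mutual information, the sketch below computes a symmetric InfoNCE loss over a batch of paired image and EEG embeddings. It is a generic NumPy version, not the paper's code: the function name `info_nce` and the `temperature` default are assumptions. For a batch of size B, `log(B) - loss` is the standard InfoNCE lower bound on the mutual information between the two views, so a lower loss corresponds to a higher estimated bound.

```python
import numpy as np


def info_nce(img_feats, eeg_feats, temperature=0.07):
    """Symmetric InfoNCE loss for row-aligned (image, EEG) embedding pairs.

    Row i of img_feats is the positive for row i of eeg_feats; all other
    rows in the batch act as negatives. The temperature is an assumed value.
    """
    # L2-normalise so that dot products are cosine similarities.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    eeg = eeg_feats / np.linalg.norm(eeg_feats, axis=1, keepdims=True)
    logits = img @ eeg.T / temperature  # shape (B, B)

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-EEG and EEG-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

Calling `info_nce` on well-aligned pairs should give a markedly lower value than calling it on the same features with the EEG rows shuffled, which is the behaviour a zero-shot retrieval setup relies on.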

Paper Structure (19 sections, 16 equations, 10 figures, 4 tables, 1 algorithm)

Figures (10)

  • Figure 1: 2D UMAP (McInnes et al., 2018) visualization of training CLIP image, text, and EEG features aligned on a hypersphere with multimodal contrastive training. (a) and (b) show the direct mapping of EEG samples into the fixed pre-trained CLIP space alongside the corresponding image and text features, highlighting the apparent modality gap between features. (c) CLIP image features and EEG embeddings reprojected into the joint semantic space, with the aligned features visualized afterward.
  • Figure 2: Overview of the proposed Visual-EEG Semantic Decouple Framework (VE-SDN). (a) The main pipeline for semantic decoupling of stimulus images and EEG signals. (b) The inference and decoding process of VE-SDN. (c) The intra-class geometric consistency constraint applied while aligning semantic-related features.
  • Figure 3: 2D UMAP and t-SNE (van der Maaten and Hinton, 2008) visualization of test image and EEG features for subject 8 across four categories: animal, food, vehicle, and tool. (a) Embeddings of the two modalities under CLIP-Con. (b) Embeddings of the two modalities under Joint-Con.
  • Figure 4: Illustration of the MI values (smoothed) and MSE scores for VE-SDN during training batch iterations. (a) The mutual information estimated by CLUB between the semantic-related and domain-related parts of image and EEG features. (b) The MSE loss score of image and EEG part cyclic reconstruction.
  • Figure 5: (a) Illustration of the MI values (smoothed) between semantic-related parts of images and EEG. (b) Accuracy under different coefficient settings of $\lambda_{2}$.
  • ...and 5 more figures