Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding
Hamza Amrani, Daniela Micucci, Paolo Napoletano
TL;DR
This work tackles open vocabulary EEG-to-text decoding from non-invasive EEG by introducing an end-to-end architecture that combines a subject-dependent Brain module, a pre-trained BART language model, and a GPT-4 sentence refinement module. A Learnable Features Module (including a Brain Transformer Encoder) produces latent brain representations that align with language embeddings, and the model is trained in two stages with the objectives $\mathcal{L}_{MSE}$ and $\mathcal{L}_{rec}$. Evaluation on ZuCo v1.0/v2.0 using BLEU, ROUGE, and BERTScore shows gains over the previous state of the art (BLEU-1 = $42.75\%$, ROUGE-1-F = $33.28\%$, BERTScore-F = $53.86\%$, improvements of $3.38\%$, $8.43\%$, and $6.31\%$ respectively) and highlights the importance of subjectivity modeling and sentence-level semantic assessment. The work also provides ablations and embedding visualizations to dissect component contributions and discusses ethical considerations, suggesting a path toward more human-aligned open vocabulary brain decoding and future work on generalization and privacy safeguards.
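The summary names $\mathcal{L}_{MSE}$ and $\mathcal{L}_{rec}$ without giving their form. One plausible reading, offered only as a sketch, is an alignment loss between brain representations and language embeddings followed by a token-level reconstruction loss. All symbols below are assumptions introduced for illustration and are not taken from the text: $h_t$ is the Brain module output at position $t$, $e_t$ is the corresponding language-model embedding, $w_t$ is the target token, and $T$ is the sequence length.

$$
\mathcal{L}_{MSE} = \frac{1}{T}\sum_{t=1}^{T} \lVert h_t - e_t \rVert_2^2,
\qquad
\mathcal{L}_{rec} = -\sum_{t=1}^{T} \log p_\theta\left(w_t \mid w_{<t}, h_{1:T}\right)
$$

Under this reading, the first stage would fit the EEG encoder to the embedding space and the second would train the decoder to generate the sentence; the precise definitions and the assignment of losses to stages may differ from this sketch.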
Abstract
Previous research has demonstrated the potential of using pre-trained language models for decoding open vocabulary Electroencephalography (EEG) signals captured through a non-invasive Brain-Computer Interface (BCI). However, the impact of embedding EEG signals in the context of language models and the effect of subjectivity remain unexplored, leading to uncertainty about the best approach to enhance decoding performance. Additionally, current evaluation metrics used to assess decoding effectiveness are predominantly syntactic and do not provide insight into how comprehensible the decoded output is to a human reader. We present an end-to-end deep learning framework for non-invasive brain recordings that brings modern representation learning approaches to neuroscience. Our proposal introduces the following innovations: 1) an end-to-end deep learning architecture for open vocabulary EEG decoding, incorporating a subject-dependent representation learning module for raw EEG encoding, a BART language model, and a GPT-4 sentence refinement module; 2) a more comprehensive sentence-level evaluation metric based on BERTScore; 3) an ablation study that analyses the contributions of each module within our proposal, providing valuable insights for future research. We evaluate our approach on two publicly available datasets, ZuCo v1.0 and v2.0, comprising EEG recordings of 30 subjects engaged in natural reading tasks. Our model achieves a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art methods by 3.38%, 8.43%, and 6.31%, respectively.
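To illustrate why an overlap metric such as BLEU-1 is described as syntactic, the following minimal sketch computes sentence-level BLEU-1 as clipped unigram precision multiplied by the standard brevity penalty. It assumes lowercased whitespace tokenization and a single reference sentence; the function name `bleu1` is hypothetical, and the evaluation pipeline used in this work may tokenize and aggregate differently.

```python
from collections import Counter
import math


def bleu1(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    hyp_counts = Counter(hyp_tokens)

    # Each hypothesis token is credited at most as often as it occurs
    # in the reference (clipping).
    clipped = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    precision = clipped / len(hyp_tokens)

    # Brevity penalty: hypotheses shorter than the reference are penalized.
    if len(hyp_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))

    return brevity_penalty * precision
```

Because the score counts only matching word forms, a paraphrase that preserves the meaning but shares no words with the reference scores zero, which is the limitation that motivates an embedding-based sentence-level metric such as BERTScore.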
