Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding

Hamza Amrani; Daniela Micucci; Paolo Napoletano

TL;DR

This work tackles open vocabulary EEG-to-text decoding from non-invasive EEG with an end-to-end architecture that combines a subject-dependent Brain module, a pre-trained BART language model, and a GPT-4 sentence refinement module. The architecture incorporates a Learnable Features Module (including a Brain Transformer Encoder) that produces latent brain representations aligned with language embeddings, and it is trained in two stages, with the objective $\mathcal{L}_{MSE}$ in the first stage and $\mathcal{L}_{rec}$ in the second. Evaluation on ZuCo v1.0/v2.0 using BLEU, ROUGE, and BERTScore shows state-of-the-art results (BLEU-1 = $42.75\%$, ROUGE-1-F = $33.28\%$, BERTScore-F = $53.86\%$, improvements of $3.38\%$, $8.43\%$, and $6.31\%$ respectively over the previous state of the art) and highlights the importance of modeling subjectivity and of sentence-level semantic assessment. The work also provides ablations and embedding visualizations that dissect each component's contribution, discusses ethical considerations, and outlines future work on generalization and privacy safeguards toward more human-aligned open vocabulary brain decoding.
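The two training objectives can be illustrated with a minimal pure-Python sketch. For illustration only, we assume that $\mathcal{L}_{MSE}$ is the mean squared error between the Brain module's per-word output vectors and the corresponding language-model (BART) word embeddings, and that $\mathcal{L}_{rec}$ is the average token-level cross-entropy of the target sentence under the decoder's logits. The function names and these exact forms are assumptions; the paper's Method section gives the actual definitions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mse_loss(predicted: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Assumed first-stage objective: mean squared error between the Brain
    module's per-word outputs and the target word embeddings."""
    if len(predicted) != len(target):
        raise ValueError("need one predicted vector per target word")
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        if len(p_vec) != len(t_vec):
            raise ValueError("embedding dimensions differ")
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def log_softmax(logits: Vector) -> List[float]:
    # Shift by the maximum logit so exp() cannot overflow.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reconstruction_loss(logits: Sequence[Vector], target_ids: Sequence[int]) -> float:
    """Assumed second-stage objective: mean negative log-likelihood
    (cross-entropy) of each target token under the logits at its position."""
    if len(logits) != len(target_ids):
        raise ValueError("need one logit vector per target token")
    nll = [-log_softmax(step)[tok] for step, tok in zip(logits, target_ids)]
    return sum(nll) / len(nll)


if __name__ == "__main__":
    # Toy 2-word, 2-dimensional example: squared errors 0, 1, 0, 4 -> 1.25.
    print(mse_loss([[0.0, 1.0], [2.0, 2.0]], [[0.0, 0.0], [2.0, 4.0]]))
    # Uniform logits over a 2-token vocabulary give a loss of ln 2.
    print(reconstruction_loss([[0.0, 0.0]], [1]))
```

Under these assumptions, the first stage gives the Brain module a regression target in the language model's embedding space, and the second stage optimizes the whole pipeline for the output text itself.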

Abstract

Previous research has demonstrated the potential of using pre-trained language models to decode open vocabulary Electroencephalography (EEG) signals captured through a non-invasive Brain-Computer Interface (BCI). However, the impact of embedding EEG signals in the context of language models and the effect of subjectivity remain unexplored, leaving it unclear how best to improve decoding performance. Additionally, the evaluation metrics currently used to assess decoding effectiveness are predominantly syntactic and give no insight into whether the decoded output is comprehensible to humans. We present an end-to-end deep learning framework for non-invasive brain recordings that brings modern representation learning approaches to neuroscience. Our proposal introduces the following innovations: 1) an end-to-end deep learning architecture for open vocabulary EEG decoding, incorporating a subject-dependent representation learning module for raw EEG encoding, a BART language model, and a GPT-4 sentence refinement module; 2) a more comprehensive sentence-level evaluation metric based on BERTScore; 3) an ablation study that analyses the contribution of each module within our proposal, providing insights for future research. We evaluate our approach on two publicly available datasets, ZuCo v1.0 and v2.0, comprising EEG recordings of 30 subjects engaged in natural reading tasks. Our model achieves a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art methods by 3.38%, 8.43%, and 6.31%, respectively.
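To make the three reported metrics concrete, here is a minimal pure-Python sketch of sentence-level BLEU-1 (clipped unigram precision with a brevity penalty), ROUGE-1-F (harmonic mean of unigram precision and recall), and the greedy-matching F1 at the core of BERTScore. It is an illustration under simplifying assumptions, not the paper's evaluation code: the tokenizer is a hypothetical lowercase whitespace split, BLEU is computed per sentence against a single reference without smoothing, and BERTScore takes precomputed token embeddings as input, whereas the real metric obtains contextual embeddings from a pre-trained model and can apply idf weighting and baseline rescaling. Corpus-level scores like those above are aggregated differently.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def tokenize(sentence: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, drop commas, split on whitespace."""
    return sentence.lower().replace(",", "").split()


def bleu1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    ref_counts = Counter(reference)
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(candidate).items())
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * (clipped / c)


def rouge1_f(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-1 F-measure on unigram overlap o."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    # Harmonic mean of P = o/c and R = o/r simplifies to 2o / (c + r).
    return 2 * overlap / (len(candidate) + len(reference))


def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bertscore_f(cand_emb: Sequence[Vector], ref_emb: Sequence[Vector]) -> float:
    """Greedy-matching F1 of BERTScore on precomputed token embeddings:
    each token is matched to its most similar token in the other sentence."""
    sim = [[_cosine(c, r) for r in ref_emb] for c in cand_emb]
    precision = sum(max(row) for row in sim) / len(cand_emb)
    recall = sum(max(col) for col in zip(*sim)) / len(ref_emb)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = tokenize("He is a prominent member of the Bush family, "
                     "the younger brother of President George W. Bush")
    pred = tokenize("He was a member of the American Bush family, "
                    "brother of President George W. Bush")
    print(bleu1(pred, truth))     # (13/15) * exp(1 - 17/15)
    print(rouge1_f(pred, truth))  # 2 * 13 / (15 + 17) = 0.8125
```

On the Figure 1 sentence pair, the prediction shares 13 unigrams with the ground truth (15 predicted tokens, 17 reference tokens); these toy values are not the paper's reported scores. Exact-match counts give no credit for "was" versus "is", whereas the contextual embeddings used by BERTScore can score such near-matches as similar, which is why a sentence-level semantic metric complements BLEU and ROUGE.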

Paper Structure (26 sections, 3 equations, 5 figures, 5 tables)

Introduction
Related Work
Method
Open Vocabulary EEG-to-Text Decoding
Proposed Architecture
Training Stage 1
Training Stage 2
Learnable Features Module
Sentence Refinement during Inference
Experiments
Data
Training Details
Architecture Details
Optimization Settings
Evaluation
...and 11 more sections

Figures (5)

Figure 1: The workflow of the proposed method. First, the raw EEG signals corresponding to each word are input into the Brain module. This module extracts subject-dependent features, which are then used by a Language Module based on BART, suitably trained for sentence generation. The resulting sentence is further refined using the GPT-4 API to produce the final output. In the example, the ground truth is: He is a prominent member of the Bush family, the younger brother of President George W. Bush; the final sentence predicted by our model is: He was a member of the American Bush family, brother of President George W. Bush. Bold font marks exact matches between the ground truth and the estimated sentence.
Figure 2: Overview of the proposed end-to-end architecture for open vocabulary EEG-to-Text decoding. Firstly, a sequence of word-level raw EEG signals is fed to the Brain module to extract deep-embedded representations for raw EEG encoding. Then, we use a Language Modeling (LM) module to generate EEG-to-Text sentences by leveraging the pre-trained language model BART.
Figure 3: The Learnable features module consists of (1) a learnable EEG feature block, (2) a subject layer to leverage inter-subject variability, (3) a multi-layer transformer (Brain Transformer Encoder), and (4) an MLP.
Figure 4: t-SNE visualization of the EEG embedded representations of sentences in the training set: (a) the original EEG representations and (b) the representations generated by the Brain module of our architecture. Distinct colors denote different subjects, and each dot represents a sentence. The red triangle represents the EEG embedded representations corresponding to the same sentence "With his interest in race cars, he formed a second company, the Henry Ford Company".
Figure 5: End-to-end architecture for open vocabulary EEG-to-Text decoding.
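Figure 3 lists a subject layer inside the Learnable Features Module. One common way to realize such a layer, assumed here purely for illustration, is a separate learnable linear map per subject applied to each word-level EEG feature vector, so that recordings from different readers are projected into a shared space before the Brain Transformer Encoder. The class and helper names below are hypothetical, the weights would be learned rather than hand-set, and the paper's actual layer may differ.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def identity(d: int) -> Matrix:
    """d x d identity matrix (a hypothetical neutral initialization)."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


class SubjectLayer:
    """Hypothetical subject layer: one linear map per subject, applied to each
    word-level EEG feature vector of that subject's recording."""

    def __init__(self, weights: Dict[int, Matrix]):
        # weights[s] has shape (d_out, d_in) for subject s; learned in practice.
        self.weights = weights

    def __call__(self, features: Sequence[Sequence[float]], subject: int) -> Matrix:
        w = self.weights[subject]  # raises KeyError for an unseen subject
        # Matrix-vector product W_s @ x for every word-level vector x.
        return [[sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]
                for x in features]


if __name__ == "__main__":
    layer = SubjectLayer({0: identity(2), 1: [[2.0, 0.0], [0.0, -1.0]]})
    words = [[1.0, 3.0], [0.5, 0.5]]  # two words, 2 features each (toy values)
    print(layer(words, 0))  # [[1.0, 3.0], [0.5, 0.5]]
    print(layer(words, 1))  # [[2.0, -3.0], [1.0, -0.5]]
```

The same word-level features are mapped differently for each subject, which is one way to absorb inter-subject variability. A consequence of this per-subject parameterization is that a subject unseen during training has no matrix (the lookup raises KeyError), so the layer as sketched cannot be applied to new readers without additional data.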
