EEG2TEXT: Open Vocabulary EEG-to-Text Decoding with EEG Pre-Training and Multi-View Transformer

Hanwen Liu; Daniel Hajialigol; Benny Antony; Aiguo Han; Xuan Wang

TL;DR

EEG2TEXT tackles open-vocabulary brain-to-text decoding from non-invasive EEG with a sentence-level encoding pipeline that combines a convolutional transformer, self-supervised pre-training, and a multi-view transformer that exploits spatial brain-region information. It reports consistent gains over state-of-the-art open-vocabulary baselines on the ZuCo and Image-EEG datasets, with improvements of up to 5% in absolute BLEU and ROUGE scores. The approach reduces reliance on eye-tracking calibration and generalizes better to inner speech decoding, moving toward high-performance, open-vocabulary brain-to-text communication. The authors also explore cross-modal pre-training with image-EEG data and discuss releasing code and data to support future research.

Abstract

Deciphering the intricacies of the human brain has captivated curiosity for centuries. Recent strides in Brain-Computer Interface (BCI) technology, particularly using motor imagery, have restored motor functions such as reaching, grasping, and walking in paralyzed individuals. However, unraveling natural language from brain signals remains a formidable challenge. Electroencephalography (EEG) is a non-invasive technique used to record electrical activity in the brain by placing electrodes on the scalp. Previous studies of EEG-to-text decoding have achieved high accuracy on small closed vocabularies, but still fall short of high accuracy when dealing with large open vocabularies. We propose a novel method, EEG2TEXT, to improve the accuracy of open vocabulary EEG-to-text decoding. Specifically, EEG2TEXT leverages EEG pre-training to enhance the learning of semantics from EEG signals and proposes a multi-view transformer to model the EEG signal processing by different spatial regions of the brain. Experiments show that EEG2TEXT has superior performance, outperforming the state-of-the-art baseline methods by a large margin of up to 5% in absolute BLEU and ROUGE scores. EEG2TEXT shows great potential for a high-performance open-vocabulary brain-to-text system to facilitate communication.
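
To make the reported metric concrete, the following is a minimal sketch of sentence-level BLEU-N (clipped n-gram precision, uniform weights, brevity penalty). It assumes whitespace tokenization, a single reference sentence, and no smoothing; the function names `ngrams` and `bleu` are illustrative and are not taken from the paper's evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(reference, hypothesis, max_n=1):
    """Sentence-level BLEU with uniform weights over orders 1..max_n.

    Assumptions (not from the paper): whitespace tokenization,
    one reference sentence, and no smoothing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(hyp_counts.values())
        # Clipped matches: a hypothesis n-gram is credited at most as
        # many times as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precision_sum += math.log(matched / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum)


reference = "the cat sat on the mat"
hypothesis = "the cat sat on a mat"
bleu1 = bleu(reference, hypothesis, max_n=1)
bleu2 = bleu(reference, hypothesis, max_n=2)
```

An absolute improvement is a difference between two such scores: moving from 0.40 to 0.45, for example, is a 5% absolute gain (these numbers are illustrative and are not results from the paper).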

Paper Structure (27 sections, 4 equations, 4 figures, 8 tables)

Introduction
Task Definition
Methodology
Baseline Model
Convolutional Transformer for Sentence-Level EEG Encoding
Transformer Pre-Training for an Enhanced EEG Encoding
Multi-View Transformer for Different Spatial Regions of the Brain
Experiment
Experimental Setup
Dataset
Baselines
Evaluation Metrics
Parameter Study
Results
Main Results
...and 12 more sections

Figures (4)

Figure 1: The overall framework of open-vocabulary EEG-to-text translation. The first sub-figure comes from nagel2018modelling.
Figure 2: The overall framework of EEG2TEXT. It takes sentence-level EEG signals as input and decodes the original text as output. EEG2TEXT comprises three major components: 1) a base convolutional transformer model, 2) pre-training for EEG encoding, and 3) a multi-view transformer for different spatial regions of the brain.
Figure A1: A segment of an EEG signal and its corresponding spectrogram.
Figure A2: Zero-Shot Image-to-Text Translation.
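
The multi-view component named in the Figure 2 caption operates on different spatial regions of the brain. A minimal sketch of the data-partitioning step this implies is shown here: a channels-by-time signal is split into one view per region. The region-to-electrode mapping `REGION_CHANNELS` and the function `split_into_views` are hypothetical; the paper's actual channel grouping and the way the per-region encodings are combined are not specified in this summary.

```python
# Hypothetical mapping from brain region to electrode labels (10-20 style).
# This grouping is an assumption for illustration, not the paper's.
REGION_CHANNELS = {
    "frontal": ["Fp1", "Fp2", "F3", "F4"],
    "central": ["C3", "C4", "Cz"],
    "parietal": ["P3", "P4", "Pz"],
    "occipital": ["O1", "O2"],
}


def split_into_views(signal, channel_names, region_channels=REGION_CHANNELS):
    """Split a (channels x time) signal into one sub-signal per region.

    signal: list of per-channel sample lists, aligned with channel_names.
    Returns a dict mapping region name to that region's channel rows.
    Regions with no recorded channels are omitted.
    """
    index = {name: i for i, name in enumerate(channel_names)}
    views = {}
    for region, names in region_channels.items():
        rows = [signal[index[name]] for name in names if name in index]
        if rows:
            views[region] = rows
    return views


# Toy recording: 4 channels, 2 time samples each.
channel_names = ["Fp1", "C3", "O1", "O2"]
signal = [[1, 2], [3, 4], [5, 6], [7, 8]]
views = split_into_views(signal, channel_names)
```

In a multi-view model, each view would then typically be passed to its own encoder before the per-region representations are merged for decoding.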
