Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation

Terrance Yu-Hao Chen, Yulin Chen, Pontus Soederhaell, Sadrishya Agrawal, Kateryna Shapovalenko

TL;DR

This work tackles EEG-based speech decoding, which is hampered by a low signal-to-noise ratio (SNR), inter-subject variability, and limited labeled data. It combines a variational autoencoder (VAE) data augmentation strategy (AugVAE-EEG) with a cross-modal Transformer adapted from EMG-to-speech to perform EEG-to-text decoding at the word and sentence levels, evaluated on the Brennan dataset. Results show that VAEs can generate plausible synthetic EEG, but augmentation did not improve performance. The Seq2Seq model captures sentence dynamics yet generalizes poorly across subjects, and the Word Classifier is constrained by word-frequency priors. The study provides baselines and highlights gaps (data diversity, cross-subject normalization, and multimodal integration) that can guide future large-scale pretraining and multimodal BCI development.

Abstract

Decoding speech from non-invasive brain signals, such as electroencephalography (EEG), has the potential to advance brain-computer interfaces (BCIs), with applications in silent communication and assistive technologies for individuals with speech impairments. However, EEG-based speech decoding faces major challenges, such as noisy data, limited datasets, and poor performance on complex tasks like speech perception. This study attempts to address these challenges by employing variational autoencoders (VAEs) for EEG data augmentation to improve data quality and applying a state-of-the-art (SOTA) sequence-to-sequence deep learning architecture, originally successful in electromyography (EMG) tasks, to EEG-based speech decoding. Additionally, we adapt this architecture for word classification tasks. Using the Brennan dataset, which contains EEG recordings of subjects listening to narrated speech, we preprocess the data and evaluate both classification and sequence-to-sequence models for EEG-to-words/sentences tasks. Our experiments show that VAEs have the potential to reconstruct artificial EEG data for augmentation. Meanwhile, our sequence-to-sequence model achieves more promising performance in generating sentences compared to our classification model, though both remain challenging tasks. These findings lay the groundwork for future research on EEG speech perception decoding, with possible extensions to speech production tasks such as silent or imagined speech.
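
As a rough illustration of the VAE-based augmentation idea mentioned in the abstract, the sketch below encodes one feature window into a diagonal Gaussian, draws latent samples with the reparameterization trick, and decodes them into synthetic windows. It is a toy under stated assumptions, not the paper's AugVAE-EEG model: the layers are single linear maps with random, untrained weights, and every name (`augment`, `random_layer`, `n_features`, `n_latent`) is hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from a diagonal Gaussian N(mu, sigma^2) to N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def linear(layer, x):
    """Dense layer: layer["W"] holds one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(layer["W"], layer["b"])]


def random_layer(n_out, n_in, rng):
    """Hypothetical untrained layer; a real model would learn these weights."""
    return {"W": [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)],
            "b": [0.0] * n_out}


def augment(window, enc_mu, enc_log_var, dec, n_samples, rng):
    """Encode one window, sample latents, and decode synthetic windows."""
    mu = linear(enc_mu, window)
    log_var = linear(enc_log_var, window)
    synthetic = [linear(dec, reparameterize(mu, log_var, rng))
                 for _ in range(n_samples)]
    return synthetic, kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
n_features, n_latent = 8, 2  # assumed toy sizes, not taken from the paper
enc_mu = random_layer(n_latent, n_features, rng)
enc_log_var = random_layer(n_latent, n_features, rng)
dec = random_layer(n_features, n_latent, rng)

window = [rng.gauss(0.0, 1.0) for _ in range(n_features)]  # stand-in for EEG features
synthetic, kl = augment(window, enc_mu, enc_log_var, dec, 3, rng)
```

In a trained VAE the KL term would be added to a reconstruction loss during optimization; here it is only computed to show where it enters. Whether such synthetic windows help a downstream decoder is an empirical question, and the summary above reports that augmentation did not improve performance in this study.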

Paper Structure

This paper contains 32 sections, 9 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Baseline model.
  • Figure 2: Approach overview.
  • Figure 3: EEG-to-Text models.
  • Figure 4: AugVAE-EEG model.
  • Figure 5: Comparison of real and generated EEG signals.
  • ...and 4 more figures