On Creating A Brain-To-Text Decoder

Zenon Lamprou, Yashar Moshfeghi

TL;DR

This work investigates decoding spoken language from neural activity, using raw EEG and invasive IMA data to produce textual output. It builds a transformer-based brain-to-text pipeline and explores LLM substitutions, CTC loss, and multiple encoder options (wav2vec 2.0, Data2Vec, BENDR, EEG-Conformer) on the ZuCo EEG and Willett 2023 IMA datasets. Across experiments, substituting LLMs or adding CTC loss yields only limited improvements, with notable training instability and encoder-learning difficulties, which points to the lack of a robust brain encoder as the central bottleneck. The study highlights the need for stronger encoder–decoder architectures and higher electrode density to achieve reliable, real-time EEG-to-text translation, guiding future work toward more capable neuro-symbolic decoding systems.

Abstract

Brain decoding has emerged as a rapidly advancing and extensively utilized technique within neuroscience. This paper centers on the application of raw electroencephalogram (EEG) signals for decoding human brain activity, offering a faster and more efficient methodology for enhancing our understanding of the human brain. The investigation specifically scrutinizes the efficacy of brain-computer interfaces (BCIs) in deciphering neural signals associated with speech production, with particular emphasis on the impact of vocabulary size, electrode density, and training data on the framework's performance. The study reports the competitive word error rates (WERs) achievable on the LibriSpeech benchmark through pre-training on unlabeled data for speech processing. Furthermore, the study evaluates the efficacy of voice recognition under configurations with limited labeled data, surpassing previous state-of-the-art techniques while utilizing significantly fewer labels. Additionally, the research provides a comprehensive analysis of error patterns in voice recognition and the influence of model size and unlabeled training data. It underscores the significance of factors such as vocabulary size and electrode density in enhancing BCI performance, advocating for an increase in microelectrodes and refinement of language models.

Paper Structure

This paper contains 14 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: This figure shows the enhancement made by introducing several state-of-the-art LLMs into the pipeline.
  • Figure 2: This figure illustrates how CTC loss was integrated into the pipeline, with the hope of learning the positional alignment between characters and brain data. For each time step, the log probabilities of every character in the vocabulary were calculated, and the CTC loss was then computed between the predicted and actual sentences.
  • Figure 3: This figure shows the architecture of a Conformer neural network as proposed by Gulati et al. (Gulati2020), and how a convolution layer can be integrated with a multi-headed attention layer.
  • Figure 4: This figure shows the architecture and training regime of the wav2vec 2.0 model, as proposed in Baevski2020.
  • Figure 5: This figure illustrates the architecture for effectively training a Data2Vec model, as proposed in Baevski2022.
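
The computation described in the Figure 2 caption (per-time-step log probabilities over the character vocabulary, followed by a CTC loss against the target sentence) can be sketched as follows. This is a minimal pure-Python illustration of the standard CTC forward algorithm, not the authors' implementation. The function name `ctc_neg_log_likelihood`, the three-symbol toy vocabulary, and the use of index 0 as the blank symbol are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    log_probs: list of T rows, one per time step, each holding the
               log-probability of every vocabulary symbol (blank included).
    target:    list of symbol indices without blanks.
    blank:     index of the blank symbol (assumed to be 0 here).
    """
    # Extended target with blanks interleaved: [b, y1, b, y2, ..., b].
    ext = [blank]
    for y in target:
        ext += [y, blank]
    num_states, num_steps = len(ext), len(log_probs)

    # alpha[s] = log-probability of all partial alignments ending in state s.
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, num_steps):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            terms = [alpha[s]]                 # stay in the same state
            if s >= 1:
                terms.append(alpha[s - 1])     # advance by one state
            # Skip over a blank only between two different non-blank symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last symbol or in the trailing blank.
    if num_states > 1:
        total = logsumexp(alpha[-1], alpha[-2])
    else:
        total = alpha[-1]
    return -total


# Toy usage: vocabulary {0: blank, 1: "a", 2: "b"}, two time steps,
# uniform predictions, target sentence "a".
uniform = [[math.log(1 / 3)] * 3 for _ in range(2)]
loss = ctc_neg_log_likelihood(uniform, [1])
print(f"CTC loss: {loss:.4f}")
```

In a training pipeline the rows of `log_probs` would come from a log-softmax over the encoder outputs for each time step of brain data, and a framework implementation with automatic differentiation would normally replace this hand-written loop. The sketch only makes explicit how the loss sums over every alignment between time steps and characters.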