On Creating A Brain-To-Text Decoder
Zenon Lamprou, Yashar Moshfeghi
TL;DR
This work investigates decoding spoken language from neural activity, using raw EEG and invasive IMA data to produce text output. It builds a transformer-based brain-to-text pipeline and explores LLM substitutions, CTC loss, and several encoder options (wav2vec2, Data2Vec, Bendr, EEG-Conformer) on the ZuCo EEG and Willett 2023 IMA datasets. Across experiments, substituting the LLM or adding a CTC loss yields only limited improvements, and training is notably unstable, with encoders that struggle to learn. These results point to a robust brain encoder as the central bottleneck. The study highlights the need for stronger encoder-decoder architectures and higher electrode density to achieve reliable, real-time EEG-to-text translation, and it directs future work toward more capable neuro-symbolic decoding systems.
Abstract
Brain decoding is a rapidly advancing and widely used technique in neuroscience. This paper focuses on using raw electroencephalogram (EEG) signals to decode human brain activity, offering a faster and more efficient way to improve our understanding of the human brain. The investigation examines how well brain-computer interfaces (BCIs) decode neural signals associated with speech production, with particular emphasis on how vocabulary size, electrode density, and training data affect the framework's performance. The study reports the competitive word error rates (WERs) achievable on the LibriSpeech benchmark through pre-training on unlabelled speech data. It also evaluates speech recognition in settings with limited labelled data, surpassing previous state-of-the-art techniques while using significantly fewer labels. In addition, the research analyses error patterns in speech recognition and the influence of model size and unlabelled training data. It underscores the importance of vocabulary size and electrode density for BCI performance, and it advocates increasing the number of microelectrodes and refining the language models.
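
The results summarised above are reported as word error rates. As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help. The function name `word_error_rate` and the whitespace tokenisation are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes simple whitespace tokenisation; real evaluations usually
    normalise case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.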
