Towards Decoding Brain Activity During Passive Listening of Speech
Milán András Fodor, Tamás Gábor Csapó, Frigyes Viktor Arthur
TL;DR
The study addresses decoding neural activity during passive speech perception by predicting heard speech from intracranial EEG (iEEG) using deep learning. It uses the Open multimodal iEEG-fMRI dataset and compares two architectures, a fully connected DNN (Fc-DNN) and a 2D-CNN, that map iEEG signals to mel-spectrogram representations of the heard speech. Performance is evaluated primarily with mean squared error (MSE). Results show subject-dependent gains, with some reductions in training loss, but the synthesized speech remains largely unintelligible, which underscores both the difficulty of decoding passive listening and the limits of the available data. The work is a foundational step toward speech-decoding BCIs and highlights the need for larger, more diverse datasets, transformer-based temporal models, and multimodal integration to improve the accuracy, realism, and practical utility of neural speech synthesis.
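To make the evaluation setup concrete, the following is a minimal NumPy sketch of a fully connected mapping from iEEG feature frames to mel-spectrogram frames, scored with MSE. It is not the authors' implementation: the layer sizes, the number of mel bins, the random placeholder data, and the helper names `fc_forward` and `mse` are all assumptions made for illustration.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error averaged over all frames and mel bins."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean((pred - target) ** 2))


def fc_forward(x, weights, biases):
    """Forward pass of a fully connected network.

    ReLU is applied after every layer except the last, so the output
    is an unconstrained regression target (one mel frame per input row).
    """
    h = x
    last = len(weights) - 1
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < last:
            h = np.maximum(h, 0.0)
    return h


# Hypothetical sizes: none of these values come from the paper.
n_frames, n_features, n_hidden, n_mels = 200, 64, 32, 80

rng = np.random.default_rng(0)
ieeg = rng.standard_normal((n_frames, n_features))    # placeholder iEEG features
mel_target = rng.standard_normal((n_frames, n_mels))  # placeholder mel-spectrogram

# Untrained random weights; a real model would learn these by minimizing MSE.
weights = [
    0.1 * rng.standard_normal((n_features, n_hidden)),
    0.1 * rng.standard_normal((n_hidden, n_mels)),
]
biases = [np.zeros(n_hidden), np.zeros(n_mels)]

mel_pred = fc_forward(ieeg, weights, biases)
loss = mse(mel_pred, mel_target)
print(f"predicted shape: {mel_pred.shape}, MSE: {loss:.4f}")
```

In this sketch each row of `ieeg` is one time frame of features, and the network predicts the matching mel-spectrogram frame. A low MSE on such frames does not by itself imply intelligible synthesized speech, which is consistent with the results summarized above.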
Abstract
The aim of this study is to investigate the complex mechanisms of speech perception and, ultimately, to decode the electrical changes occurring in the brain while a person listens to speech. We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods. The goal is to advance brain-computer interface (BCI) technology for speech synthesis and, hopefully, to provide an additional perspective on the cognitive processes of speech perception. This approach diverges from the conventional focus on speech production and instead investigates neural representations of perceived speech, an angle that may allow us to study more sophisticated neural patterns. Using deep learning models, we aim to establish a connection between these intricate neural activities and the corresponding speech sounds. Although the approach has not yet achieved a breakthrough, the research sheds light on the potential of decoding neural activity during speech perception. Our current efforts can serve as a foundation, and we are optimistic that expanding and improving upon this work will bring us closer to more advanced BCIs and to a better understanding of the processes underlying perceived speech and its relation to spoken speech.
