Towards Decoding Brain Activity During Passive Listening of Speech

Milán András Fodor, Tamás Gábor Csapó, Frigyes Viktor Arthur

TL;DR

The study addresses decoding neural activity during passive speech perception by predicting heard speech from intracranial EEG (iEEG) using deep learning. It leverages the Open multimodal iEEG-fMRI dataset and compares two architectures, a Fully Connected DNN (Fc-DNN) and a 2D-CNN, to map iEEG signals to mel-spectrogram representations of heard speech, evaluating performance primarily with Mean Squared Error ($MSE$). Results show subject-dependent gains, with some reductions in training loss, but synthesized speech remains largely unintelligible, underscoring challenges in passive-listening decoding and data limitations. The work provides a foundational step toward speech-decoding BCIs, highlighting the need for larger, more diverse datasets, transformer-based temporal models, and multi-modal integration to improve accuracy, realism, and practical utility of neural speech synthesis.
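As a rough illustration of the MSE criterion mentioned above, the sketch below computes the mean squared error between a predicted and a target mel-spectrogram, averaged over every time frame and mel bin. The function name `mel_mse`, the nested-list layout (frames by mel bins), and the toy values are assumptions made for illustration; they are not taken from the paper's code.

```python
def mel_mse(predicted, target):
    """Mean squared error averaged over all time frames and mel bins.

    Both arguments are lists of frames; each frame is a list of mel-bin values.
    This layout is a hypothetical choice for the sketch, not the paper's format.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target differ in number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames differ in number of mel bins")
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2  # squared error for one time-frequency cell
            count += 1
    if count == 0:
        raise ValueError("empty spectrogram")
    return total / count


# Toy example: 2 time frames x 3 mel bins (made-up values).
target = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
predicted = [[0.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
print(mel_mse(predicted, target))
```

A lower value means the predicted spectrogram is closer to the target cell by cell. This does not by itself guarantee intelligible audio, which is consistent with the gap between loss reductions and intelligibility reported above.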

Abstract

The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes in the brain occurring while listening to speech. We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods. The goal is to aid the advancement of brain-computer interface (BCI) technology for speech synthesis and, hopefully, to provide an additional perspective on the cognitive processes of speech perception. This approach diverges from the conventional focus on speech production and instead investigates neural representations of perceived speech. This angle opens up a complex perspective, potentially allowing us to study more sophisticated neural patterns. Using deep learning models, the research aims to establish a connection between these intricate neural activities and the corresponding speech sounds. Although the approach has not yet achieved a breakthrough, the research sheds light on the potential of decoding neural activity during speech perception. Our current efforts can serve as a foundation, and we are optimistic about expanding and improving upon this work to move closer to more advanced BCIs and a better understanding of the processes underlying perceived speech and its relation to spoken speech.

Paper Structure (34 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Broca's area, the motor cortex, the cerebellum, Wernicke's area, the superior temporal gyrus, and the posterior superior temporal sulcus highlighted as important areas of the brain regarding speech. Figure based on guenther2006cortical, hickok2007cortical, von2010human, Hein2008.
  • Figure 2: The four subjects with the highest correlation with the speech envelope. From Berezutskaya2022.
  • Figure 3: The electrode positions for Subject 38. Extracted from the Open multimodal iEEG-fMRI dataset (Berezutskaya2022).
  • Figure 4: Lag plots of the cross-correlation of the electrode’s high-frequency band signal and the sound envelope (Berezutskaya2022).
  • Figure 5: Visual representation of the iEEG input and mel-spectrogram output of the DNN.
  • ...and 2 more figures