
NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity

Owais Mujtaba Khanday, José L. Pérez-Córdoba, Mohd Yaqub Mir, Ashfaq Ahmad Najar, Jose A. Gonzalez-Lopez

TL;DR

This work tackles neural speech decoding from invasive sEEG by leveraging high-gamma features in the $70$–$170$ Hz band and a novel NeuroIncept Decoder that fuses multi-scale Inception-based feature extraction with GRU temporal modeling to map neural activity to log-Mel spectrograms. Evaluated on data from 10 participants performing word-reading tasks, the approach achieves strong correlations between predicted and original spectrograms, with PCC values reaching up to $0.93$ and STGI up to $0.55$, though performance varies with electrode coverage in language regions. Compared to baseline models (LR, FCN, CNN), NeuroIncept shows superior PCC and STGI, highlighting the benefit of combining multiscale spatial-temporal features with recurrent dynamics for neural speech decoding. The results suggest significant potential for neural-decoding-based speech restoration in brain-computer interfaces, while also identifying avenues for improvement through pretraining on EEG-word data and waveform reconstruction via vocoders for end-to-end audible speech generation.
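The PCC figures quoted above compare predicted and original log-Mel spectrograms. The summary does not say how the correlation is aggregated, so the following is only a minimal sketch under one common assumption: a Pearson correlation is computed per Mel bin along the time axis, and the per-bin values are then averaged. The function names `pearson` and `mean_spectral_pcc` are illustrative and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant sequence.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def mean_spectral_pcc(pred, target):
    """Average per-bin PCC between two spectrograms.

    Both arguments are lists of frames; each frame is a list of Mel-bin
    values. Assumption (not stated in the source): correlate each bin
    over time, skip bins where the correlation is undefined, then average.
    """
    if len(pred) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    n_bins = len(target[0])
    scores = []
    for k in range(n_bins):
        r = pearson([frame[k] for frame in pred],
                    [frame[k] for frame in target])
        if not math.isnan(r):
            scores.append(r)
    return sum(scores) / len(scores) if scores else float("nan")


# Toy example: 3 frames x 2 Mel bins; the prediction is a scaled copy.
target = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pred = [[0.0, 2.0], [2.0, 6.0], [4.0, 10.0]]
score = mean_spectral_pcc(pred, target)
```

Because PCC is invariant to scale and offset, a prediction that tracks the shape of each Mel band scores highly even if its absolute energy is off, which is one reason a complementary intelligibility-oriented metric such as STGI is reported alongside it.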

Abstract

This paper introduces a novel algorithm designed for speech synthesis from neural activity recordings obtained using invasive electroencephalography (EEG) techniques. The proposed system offers a promising communication solution for individuals with severe speech impairments. Central to our approach is the integration of time-frequency features in the high-gamma band computed from EEG recordings with an advanced NeuroIncept Decoder architecture. This neural network architecture combines Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to reconstruct audio spectrograms from neural patterns. Our model demonstrates robust mean correlation coefficients between predicted and actual spectrograms, though inter-subject variability indicates distinct neural processing mechanisms among participants. Overall, our study highlights the potential of neural decoding techniques to restore communicative abilities in individuals with speech disorders and paves the way for future advancements in brain-computer interface technologies.

Paper Structure (10 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Preprocessing pipeline for the sEEG and audio signals.
  • Figure 2: NeuroIncept Decoder model architecture.
  • Figure 3: Pearson correlation between predicted and original spectrograms.
  • Figure 4: STGI results between predicted and original spectrograms.
  • Figure 5: Examples of log-Mel spectrograms for: (a) natural speech recorded by the participants, (b) speech generated by the proposed NeuroIncept model, (c) the CNN model, and (d) the FCN model. The words are the same in all plots.