NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
Owais Mujtaba Khanday, José L. Pérez-Córdoba, Mohd Yaqub Mir, Ashfaq Ahmad Najar, Jose A. Gonzalez-Lopez
TL;DR
This work tackles speech decoding from invasive stereo-EEG (sEEG) recordings using high-gamma features in the $70$–$170$ Hz band and a novel NeuroIncept Decoder, which fuses multi-scale Inception-based feature extraction with GRU temporal modeling to map neural activity to log-Mel spectrograms. Evaluated on data from 10 participants performing word-reading tasks, the approach achieves strong correlations between predicted and original spectrograms, with Pearson correlation coefficient (PCC) values up to $0.93$ and STGI up to $0.55$, though performance varies with electrode coverage in language regions. Compared with the baseline models (LR, FCN, CNN), NeuroIncept attains higher PCC and STGI, highlighting the benefit of combining multi-scale spatiotemporal features with recurrent dynamics for neural speech decoding. The results suggest significant potential for neural-decoding-based speech restoration in brain-computer interfaces, and they point to two avenues for improvement: pretraining on EEG-word data and waveform reconstruction via vocoders for end-to-end audible speech generation.
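To make the high-gamma feature concrete, the following is a minimal sketch of extracting a $70$–$170$ Hz amplitude envelope from a single channel. It is not the authors' pipeline: the function name `high_gamma_envelope`, the FFT-based band-limited analytic signal, and the 1024 Hz sampling rate are all assumptions chosen for illustration, and a real pipeline would typically add windowing and line-noise handling.

```python
import numpy as np

def high_gamma_envelope(x, fs, low=70.0, high=170.0):
    """Band-limited amplitude envelope of a 1-D signal (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(x.size, d=1.0 / fs)
    # Keep only positive frequencies inside [low, high]. Doubling them and
    # inverting gives the analytic signal of the band-passed input.
    mask = (freqs >= low) & (freqs <= high)
    analytic = np.fft.ifft(2.0 * spectrum * mask)
    return np.abs(analytic)

fs = 1024  # assumed sampling rate in Hz, not taken from the paper
t = np.arange(fs) / fs  # one second of samples
in_band = np.sin(2 * np.pi * 100 * t)   # 100 Hz tone, inside the band
out_band = np.sin(2 * np.pi * 20 * t)   # 20 Hz tone, outside the band

env_in = high_gamma_envelope(in_band, fs)
env_out = high_gamma_envelope(out_band, fs)
```

In this toy case the in-band tone keeps an envelope close to its amplitude, while the out-of-band tone is suppressed. The FFT mask is an ideal brick-wall filter, which is convenient for a sketch but can ring on real, non-stationary recordings.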
Abstract
This paper introduces a novel algorithm for speech synthesis from neural activity recorded with invasive electroencephalography (EEG) techniques. The proposed system offers a promising communication solution for individuals with severe speech impairments. Central to our approach is the integration of time-frequency features in the high-gamma band, computed from the EEG recordings, with an advanced NeuroIncept Decoder architecture. This architecture combines Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to reconstruct audio spectrograms from neural patterns. Our model achieves high mean correlation coefficients between predicted and actual spectrograms, though inter-subject variability suggests distinct neural processing mechanisms among participants. Overall, our study highlights the potential of neural decoding techniques to restore communicative abilities in individuals with speech disorders and paves the way for future advancements in brain-computer interface technologies.
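As an illustration of the correlation metric mentioned above, here is a hedged sketch of a mean Pearson correlation between a predicted and a reference spectrogram. The function name `mean_pcc`, the choice to correlate each spectral bin over time and then average, and the 200 x 23 array shape are assumptions for this example and may differ from how the paper computes its scores.

```python
import numpy as np

def mean_pcc(pred, target):
    """Mean Pearson correlation across spectral bins (illustrative sketch).

    Both inputs are arrays of shape (frames, bins). Bins that are constant
    over time would give a zero denominator and are not handled here.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Centre each spectral bin over time (axis 0 indexes frames).
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    num = (p * t).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
target = rng.normal(size=(200, 23))  # hypothetical: 200 frames x 23 bins
pred = 2.0 * target + 1.0            # affine copy of the reference

score_perfect = mean_pcc(pred, target)
score_flipped = mean_pcc(-target, target)
```

Because Pearson correlation ignores offset and positive scaling, the affine copy scores essentially 1 and the sign-flipped copy essentially -1. This also means a high PCC alone does not guarantee correct spectrogram magnitudes, which is one reason to report it alongside an intelligibility-oriented measure such as STGI.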
