Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
Owais Mujtaba Khanday, Pablo Rodríguez San Esteban, Zubair Ahmad Lone, Marc Ouellet, Jose Andres Gonzalez Lopez
TL;DR
The paper tackles reconstructing neural activity during speech production using embeddings from large self-supervised language and speech models. An ElasticNet regression maps word- and audio-derived embeddings from FastText, GPT-2, and Wav2Vec 2.0 XLS-R onto high-gamma stereo-electroencephalography (sEEG) features, and is evaluated with leave-one-out cross-validation. Reconstruction is strong across participants, with Pearson correlation coefficient ($PCC$) and $R^2$ values reaching $0.99$, though performance varies with electrode coverage and subject, particularly for Wav2Vec 2.0. These findings indicate that the linguistic and acoustic representations in pre-trained models align with the neural processes underlying speech, informing future neural speech interfaces and neuroscience studies.
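The mapping summarized above can be illustrated with a minimal sketch, assuming scikit-learn and NumPy. It fits an ElasticNet from embedding vectors to multi-channel high-gamma features under leave-one-out cross-validation, then scores each channel with $PCC$ and $R^2$. The function names, the hyperparameters (`alpha`, `l1_ratio`), and the array shapes are hypothetical choices for illustration and are not taken from the paper; extracting the embeddings and the high-gamma features is assumed to have happened beforehand.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut


def loo_reconstruct(embeddings, high_gamma, alpha=0.01, l1_ratio=0.5):
    """Predict every sample's high-gamma features from a model that never saw it.

    embeddings: (n_samples, n_embedding_dims) array
    high_gamma: (n_samples, n_channels) array
    alpha and l1_ratio are illustrative values, not the paper's settings.
    """
    predictions = np.zeros_like(high_gamma, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(embeddings):
        # A fresh model per fold keeps the held-out sample out of training.
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
        model.fit(embeddings[train_idx], high_gamma[train_idx])
        predictions[test_idx] = model.predict(embeddings[test_idx])
    return predictions


def score_channels(high_gamma, predictions):
    """Return per-channel Pearson correlation and R^2 over all held-out predictions."""
    n_channels = high_gamma.shape[1]
    pcc = np.array([
        np.corrcoef(high_gamma[:, ch], predictions[:, ch])[0, 1]
        for ch in range(n_channels)
    ])
    r2 = r2_score(high_gamma, predictions, multioutput="raw_values")
    return pcc, r2


# Synthetic stand-ins for real embeddings and sEEG features (assumption).
rng = np.random.default_rng(0)
demo_embeddings = rng.standard_normal((40, 16))
demo_weights = rng.standard_normal((16, 4))
demo_high_gamma = demo_embeddings @ demo_weights + 0.1 * rng.standard_normal((40, 4))

demo_predictions = loo_reconstruct(demo_embeddings, demo_high_gamma)
demo_pcc, demo_r2 = score_channels(demo_high_gamma, demo_predictions)
print("PCC per channel:", np.round(demo_pcc, 3))
print("R^2 per channel:", np.round(demo_r2, 3))
```

Pooling the held-out predictions before scoring is one way to obtain correlation metrics under leave-one-out, since a single test sample cannot yield a correlation on its own; whether the paper scores this way is not stated in the summary.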
Abstract
Understanding how neural activity encodes speech and language production is a fundamental challenge in neuroscience and artificial intelligence. This study investigates whether embeddings from large-scale, self-supervised language and speech models can effectively reconstruct high-gamma neural activity, a key indicator of cortical processing, recorded during speech production. We use pre-trained embeddings from deep learning models trained on linguistic and acoustic data to represent high-level speech features and map them onto these high-gamma signals. We analyze the extent to which these embeddings preserve the spatio-temporal dynamics of brain activity. Reconstructed neural signals are evaluated against ground-truth high-gamma activity using correlation metrics and signal reconstruction quality assessments. The results indicate that high-gamma activity can be effectively reconstructed from large language and speech model embeddings in all study participants, yielding Pearson correlation coefficients ranging from 0.79 to 0.99.
