Decoding Phone Pairs from MEG Signals Across Speech Modalities
Xabier de Zuazo, Eva Navas, Ibon Saratxaga, Mathieu Bourguignon, Nicola Molinaro
TL;DR
The paper addresses decoding phonetic information from MEG recorded during overt speech production and during speech perception, comparing a broad set of classifiers. Production yields substantially higher decoding accuracy than perception, and Elastic Net outperforms neural networks in this limited-data regime. Low-frequency bands, especially Delta and Theta, carry the most phonetic information during production, whereas higher frequencies contribute minimally, highlighting the role of slow oscillations in speech motor processes. The work contributes a reproducible pipeline, a scheme of 15 phone pairs for pairwise classification, and insights for designing MEG-based speech BCIs. It also notes that residual artifacts may contribute to decoding and that larger datasets are needed to generalize beyond the present sample of 17 participants.
Abstract
Understanding the neural mechanisms underlying speech production is essential both for advancing cognitive neuroscience theory and for developing practical communication technologies. In this study, we analyzed magnetoencephalography (MEG) signals to decode phones from brain activity during speech production and speech perception (passive listening and voice playback) tasks. Using a dataset of 17 participants, we performed pairwise phone classification, extending our analysis to 15 phone pairs. We compared multiple machine learning approaches, including regularized linear models and neural network architectures, to determine how effectively each decodes phonetic information. Decoding accuracy was significantly higher during speech production (76.6%) than during the passive listening and playback modalities (~51%), emphasizing the richer neural information available during overt speech. Among the models, the Elastic Net classifier consistently outperformed more complex neural networks, highlighting the effectiveness of traditional regularization techniques on limited, high-dimensional MEG datasets. In addition, analysis of specific frequency bands revealed that low-frequency oscillations, particularly Delta (0.2-3 Hz) and Theta (4-7 Hz), contributed most to decoding accuracy, suggesting that these bands encode critical neural processes related to speech production. Although we applied advanced denoising methods, it remains unclear whether decoding reflects neural activity alone or whether residual muscular or movement artifacts also contributed, which calls for further methodological refinement. Overall, our findings underline the importance of studying overt speech production paradigms, which, despite their complexity, offer opportunities to improve brain-computer interfaces for individuals with severe speech impairments.
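To illustrate why an elastic-net-regularized linear model suits pairwise decoding when features far outnumber trials, the following is a minimal NumPy-only sketch. It is not the authors' pipeline, which this section does not specify. It assumes synthetic data standing in for one phone pair (trials by flattened channel-time features). It fits a binary logistic regression with a combined L1 and L2 penalty by proximal gradient descent (ISTA). The function names (`fit_elastic_net_logreg`, `predict`, `soft_threshold`) and the values of `alpha` and `l1_ratio` are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each coefficient toward zero by t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_elastic_net_logreg(X, y, alpha=0.1, l1_ratio=0.5, n_iter=1000):
    """Elastic-net logistic regression (labels 0/1) via proximal gradient (ISTA).

    Minimizes: mean logistic loss
               + alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2).
    The intercept b is not penalized. Hypothetical helper for illustration.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part
    # (logistic loss over the intercept-augmented design plus the L2 penalty).
    X_aug = np.hstack([X, np.ones((n, 1))])
    L = np.linalg.norm(X_aug, 2) ** 2 / (4 * n) + alpha * (1 - l1_ratio)
    step = 1.0 / L
    for _ in range(n_iter):
        z = X @ w + b
        prob = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
        resid = prob - y
        grad_w = X.T @ resid / n + alpha * (1 - l1_ratio) * w
        grad_b = resid.mean()
        # Gradient step on the smooth part, then the L1 proximal step on w only.
        w = soft_threshold(w - step * grad_w, step * alpha * l1_ratio)
        b -= step * grad_b
    return w, b

def predict(X, w, b):
    # Class 1 if the linear score is positive, else class 0.
    return (X @ w + b > 0).astype(int)

# Hypothetical stand-in for one phone pair: many more features than trials,
# with only the first 20 features carrying class information.
rng = np.random.default_rng(0)
n_trials, n_features, n_informative = 120, 500, 20
y = rng.integers(0, 2, n_trials)
X = rng.standard_normal((n_trials, n_features))
X[:, :n_informative] += 1.5 * (2 * y[:, None] - 1)  # class-dependent shift

X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]
w, b = fit_elastic_net_logreg(X_train, y_train)
acc = (predict(X_test, w, b) == y_test).mean()
print(f"held-out accuracy: {acc:.2f}, nonzero weights: {np.count_nonzero(w)}/{n_features}")
```

The L1 term drives many uninformative weights to exactly zero, and the L2 term stabilizes the fit when features are correlated. This combination is one plausible reason such a model can hold up against larger networks on small datasets. In practice, `alpha` and `l1_ratio` would be chosen by cross-validation. scikit-learn's `LogisticRegression` also supports an elastic-net penalty with the `saga` solver.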
