Reconstructing Unseen Sentences from Speech-related Biosignals for Open-vocabulary Neural Communication
Deok-Seon Kim, Seo-Hyun Lee, Kang Yin, Seong-Whan Lee
TL;DR
This work advances open-vocabulary neural communication by reconstructing unconstrained sentences from non-invasive biosignals, primarily high-density EEG with optional EMG. The authors introduce a subject-specific framework that predicts sentence-level MFCCs and phoneme sequences from multimodal input using a ConvBlock–Bi-GRU architecture, with a HiFi-GAN vocoder for audio synthesis and DeepSpeech for evaluation. They show that combining EEG and EMG significantly improves phoneme decoding and speech intelligibility for unseen sentences, with notable gains in overt and whispered speech and meaningful, albeit lower, performance for imagined speech. Neurophysiological analyses reveal frequency- and region-specific patterns across speech modes, highlighting delta rhythms as a temporal scaffold for speech, frontal involvement in imagined speech, and sustained temporal activation across modalities. These findings support the development of adaptive, non-invasive brain-to-speech (BTS) systems for open-vocabulary communication and rehabilitation across diverse patient needs, and point to future work on robust imagined-speech decoding and larger, more varied datasets.
Abstract
Brain-to-speech (BTS) systems offer a new approach to human communication by directly transforming neural activity into linguistic expressions. Recent non-invasive BTS studies have largely focused on decoding predefined words or sentences, but open-vocabulary neural communication comparable to natural human interaction requires decoding unconstrained speech. In addition, effectively integrating diverse speech-related biosignals is crucial for developing personalized and adaptive neural communication and rehabilitation solutions for patients. This study investigates speech synthesis for previously unseen sentences across various speech modes by leveraging phoneme-level information extracted from high-density electroencephalography (EEG) signals, both independently and in conjunction with electromyography (EMG) signals. We also examine the properties affecting phoneme decoding accuracy during sentence reconstruction and offer neurophysiological insights that could further improve EEG decoding. Our findings underscore the feasibility of biosignal-based sentence-level speech synthesis for reconstructing unseen sentences, a significant step toward open-vocabulary neural communication systems adapted to diverse patient needs and conditions, and they inform the development of communication and rehabilitation solutions based on EEG decoding.
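Phoneme decoding accuracy for a reconstructed sentence is commonly summarized as a phoneme error rate (PER): the edit distance between the reference and decoded phoneme sequences, divided by the reference length. The abstract does not state which metric the authors use, so the sketch below is only an illustration of this common choice. The function names `edit_distance` and `phoneme_error_rate` are hypothetical, and the phoneme sequences are made-up examples, not data from the study.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme labels."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution_cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,                      # deletion from ref
                curr[j - 1] + 1,                  # insertion into ref
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative sequences only (ARPAbet-style labels, not from the paper).
reference = ["HH", "AH", "L", "OW"]
hypothesis = ["HH", "EH", "L"]
per = phoneme_error_rate(reference, hypothesis)
print(f"PER = {per:.2f}")
```

A lower PER means the decoded phoneme sequence is closer to the reference. Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.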
