SACM: SEEG-Audio Contrastive Matching for Chinese Speech Decoding
Hongbin Wang, Zhihong Jia, Yuanzhong Shen, Ziwei Wang, Siyang Li, Kai Shu, Feng Hu, Dongrui Wu
TL;DR
This work introduces SACM, a CLIP-guided cross-modal contrastive framework that pairs SEEG signals with synchronized audio to decode Mandarin Chinese speech. On the HUST-MIND dataset, collected from eight patients, SACM achieves significant gains in both speech detection and word decoding accuracy, and electrodes in the sensorimotor cortex often perform comparably to the full electrode array. The study highlights the distributed nature of speech processing and the potential of multimodal BCIs. It also notes limitations such as restricted electrode coverage and a small number of recording sessions. Future directions include integrating additional modalities, targeting electrode placement at speech-related regions, and cross-subject pretraining to improve generalization for online speech decoding BCIs.
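To make the CLIP-style objective concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE) loss between paired SEEG and audio embeddings. It assumes each trial has already been encoded into a fixed-length vector by some SEEG encoder and some audio encoder. Those encoders, the function names, and the temperature of 0.07 are illustrative assumptions, not details taken from SACM.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_loss(seeg_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch (assumed CLIP-style objective).

    Row i of seeg_emb and row i of audio_emb come from the same trial;
    every other row in the batch serves as a negative.
    """
    s = [l2_normalize(v) for v in seeg_emb]
    a = [l2_normalize(v) for v in audio_emb]
    n = len(s)
    # logits[i][j]: similarity of SEEG trial i to audio clip j.
    logits = [[sum(x * y for x, y in zip(s[i], a[j])) / temperature
               for j in range(n)] for i in range(n)]
    # SEEG -> audio direction: each row should pick its own audio clip.
    loss_s2a = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Audio -> SEEG direction: each column should pick its own SEEG trial.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_a2s = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_s2a + loss_a2s)

# Toy embeddings (made up): matched pairs versus deliberately swapped pairs.
seeg = [[1.0, 0.1], [0.1, 1.0]]
audio = [[0.9, 0.0], [0.0, 0.9]]
print(clip_loss(seeg, audio) < clip_loss(seeg, audio[::-1]))  # True
```

Correctly paired trials yield a much lower loss than swapped pairs. That gap is what pulls each SEEG embedding toward its own audio embedding and away from the other trials in the batch.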
Abstract
Speech disorders such as dysarthria and anarthria can severely impair a patient's ability to communicate verbally. Speech decoding brain-computer interfaces (BCIs) offer a potential alternative by directly translating speech intentions into spoken words, serving as speech neuroprostheses. This paper reports an experimental protocol for Mandarin Chinese speech decoding BCIs, along with the corresponding decoding algorithms. Stereo-electroencephalography (SEEG) and synchronized audio data were collected from eight patients with drug-resistant epilepsy as they performed a word-level reading task. The proposed SEEG and Audio Contrastive Matching (SACM) framework, which is based on contrastive learning, achieved decoding accuracies significantly above chance in both speech detection and speech decoding tasks. Electrode-wise analysis revealed that a single electrode in the sensorimotor cortex achieved performance comparable to that of the full electrode array. These findings provide valuable insights for developing more accurate online speech decoding BCIs.
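One way a shared SEEG-audio embedding space can be used for word decoding is nearest-template matching: embed the SEEG trial, compare it with an audio embedding for each candidate word, and pick the most similar word. The sketch below assumes this matching rule and uses hypothetical helper names (`decode_word`, `top1_accuracy`) together with a toy three-word vocabulary and made-up 2-D embeddings. It is not the paper's decoder, vocabulary, or data. It also computes the usual chance level for a balanced N-way task, 1/N, which is the kind of baseline that "significantly above chance" refers to.

```python
import math

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def decode_word(seeg_emb, word_templates):
    """Hypothetical matcher: return the word whose audio template is most
    similar to the SEEG embedding."""
    return max(word_templates, key=lambda w: cosine(seeg_emb, word_templates[w]))

def top1_accuracy(seeg_embs, labels, word_templates):
    # Fraction of trials whose best-matching word equals the true label.
    hits = sum(decode_word(e, word_templates) == y for e, y in zip(seeg_embs, labels))
    return hits / len(labels)

# Toy vocabulary and embeddings (illustrative only, not HUST-MIND data).
templates = {"你好": [1.0, 0.0], "谢谢": [0.0, 1.0], "再见": [-1.0, 0.0]}
trials = [[0.9, 0.2], [0.1, 0.8], [-0.7, 0.1], [0.6, 0.5]]
labels = ["你好", "谢谢", "再见", "谢谢"]

chance = 1 / len(templates)  # balanced 3-way task
print(top1_accuracy(trials, labels, templates))  # 0.75
print(round(chance, 3))  # 0.333
```

Here three of four toy trials match their label, giving 0.75 against a chance level of about 0.333. Whether that gap is statistically significant would still require a proper test over many trials.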
