SACM: SEEG-Audio Contrastive Matching for Chinese Speech Decoding

Hongbin Wang, Zhihong Jia, Yuanzhong Shen, Ziwei Wang, Siyang Li, Kai Shu, Feng Hu, Dongrui Wu

TL;DR

This work introduces SACM, a CLIP-guided cross-modal contrastive framework that pairs SEEG signals with synchronized audio to decode Mandarin Chinese speech. On the HUST-MIND dataset, collected from eight patients, SACM achieves accuracies significantly above chance in both speech detection and word decoding, and electrodes in the sensorimotor cortex often perform comparably to the full electrode array. The study highlights the distributed nature of speech processing and the potential of multimodal BCIs, while noting limitations such as limited electrode coverage and few recording sessions. Future directions include integrating additional modalities, placing electrodes specifically in speech-related regions, and cross-subject pretraining to improve generalization for online speech decoding BCIs.

Abstract

Speech disorders such as dysarthria and anarthria can severely impair the patient's ability to communicate verbally. Speech decoding brain-computer interfaces (BCIs) offer a potential alternative by directly translating speech intentions into spoken words, serving as speech neuroprostheses. This paper reports an experimental protocol for Mandarin Chinese speech decoding BCIs, along with the corresponding decoding algorithms. Stereo-electroencephalography (SEEG) and synchronized audio data were collected from eight drug-resistant epilepsy patients as they conducted a word-level reading task. The proposed SEEG and Audio Contrastive Matching (SACM), a contrastive learning-based framework, achieved decoding accuracies significantly exceeding chance levels in both speech detection and speech decoding tasks. Electrode-wise analysis revealed that a single sensorimotor cortex electrode achieved performance comparable to that of the full electrode array. These findings provide valuable insights for developing more accurate online speech decoding BCIs.
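To make the contrastive matching idea concrete, here is a minimal sketch of a CLIP-style symmetric loss and the retrieval step it enables. It is not the authors' implementation. The function names, the temperature value of 0.07, and the two-dimensional toy embeddings are all assumptions for illustration, and the encoders that would produce the SEEG and audio embeddings are omitted.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(seeg_emb, audio_emb, temperature=0.07):
    """Temperature-scaled cosine similarity for every SEEG/audio pair."""
    s = [l2_normalize(v) for v in seeg_emb]
    a = [l2_normalize(v) for v in audio_emb]
    return [[sum(x * y for x, y in zip(si, aj)) / temperature for aj in a]
            for si in s]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_style_loss(seeg_emb, audio_emb, temperature=0.07):
    """Symmetric loss: SEEG-to-audio and audio-to-SEEG directions averaged."""
    logits = similarity_matrix(seeg_emb, audio_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

def retrieve(seeg_vec, audio_bank):
    """Index of the audio embedding most similar to one SEEG embedding."""
    sims = similarity_matrix([seeg_vec], audio_bank)[0]
    return max(range(len(sims)), key=sims.__getitem__)

# Hypothetical toy embeddings: row i of each list forms a positive pair.
seeg = [[1.0, 0.1], [0.0, 1.0]]
audio_aligned = [[0.9, 0.0], [0.1, 1.0]]
audio_shuffled = audio_aligned[::-1]

loss_aligned = clip_style_loss(seeg, audio_aligned)
loss_shuffled = clip_style_loss(seeg, audio_shuffled)
best_match = retrieve(seeg[0], audio_aligned)
```

In this toy case the loss is lower when matching pairs sit on the diagonal of the similarity matrix than when the audio rows are swapped, which is the signal that pulls positive pairs together during training. At test time, `retrieve` picks the candidate audio embedding closest to a given SEEG embedding.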

Paper Structure

This paper contains 20 sections, 4 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: A speech decoding BCI. For patients with neurological conditions or severe disabilities, conventional speech production is impaired. A speech decoding BCI decodes brain signals associated with neural activities underlying speech production to restore communication.
  • Figure 2: Electrode implantation locations for the eight subjects. Each red dot represents a recording contact, and lighter-colored dots indicate contacts located on the right side of the brain.
  • Figure 3: Data collection setup, where SEEG and audio signals were recorded simultaneously as the subject read prompted words displayed on the screen.
  • Figure 4: The SACM framework. During training, SEEG and audio representations are extracted using separate neural networks. The features are optimized by pulling positive pairs together and pushing negative pairs apart. At test time, the most relevant audio segment is identified based on the SEEG trial.
  • Figure 5: Speech and non-speech segments in audio and SEEG data in the speech detection task.
  • ...and 3 more figures