CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding

Yifan Zhuang, Calvin Huang, Zepeng Yu, Yongjie Zou, Jiawei Ju

TL;DR

CAT-Net tackles Mandarin tone decoding by fusing EEG and EMG signals with a cross-attention mechanism to capture neural-muscular coordination. It achieves high accuracy with a minimal-channel setup (20 EEG, 5 EMG) and employs domain-adversarial training to boost cross-subject generalization across audible and silent speech. Ablation studies show the cross-attention module and domain discriminator are crucial for performance and generalization, with silent-speech cross-subject accuracy reaching 85.10% and within-subject performance exceeding baselines. The work demonstrates practical potential for low-resource, multimodal BCI tone interfaces and provides a solid baseline for cross-subject, tone-level decoding in Mandarin.

Abstract

Brain-computer interface (BCI) speech decoding has emerged as a promising tool for assisting individuals with speech impairments. In this context, the integration of electroencephalography (EEG) and electromyography (EMG) signals offers strong potential for enhancing decoding performance. Mandarin tone classification presents particular challenges, as tonal variations convey distinct meanings even when phonemes remain identical. In this study, we propose a novel cross-subject multimodal BCI decoding framework that fuses EEG and EMG signals to classify four Mandarin tones under both audible and silent speech conditions. Inspired by the cooperative mechanisms of neural and muscular systems in speech production, our neural decoding architecture combines spatial-temporal feature extraction branches with a cross-attention fusion mechanism, enabling informative interaction between modalities. We further incorporate domain-adversarial training to improve cross-subject generalization. We collected 4,800 EEG trials and 4,800 EMG trials from 10 participants using only twenty EEG and five EMG channels, demonstrating the feasibility of minimal-channel decoding. Despite employing lightweight modules, our model outperforms state-of-the-art baselines across all conditions, achieving average classification accuracies of 87.83% for audible speech and 88.08% for silent speech. In cross-subject evaluations, it still maintains strong performance with accuracies of 83.27% and 85.10% for audible and silent speech, respectively. We further conduct ablation studies to validate the effectiveness of each component. Our findings suggest that tone-level decoding with minimal EEG-EMG channels is feasible and potentially generalizable across subjects, contributing to the development of practical BCI applications.
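
The cross-attention fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes standard scaled dot-product attention in which EEG features act as queries and EMG features supply keys and values. The abstract does not state the direction of attention, the feature dimensions, or the projection weights, so every name and size below (`cross_attention`, `d_in`, `d_model`, the random matrices) is hypothetical and not taken from the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Scaled dot-product attention where one modality queries the other.

    queries:     (T_q, d_in) features of the querying modality (here: EEG)
    keys_values: (T_kv, d_in) features of the attended modality (here: EMG)
    Returns the fused features (T_q, d_model) and the attention weights (T_q, T_kv).
    """
    q = matmul(queries, w_q)
    k = matmul(keys_values, w_k)
    v = matmul(keys_values, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys to (d_model, T_kv)
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Gaussian placeholder values standing in for learned features or weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_model = 8, 4                     # hypothetical feature sizes
eeg_feats = random_matrix(6, d_in, rng)  # 6 EEG time steps (assumed)
emg_feats = random_matrix(3, d_in, rng)  # 3 EMG time steps (assumed)
w_q = random_matrix(d_in, d_model, rng)
w_k = random_matrix(d_in, d_model, rng)
w_v = random_matrix(d_in, d_model, rng)

fused, attn = cross_attention(eeg_feats, emg_feats, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # one fused vector per EEG time step
```

Each row of `attn` is a probability distribution over EMG time steps, so every fused EEG vector is a weighted mix of EMG value vectors. A symmetric variant would swap the roles so that EMG features query EEG features; whether CAT-Net uses one direction or both is not stated in this summary.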

Paper Structure

This paper contains 35 sections, 4 equations, 10 figures, 11 tables, 1 algorithm.

Figures (10)

  • Figure 1: The architecture of our proposed CAT-Net.
  • Figure 3: EEG channel weights calculated by the Channel-Attention block.
  • Figure B1: Spatial distribution of EEG electrode placement across the scalp.
  • Figure B2: Locations of EMG sensor placement for muscle activity recording. (1. Right Buccinator. 2. Right Cervical Trapezius. 3. Left Buccinator. 4. Left Cervical Trapezius. 5. Mentalis.)
  • ...and 5 more figures