Subject Disentanglement Neural Network for Speech Envelope Reconstruction from EEG

Li Zhang, Jiyao Liu

TL;DR

The paper tackles cross-subject variability in EEG-based speech envelope reconstruction. It introduces SDN-Net, a three-module architecture (MLA-Codec, CTA-MTDNN, and MPN-MI) aimed at subject-independent envelope decoding. MLA-Codec decodes speech envelopes from EEG; CTA-MTDNN extracts subject identity features with a multi-scale time-delay network that uses channel and temporal attention; MPN-MI encourages the reconstructed envelope to be independent of subject identity through variational mutual information estimation within a joint training objective. On the Auditory EEG Decoding Dataset, SDN-Net is reported to outperform recent state-of-the-art methods in both inner-subject and cross-subject reconstruction, which may benefit cross-subject generalization in neural speech decoding and robust brain-computer interfaces.

Abstract

Reconstructing speech envelopes from EEG signals is essential for exploring neural mechanisms underlying speech perception. Yet, EEG variability across subjects and physiological artifacts complicate accurate reconstruction. To address this problem, we introduce Subject Disentangling Neural Network (SDN-Net), which disentangles subject identity information from reconstructed speech envelopes to enhance cross-subject reconstruction accuracy. SDN-Net integrates three key components: MLA-Codec, MPN-MI, and CTA-MTDNN. The MLA-Codec, a fully convolutional neural network, decodes EEG signals into speech envelopes. The CTA-MTDNN module, a multi-scale time-delay neural network with channel and temporal attention, extracts subject identity features from EEG signals. Lastly, the MPN-MI module, a mutual information estimator with a multi-layer perceptron, supervises the removal of subject identity information from the reconstructed speech envelope. Experiments on the Auditory EEG Decoding Dataset demonstrate that SDN-Net achieves superior performance in inner- and cross-subject speech envelope reconstruction compared to recent state-of-the-art methods.

Paper Structure

This paper contains 10 sections, 13 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Overview of SDN-Net. (a) A multi-level aggregation EEG codec (MLA-Codec) for decoding speech envelopes. (b) A mutual information (MI) estimator with a multi-layer perceptron network (MPN-MI). (c) A multi-scale time-delay neural network with channel and temporal attention (CTA-MTDNN) for subject classification. Note: to save space, only four input channels are drawn; the actual EEG input has 64 channels.
  • Figure 2: Mean of Pearson Correlation Coefficient of Different Models on Different Subjects.
  • Figure 3: Mean of Pearson Correlation Coefficient for Cross-subject and Inner-subject Tests. (I) denotes inner-subject test results and (C) denotes cross-subject test results.
  • Figure 4: Mean of Pearson Correlation Coefficient of Different Generalization Models for the Cross-subject Challenge.
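
Figures 2 to 4 all report a mean Pearson correlation coefficient between reconstructed and reference speech envelopes. The following is a minimal sketch of how such a per-subject mean could be computed, not the authors' evaluation code. The function names `pearson` and `mean_pearson_by_subject`, the `(subject_id, reconstructed, target)` segment layout, and the choice to score a constant signal as zero are all illustrative assumptions that do not come from the paper.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = fmean(x), fmean(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0.0:
        # Assumption: a constant signal is scored as zero correlation.
        return 0.0
    return numerator / denominator


def mean_pearson_by_subject(segments):
    """Average the per-segment correlation within each subject.

    `segments` is an iterable of (subject_id, reconstructed, target) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    scores = {}
    for subject_id, reconstructed, target in segments:
        scores.setdefault(subject_id, []).append(pearson(reconstructed, target))
    return {subject_id: fmean(values) for subject_id, values in scores.items()}


# Toy example with made-up numbers (not data from the paper).
toy_segments = [
    ("s1", [1, 2, 3], [2, 4, 6]),  # perfectly correlated
    ("s1", [1, 2, 3], [3, 2, 1]),  # perfectly anti-correlated
    ("s2", [1, 3, 2], [2, 6, 4]),  # perfectly correlated
]
print(mean_pearson_by_subject(toy_segments))  # {'s1': 0.0, 's2': 1.0}
```

Under this sketch, an inner-subject versus cross-subject comparison such as the one in Figure 3 would differ only in which subjects contribute test segments: subjects seen during training in the inner-subject case, held-out subjects in the cross-subject case.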