LI-DSN: A Layer-wise Interactive Dual-Stream Network for EEG Decoding

Chenghao Yue, Zhiyuan Ma, Zhongye Xia, Xinche Zhang, Yisi Zhang, Xinke Shen, Sen Song

Abstract

Electroencephalography (EEG) provides a non-invasive window into brain activity, offering the high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently through parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem, precluding intermediate cross-stream refinement and hindering the spatial-temporal interactions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that facilitates progressive, cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments across eight diverse EEG datasets, encompassing motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be made publicly available upon acceptance.
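The abstract describes an adaptive fusion strategy with learnable channel weights but does not specify its form; since the code is not yet released, the following is a minimal sketch of one plausible interpretation: a softmax over learnable per-channel logits that blends the temporal and spatial streams. All function and variable names here are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_fuse(f_t, f_s, w):
    """Blend temporal features f_t and spatial features f_s, each of shape
    (channels, dim), using learnable per-channel logits w of shape (channels, 2).

    Each channel gets its own convex combination of the two streams; during
    training, w would be optimized jointly with the rest of the network.
    """
    alpha = np.apply_along_axis(softmax, 1, w)   # (channels, 2), rows sum to 1
    return alpha[:, :1] * f_t + alpha[:, 1:] * f_s
```

With zero-initialized logits, each channel starts as an equal average of the two streams, and the weights drift toward whichever stream is more informative for that channel as training proceeds.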

Paper Structure

This paper contains 44 sections, 17 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Exploration of dual-stream interaction strategies for EEG analysis, where $F_t$ and $F_s$ denote the raw temporal and spatial features, respectively, and $\tilde{F'_t}$ and $\tilde{F'_s}$ denote the enhanced features after interaction. (a) Dual-stream architecture with cross-flow interactions between temporal and spatial streams. (b) Baseline with no cross-stream interaction before fusion. (c) Interaction outputs integrated only into the spatial stream. (d) Interaction outputs integrated only into the temporal stream. (e) Bidirectional interaction feeding both streams.
  • Figure 2: The overall architecture of the proposed LI-DSN. The raw EEG signals are first processed by parallel Temporal and Spatial Tokenizers to extract initial embeddings. These embeddings are then fed into $N$ blocks, where each block consists of a Feed-Forward Network (FFN) followed by the Temporal-Spatial Integration Attention (TSIA) module to enable progressive cross-stream interaction. The refined features are subsequently aggregated by the Adaptive Fusion module and passed through an MLP classification head to generate outputs for various EEG tasks (e.g., MI classification, Emotion Recognition, SSVEP). The detailed architectures of the five key components, namely the Spatial Tokenizer, Temporal Tokenizer, FFN Block, TSIA and Adaptive Fusion, are illustrated in the sub-panels.
  • Figure 3: Comparison of integration methods under the LOSO protocol across four MI classification datasets.
  • Figure 4: Impact of architectural hyperparameters on performance under the LOSO protocol. (a) and (c) illustrate the effect of block depth $(N_s, N_t)$ on the SEED and FACED datasets respectively. (b) and (d) display the effect of feature embedding width on the SEED and FACED datasets.
  • Figure 5: Comparison of model performance, parameters, and MFLOPS on BNCI2014001, with model size indicated by circle area.
  • ...and 2 more figures
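The pipeline described in the Figure 2 caption, parallel tokenizers, $N$ interactive blocks (FFN followed by TSIA), adaptive fusion, and an MLP head, can be sketched at a high level as below. This is only a structural outline under assumed shapes; the tokenizer, FFN, TSIA, and fusion bodies are simplified placeholders (the real TSIA builds SACM/TCAM attention, which is not reproduced here), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenizer(x, dim):
    # Placeholder for the Temporal/Spatial Tokenizer:
    # project (channels, time) raw EEG into (channels, dim) embeddings.
    return x @ (rng.standard_normal((x.shape[1], dim)) * 0.01)

def ffn(f):
    # Placeholder for the FFN block (here just a ReLU nonlinearity).
    return np.maximum(f, 0.0)

def tsia(f_t, f_s):
    # Placeholder for Temporal-Spatial Integration Attention: each stream is
    # refined using information from the other (the paper uses SACM/TCAM).
    return f_t + 0.1 * f_s, f_s + 0.1 * f_t

def li_dsn_forward(x, n_blocks=3, dim=16):
    """Structural outline of the LI-DSN forward pass on one EEG trial."""
    f_t, f_s = tokenizer(x, dim), tokenizer(x, dim)   # parallel tokenizers
    for _ in range(n_blocks):                          # N interactive blocks
        f_t, f_s = tsia(ffn(f_t), ffn(f_s))
    fused = 0.5 * (f_t + f_s)       # stand-in for learnable adaptive fusion
    return fused.mean(axis=0)       # pooled features fed to the MLP head
```

The key structural point is that `tsia` is called inside every block, so cross-stream exchange happens layer by layer rather than once at the end, which is the late-fusion pattern the paper argues against.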