
S$^2$M-Former: Spiking Symmetric Mixing Branchformer for Brain Auditory Attention Detection

Jiaqi Wang, Zhengyu Ma, Xiongri Shen, Chenlin Zhou, Leilei Zhao, Han Zhang, Yi Zhong, Siqi Cai, Zhenxi Song, Zhiguo Zhang

TL;DR

S$^2$M-Former tackles EEG-based auditory attention detection (AAD) under strict energy constraints by introducing a spike-driven, symmetric two-branch architecture that processes spatial and frequency features as lightweight 1D tokens. The model combines branch-specific spiking encoders with a stacked S$^2$M block comprising SCSA, SMSC, SGCM, and MPTM modules, enabling complementary learning and robust fusion across branches. It achieves substantial parameter and energy reductions (up to $14.7\times$ fewer parameters and $5.8\times$ less energy) while maintaining accuracy competitive with SOTA methods on KUL, DTU, and AV-GC-AAD in within-trial, cross-trial, and cross-subject settings, demonstrating strong generalization and suitability for neuromorphic AAD. The work highlights the practical potential of energy-efficient, brain-inspired AAD for neuro-steered hearing devices and lays the groundwork for implementations on neuromorphic hardware.
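
Why replacing 3D operations with 1D tokens saves parameters can be illustrated with simple arithmetic. The sketch below assumes a dense (ungrouped) convolution and uses hypothetical layer sizes (16 input channels, 32 output channels, kernel size 3) that are not taken from the paper. Because the weight count scales with kernel volume, a $k \times k \times k$ kernel costs roughly $k^2$ times as many weights as a length-$k$ kernel. This toy comparison does not reproduce the paper's $14.7\times$ figure, which depends on the full architecture.

```python
import math


def conv_params(c_in: int, c_out: int, kernel: tuple, bias: bool = True) -> int:
    """Weight (+ optional bias) count of a dense convolution with the given kernel shape."""
    return c_in * c_out * math.prod(kernel) + (c_out if bias else 0)


# Hypothetical layer sizes, chosen only for illustration (not from S^2M-Former).
C_IN, C_OUT, K = 16, 32, 3
p3d = conv_params(C_IN, C_OUT, (K, K, K))  # 16*32*27 + 32 = 13856
p1d = conv_params(C_IN, C_OUT, (K,))       # 16*32*3  + 32 = 1568
print(p3d, p1d, round(p3d / p1d, 2))
```

Ignoring biases, the ratio is exactly $k^2 = 9$ here. The bias terms pull it slightly below that.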

Abstract

Auditory attention detection (AAD) aims to decode a listener's focus of attention in complex auditory environments from electroencephalography (EEG) recordings, which is crucial for developing neuro-steered hearing devices. Despite recent advances, EEG-based AAD remains hindered by the absence of synergistic frameworks that can fully leverage complementary EEG features under energy-efficiency constraints. We propose S$^2$M-Former, a novel spiking symmetric mixing framework that addresses this limitation through two key innovations: (i) a spike-driven symmetric architecture composed of parallel spatial and frequency branches with a mirrored modular design, which uses biologically plausible token-channel mixers to enhance complementary learning across branches; and (ii) lightweight 1D token sequences that replace conventional 3D operations, reducing the parameter count by 14.7$\times$. The brain-inspired spiking architecture further reduces power consumption, achieving a 5.8$\times$ energy reduction compared to recent ANN methods, while also surpassing existing SNN baselines in both parameter efficiency and performance. Comprehensive experiments on three AAD benchmarks (KUL, DTU, and AV-GC-AAD) across three settings (within-trial, cross-trial, and cross-subject) demonstrate that S$^2$M-Former achieves decoding accuracy comparable to state-of-the-art (SOTA) methods, making it a promising low-power, high-performance solution for AAD tasks. Code is available at https://github.com/JackieWang9811/S2M-Former.
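
To make the energy argument concrete, the sketch below estimates the energy of a single dense layer in ANN form and in spike-driven form. It assumes two conventions from the SNN literature. The first is 4.6 pJ per multiply-accumulate (MAC) and 0.9 pJ per accumulate (AC) at 45 nm. The second is a simple leaky integrate-and-fire (LIF) neuron with hard reset. The function names, neuron parameters, and energy model are illustrative assumptions, not S$^2$M-Former's code or its exact energy accounting.

```python
import numpy as np

E_MAC_PJ = 4.6  # assumed energy per multiply-accumulate (45 nm estimate)
E_AC_PJ = 0.9   # assumed energy per accumulate (45 nm estimate)


def lif_multistep(x, tau=2.0, v_th=1.0, v_reset=0.0):
    """Hypothetical LIF neuron run over the leading time axis of x (shape [T, ...]).

    Returns a binary (0/1) spike array with the same shape as x."""
    v = np.full(x.shape[1:], v_reset, dtype=float)
    spikes = np.zeros_like(x, dtype=float)
    for t in range(x.shape[0]):
        v = v + (x[t] - (v - v_reset)) / tau  # leaky integration of the input current
        fired = v >= v_th                      # emit a spike where threshold is reached
        spikes[t] = fired
        v = np.where(fired, v_reset, v)        # hard reset of neurons that spiked
    return spikes


def linear_energy_pj(in_features, out_features, spikes=None):
    """Energy estimate (pJ) of a dense in_features -> out_features layer.

    Without spikes: ANN cost, one MAC per weight for a single forward pass.
    With spikes of shape [T, in_features]: each input spike triggers
    out_features accumulates, summed over all T time steps."""
    if spikes is None:
        return in_features * out_features * E_MAC_PJ
    assert spikes.shape[-1] == in_features
    synaptic_ops = spikes.sum() * out_features
    return synaptic_ops * E_AC_PJ


# Usage: constant input of 1.5 for T=4 steps into 8 LIF neurons feeding an 8 -> 32 layer.
x = np.full((4, 8), 1.5)
spikes = lif_multistep(x)
ann = linear_energy_pj(8, 32)
snn = linear_energy_pj(8, 32, spikes)
print(f"firing rate={spikes.mean():.2f}, ANN={ann:.1f} pJ, SNN={snn:.1f} pJ")
```

Because spikes are binary, a spiking layer replaces multiplications with additions, and silent neurons cost nothing. Under this model the saving therefore scales with the firing rate and the number of time steps $T$. With the constant input above, each neuron fires on every other step, which gives a firing rate of 0.5.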

Paper Structure

This paper contains 22 sections, 26 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: EEG-based AAD Paradigm.
  • Figure 2: S$^2$M-Former: A dual-branch mirrored architecture comprising branch-specific spiking encoders for diverse domain features, leveraging spike-driven symmetric modules for contextual representation learning and enabling effective complementary interactions across parallel branches.
  • Figure 3: Comparison across all subjects on two datasets under the cross-trial setting.
  • Figure 4: Visualization comparison across all subjects on three datasets under the within-trial setting.
  • Figure 5: Performance statistics across all models under the within-trial and cross-trial settings.
  • ...and 4 more figures