IncepFormerNet: A multi-scale multi-head attention network for SSVEP classification

Yan Huang, Yongru Chen, Lei Cao, Yongnian Cao, Xuechun Yang, Yilin Dong, Tianyu Liu

TL;DR

This work tackles fast and reliable SSVEP decoding for BCI by introducing IncepFormerNet, a hybrid architecture that marries Inception-style multi-scale temporal convolutions with Transformer-based attention, augmented by filter-bank spectral features. The model processes time-domain EEG from occipital channels through four modules—Channel Fusion, Time Feature Extraction, Former, and Classifier—achieving strong within-subject accuracy on Benchmark and BETA datasets, especially at short time windows. Across extensive ablations and comparisons with FBCCA, TRCA, EEGNet, and Transformer-based baselines, IncepFormerNet consistently outperforms rivals in accuracy and ITR, while maintaining computational efficiency suitable for real-time use. The results underscore the value of combining multi-scale temporal feature extraction with global temporal modeling for robust SSVEP-BCI performance and offer a practical path toward real-time, user-friendly BCI systems.

Abstract

In recent years, deep learning (DL) models have shown outstanding performance in EEG classification tasks, particularly in Steady-State Visually Evoked Potential (SSVEP)-based Brain-Computer Interface (BCI) systems. This study proposes a new model called IncepFormerNet, a hybrid of the Inception and Transformer architectures. IncepFormerNet extracts multi-scale temporal information from time series data using parallel convolution kernels of varying sizes, capturing subtle variations and critical features within SSVEP signals. The model also integrates the multi-head attention mechanism from the Transformer architecture, which captures global dependencies and improves the representation of complex patterns. Additionally, it uses filter bank techniques to extract features based on the spectral characteristics of SSVEP data. To validate the effectiveness of the proposed model, we conducted experiments on two public datasets. The experimental results show that IncepFormerNet achieves an accuracy of 87.41% on Dataset 1 and 71.97% on Dataset 2 using a 1.0-second time window. To further verify the proposed model, we compared it with other deep learning models, and the results indicate that our method achieves significantly higher accuracy than the others. The source code for this work is available at: https://github.com/CECNL/SSVEP-DAN.
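
The two ideas the abstract combines, parallel convolutions of different widths followed by multi-head self-attention, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the kernel sizes, the number of heads, the 250 Hz sampling rate, the 10 Hz test signal, and the random (untrained) weights are all assumptions chosen only to show the data flow and tensor shapes.

```python
import numpy as np


def multi_scale_conv(x, kernel_sizes=(3, 5, 7, 9), seed=0):
    """Run parallel 1-D convolutions of several widths over one signal.

    x: array of shape (time,).
    Returns an array of shape (len(kernel_sizes), time).
    Kernel weights are random here; a trained network would learn them.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        w = rng.standard_normal(k) / k
        # mode="same" keeps the output length equal to the input length
        branches.append(np.convolve(x, w, mode="same"))
    return np.stack(branches)


def multi_head_self_attention(tokens, num_heads=2, seed=0):
    """Scaled dot-product self-attention with several heads.

    tokens: array of shape (seq_len, d_model).
    Returns (output, weights) with shapes (seq_len, d_model) and
    (num_heads, seq_len, seq_len).
    """
    seq_len, d_model = tokens.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    rng = np.random.default_rng(seed)
    # Random projection matrices stand in for learned parameters.
    wq, wk, wv, wo = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    def split(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    merged = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ wo, weights


# Hypothetical input: 1.0 s of a 10 Hz sinusoid sampled at 250 Hz.
t = np.arange(250) / 250.0
signal = np.sin(2 * np.pi * 10 * t)

features = multi_scale_conv(signal)      # one row per kernel size
tokens = features.T                      # one token per time step
attended, weights = multi_head_self_attention(tokens)
print(features.shape, attended.shape, weights.shape)
```

Each time step becomes a token whose features come from the different kernel widths, so the attention stage can relate short-range and long-range temporal patterns across the whole window.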

Paper Structure

This paper contains 28 sections, 3 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The diagram of the IncepFormerNet model. (a) Channel Fusion Module. (b) Time Feature Extraction Module. (c) Former Module. (d) Classifier Module.
  • Figure 2: Apply convolution operations to the filtered data from three sub-bands to achieve a weighted combination across multiple channels, thereby generating a fused spatial feature.
  • Figure 3: A detailed diagram of the temporal feature extraction module, which includes four convolutional kernels of different scales, with arrows indicating the respective fusion pathways.
  • Figure 4: A detailed diagram of the Former Module, which employs a two-layer encoder. (a) Prefix encoding. (b) Multi-head attention mechanism.
  • Figure 5: (a) Comparison of average ITR across different methods and time windows on Dataset 1. (b) Comparison of average ITR across different methods and time windows on Dataset 2.
  • ...and 3 more figures
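
The information transfer rate (ITR) compared in Figure 5 is conventionally computed in SSVEP-BCI work with the Wolpaw formula, which combines the number of targets, the classification accuracy, and the time per selection. The sketch below assumes that convention; this page does not state the exact formula or parameters the authors used. The 40-target count and the 0.5 s gaze-shift interval in the usage lines are likewise assumptions for illustration, while the 87.41% accuracy and 1.0 s window come from the abstract.

```python
import math


def itr_bits_per_min(accuracy, num_targets, selection_time_s):
    """Wolpaw information transfer rate in bits per minute.

    accuracy: classification accuracy as a fraction in [0, 1].
    num_targets: number of selectable targets (N >= 2).
    selection_time_s: seconds per selection, including any gaze-shift time.
    """
    if num_targets < 2:
        raise ValueError("num_targets must be at least 2")
    if selection_time_s <= 0:
        raise ValueError("selection_time_s must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")

    # At or below chance level the ITR is conventionally reported as zero.
    if accuracy <= 1.0 / num_targets:
        return 0.0

    bits = math.log2(num_targets)
    if accuracy < 1.0:
        # For accuracy == 1 both correction terms vanish (0 * log 0 -> 0).
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2(
            (1.0 - accuracy) / (num_targets - 1)
        )
    return bits * 60.0 / selection_time_s


# Assumed setup: 40 targets, 1.0 s data window plus 0.5 s gaze shift.
example_itr = itr_bits_per_min(0.8741, num_targets=40, selection_time_s=1.5)
print(f"{example_itr:.1f} bits/min")
```

Because the selection time sits in the denominator, a model that holds its accuracy at short time windows gains ITR quickly, which is why short-window accuracy matters for the comparison.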