CSSSTN: A Class-sensitive Subject-to-subject Semantic Style Transfer Network for EEG Classification in RSVP Tasks

Ziyue Yang, Chengrui Chen, Yong Peng, Qiong Chen, Wanzeng Kong

TL;DR

CSSSTN tackles cross-subject variability and BCI illiteracy in RSVP-EEG by employing a class-sensitive subject-to-subject semantic style transfer that aligns features per class between golden and target subjects. It extends SSSTN with a three-phase framework, subject-specific classifiers, a generator with self-attention, class templates, and a triad of losses (style, content, semantic) plus an ensemble predictor to fuse source and target decisions. Across the Tsinghua and HDU RSVP datasets, CSSSTN achieves mean balanced accuracy gains of 6.4% and 3.5% over state-of-the-art baselines, respectively, and can operate with as little as 25% of target data, significantly reducing calibration effort. Ablation studies and t-SNE visualizations validate the importance of class-sensitive transfer and lower-layer features for robust cross-subject transfer, supporting CSSSTN’s practical applicability in real-world RSVP-EEG BCI systems.

Abstract

The Rapid Serial Visual Presentation (RSVP) paradigm represents a promising application of electroencephalography (EEG) in Brain-Computer Interface (BCI) systems. However, cross-subject variability remains a critical challenge, particularly for BCI-illiterate users who struggle to effectively interact with these systems. To address this issue, we propose the Class-Sensitive Subject-to-Subject Semantic Style Transfer Network (CSSSTN), which incorporates a class-sensitive approach to align feature distributions between golden subjects (BCI experts) and target (BCI-illiterate) users on a class-by-class basis. Building on the SSSTN framework, CSSSTN incorporates three key components: (1) subject-specific classifier training, (2) a unique style loss to transfer class-discriminative features while preserving semantic information through a modified content loss, and (3) an ensemble approach to integrate predictions from both source and target domains. We evaluated CSSSTN using both a publicly available dataset and a self-collected dataset. Experimental results demonstrate that CSSSTN outperforms state-of-the-art methods, achieving mean balanced accuracy improvements of 6.4% on the Tsinghua dataset and 3.5% on the HDU dataset, with notable benefits for BCI-illiterate users. Ablation studies confirm the effectiveness of each component, particularly the class-sensitive transfer and the use of lower-layer features, which enhance transfer performance and mitigate negative transfer. Additionally, CSSSTN achieves competitive results with minimal target data, reducing calibration time and effort. These findings highlight the practical potential of CSSSTN for real-world BCI applications, offering a robust and scalable solution to improve the performance of BCI-illiterate users while minimizing reliance on extensive training data. Our code is available at https://github.com/ziyuey/CSSSTN.
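
The results above are reported as balanced accuracy, a common choice for RSVP data because target trials are much rarer than non-target trials. The sketch below assumes the standard definition (the mean of per-class recall); the paper's exact formula is not shown in this excerpt, and the function name and toy labels are illustrative only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard definition, assumed here)."""
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of all trials whose true label is class c.
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy imbalanced example: 8 non-target (0) trials and 2 target (1) trials.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]  # one of the two target trials is missed

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.9
print(balanced_accuracy(y_true, y_pred))   # 0.75
```

Plain accuracy rewards a classifier for predicting the majority class, whereas balanced accuracy weights the rare target class equally, which is why a missed target trial costs more in the second number.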

Paper Structure

This paper contains 19 sections, 8 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Experimental scheme in which subjects identify target images within an RSVP sequence, generating the corresponding EEG signals.
  • Figure 2: Overview of the proposed CSSSTN framework. The framework consists of three phases: (1) pretraining, (2) style transfer, and (3) prediction and ensemble. Input data $x_T$ (target subject) and $x_S$ (source subject) are used, while $x_S'$ is the source data transformed by generator $G$. In the pretraining phase, classifiers $C_T$ and $C_S$ are trained on $x_T$ and $x_S$, respectively. During the style transfer phase, content loss $L_{\text{cont}}$ is computed between first-layer features $h_{T}$ and $h_{S'}$, extracted from $x_T$ and $x_S'$ by $C_T$ and $C_S$. Style loss $L_{\text{style}}$ is computed to align the target-transformed features $h_{S'}$ with source class templates $\bar{h}_{S}^0$ and $\bar{h}_{S}^1$, which are the averaged features of $x_S$ for each class (non-target and target). Depending on $y_T$ (the target label), style loss is calculated as the KL divergence between $h_{S'}^0$ and $\bar{h}_{S}^0$, or between $h_{S'}^1$ and $\bar{h}_{S}^1$. Semantic loss $L_{\text{sem}}$ ensures that the predicted label $\hat{y}_{S'}$ (from $C_S$) matches the ground-truth $y_T$. The final prediction is obtained using a soft voting ensemble of $\hat{y}_{S'}$ and $\hat{y}_T$ (from $C_T$).
  • Figure 3: Architecture of the classifier $\mathbf{C}$ in the proposed framework. Each convolutional layer is annotated with its kernel size and number of output channels. Variables $s$ and $p$ denote the stride and padding, respectively. The activation function is ELU. The output of the first convolutional block, $h_{\text{layer1}}$, is used for loss computation. The numbers on the linear layers (e.g., 64 and 2) indicate their output dimensions, with the final prediction size being 2.
  • Figure 4: Architecture of the generator $\mathbf{G}$ in the proposed framework, using an encoder-decoder structure with self-attention modules to enhance feature representation. The encoder compresses the input via convolutional layers, while the decoder reconstructs the output with upsampling and self-attention mechanisms, ensuring high-quality reconstruction.
  • Figure 5: t-SNE visualization of the change in the feature distribution of the target subjects (S01, S05) before and after style transfer. (A,D) Before style transfer. (B,E) After style transfer in SSSTN. (C,F) After style transfer in CSSSTN.
  • ...and 1 more figures
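
The class-sensitive style loss and the soft voting ensemble described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names, the softmax normalisation of features before the KL divergence, the direction of the KL term, and the equal weighting in the vote are all assumptions made for illustration.

```python
import math


def softmax(v):
    """Turn a feature vector into a probability distribution."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def class_templates(features, labels):
    """Average the source feature vectors of each class (0 = non-target, 1 = target)."""
    templates = {}
    for c in set(labels):
        rows = [f for f, y in zip(features, labels) if y == c]
        templates[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return templates


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def style_loss(h_transformed, y_target, templates):
    """Compare a transformed feature only with the template of its own class.

    Assumption: both vectors are softmax-normalised and the template is the
    reference distribution; the caption does not state either detail.
    """
    return kl_divergence(softmax(templates[y_target]), softmax(h_transformed))


def soft_vote(prob_source, prob_target):
    """Average two class-probability vectors (equal weights assumed)."""
    avg = [(a + b) / 2 for a, b in zip(prob_source, prob_target)]
    return avg.index(max(avg)), avg


# Toy source features: two non-target trials and one target trial.
templates = class_templates([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1])
loss_match = style_loss([2.0, 0.0], 0, templates)     # feature equals its class template
loss_mismatch = style_loss([0.0, 2.0], 0, templates)  # feature resembles the other class
label, probs = soft_vote([0.4, 0.6], [0.9, 0.1])
```

Selecting the template by the trial's label is what makes the transfer class-sensitive: a transformed feature is pulled towards the average source feature of its own class rather than towards a single subject-wide average.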