
SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs

Jiazhen Hong, Geoff Mackellar, Soheila Ghane

TL;DR

SpellerSSL addresses the core bottlenecks of EEG-based P300 spellers (low SNR, poor generalization, and lengthy calibration) by combining self-supervised pretraining of a customized 1D U-Net, a lightweight ERP-Head for P300 detection, and a P300 aggregation scheme that denoises training signals. Pretraining uses a reconstruction objective with time masking and a frequency-domain consistency term on cross-domain and in-domain EEG data, followed by downstream fine-tuning on subject-specific data. In-domain SSL with moderate aggregation (G=2) delivers a state-of-the-art character recognition rate (CRR) of 94% at 7 repetitions and an information transfer rate (ITR) of up to 21.86 bits/min, while reducing the required calibration data by up to 60%. Cross-domain SSL also transfers well, which points toward EEG foundation models for P300 speller BCIs.
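To make the denoising idea concrete, here is a minimal sketch of aggregation, assuming it amounts to averaging G same-class epochs. This page does not spell out the paper's exact grouping rule, so the non-overlapping grouping, the function names `aggregate_epochs` and `residual_noise_std`, and the synthetic template are all illustrative assumptions rather than the authors' implementation.

```python
import random
import statistics


def aggregate_epochs(epochs, group_size):
    """Average consecutive groups of `group_size` equal-length epochs.

    Assumption: groups are non-overlapping and leftover epochs that do
    not fill a complete group are dropped.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    aggregated = []
    for start in range(0, len(epochs) - group_size + 1, group_size):
        group = epochs[start:start + group_size]
        # zip(*group) walks the group sample by sample.
        aggregated.append([sum(samples) / group_size for samples in zip(*group)])
    return aggregated


def residual_noise_std(epochs, template):
    """Standard deviation of (epoch - template) over all samples."""
    residuals = [x - t for epoch in epochs for x, t in zip(epoch, template)]
    return statistics.pstdev(residuals)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ERP-like bump plus Gaussian noise; not real EEG data.
    template = [0.0, 0.5, 1.0, 0.5, 0.0]
    noisy = [[t + rng.gauss(0.0, 1.0) for t in template] for _ in range(600)]
    for g in (1, 2, 3):
        averaged = aggregate_epochs(noisy, g)
        print(g, len(averaged), round(residual_noise_std(averaged, template), 3))
```

For independent zero-mean noise, the residual standard deviation should fall roughly as 1/sqrt(G). In this non-overlapping sketch the number of training examples also shrinks by a factor of G, which is one plausible reason a moderate G could be preferable to a large one.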

Abstract

Electroencephalogram (EEG)-based P300 speller brain-computer interfaces (BCIs) face three main challenges: low signal-to-noise ratio (SNR), poor generalization, and time-consuming calibration. We propose SpellerSSL, a framework that combines self-supervised learning (SSL) with P300 aggregation to address these issues. First, we introduce an aggregation strategy to enhance SNR. Second, to improve generalization, we employ a customized 1D U-Net backbone and pretrain the model on both cross-domain and in-domain EEG data. The pretrained model is subsequently fine-tuned with a lightweight ERP-Head classifier for P300 detection, which adapts the learned representations to subject-specific data. Our evaluations of calibration time demonstrate that combining the aggregation strategy with SSL significantly reduces the calibration burden per subject and improves robustness across subjects. Experimental results show that SSL learns effective EEG representations in both in-domain and cross-domain settings, with in-domain pretraining achieving a state-of-the-art character recognition rate of 94% with only 7 repetitions and the highest information transfer rate (ITR) of 21.86 bits/min on the public II-B dataset. Moreover, in-domain SSL with P300 aggregation reduces the required calibration size by 60% while maintaining a comparable character recognition rate. To the best of our knowledge, this is the first study to apply SSL to P300 spellers, highlighting its potential to improve both efficiency and generalization in speller BCIs and paving the way toward an EEG foundation model for P300 speller BCIs.
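For readers unfamiliar with ITR, the sketch below uses the widely used Wolpaw definition, which may or may not match the exact formula in the paper. The 36-class setting assumes a standard 6x6 speller matrix, and the 20-second selection time is a hypothetical placeholder, not a value taken from the paper.

```python
import math


def bits_per_selection(n_classes, accuracy):
    """Wolpaw bits per selection: log2 N + P log2 P + (1-P) log2((1-P)/(N-1))."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    # Terms of the form 0 * log2(0) are taken as 0, so they are skipped.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Scale bits per selection by the number of selections per minute."""
    if seconds_per_selection <= 0:
        raise ValueError("seconds_per_selection must be positive")
    return bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection


if __name__ == "__main__":
    # Assumed 36-character matrix at the reported 94% recognition rate.
    print(round(bits_per_selection(36, 0.94), 2))
    # Hypothetical 20 s per character; substitute the real trial timing.
    print(round(itr_bits_per_minute(36, 0.94, 20.0), 2))
```

Under these assumptions, 94% accuracy over 36 classes corresponds to about 4.53 bits per selection. The bits/min figure then depends entirely on the time per selection, which is why ITR tends to peak at fewer repetitions than accuracy does.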

Paper Structure

This paper contains 27 sections, 17 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Overview of the proposed SpellerSSL.
  • Figure 2: P300 aggregation for typing the character "M".
  • Figure 3: Reconstruction performance across 64 EEG channels. The montage layout is shown in the center, with representative channels displayed around it. Each plot compares the original P300 response (black) with reconstructions from models trained from scratch (gray), with in-domain pretraining (red), and with cross-domain pretraining (blue). More visualizations are provided in Appendix \ref{app:recon}.
  • Figure 4: Distributions of decision scores (log-odds) under three pretraining conditions: (a–c) training from scratch, (d–f) cross-domain pretraining, and (g–i) in-domain pretraining. For each condition, we evaluate three aggregation levels $G\!\in\!\{1,2,3\}$. Green curves and bars show P300 probability density function (PDF) and histogram (Hist), while black dashed curves and gray bars show Non-P300. The Fisher’s Discriminant Ratio (FDR; higher is better) is reported.
  • Figure 5: Calibration reduction analysis using in-domain checkpoints. Baseline results from scratch ($G=1$) are shown in gray, while gains from in-domain pretraining with $G=1$ and $G=2$ are shown in blue and red, respectively.
  • ...and 3 more figures
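
The Fisher's Discriminant Ratio reported in Figure 4 can be illustrated with a short sketch. It assumes the common two-class form, (mean difference)^2 / (sum of class variances). The paper's exact definition is not shown on this page, and the function name and example scores below are hypothetical.

```python
import statistics


def fisher_discriminant_ratio(target_scores, nontarget_scores):
    """Two-class FDR: squared mean gap divided by the summed variances.

    Assumption: population variances are used; the paper may use a
    different estimator or normalization.
    """
    mean_gap = statistics.fmean(target_scores) - statistics.fmean(nontarget_scores)
    spread = statistics.pvariance(target_scores) + statistics.pvariance(nontarget_scores)
    if spread == 0:
        raise ValueError("both classes have zero variance; FDR is undefined")
    return mean_gap ** 2 / spread


if __name__ == "__main__":
    # Hypothetical decision scores (log-odds), not values from the paper.
    p300_scores = [1.8, 2.4, 2.1, 1.5, 2.9]
    non_p300_scores = [-1.2, -0.4, -0.9, 0.1, -1.6]
    print(fisher_discriminant_ratio(p300_scores, non_p300_scores))
```

A larger value means the P300 and Non-P300 score distributions overlap less, consistent with the caption's note that higher is better.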