Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI
Xujin Li, Wei Wei, Shuang Qiu, Xinyi Zhang, Fu Li, Huiguang He
TL;DR
This work addresses cross-task zero-calibration RSVP-BCI decoding, in which models trained on one RSVP task generalize poorly to unseen tasks. It introduces ELIPformer, a Transformer-based architecture that fuses EEG with language-image priors, using a CLIP-based prompt encoder and a cross bi-attention mechanism to align the two modalities. The authors design three RSVP tasks and release an open dataset covering 71 subjects, on which ELIPformer outperforms conventional, CNN-based, and Transformer baselines in cross-task decoding. The results indicate effective semantic alignment between EEG and language-image features and point to the approach's potential for rapid, practical deployment of RSVP-BCI systems in diverse scenarios.
Abstract
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an effective technology for information detection that works by detecting Event-Related Potentials (ERPs) in electroencephalography (EEG) signals. Current RSVP decoding methods perform well on EEG signals within a single RSVP task, but their performance drops significantly when they are applied directly to a different RSVP task without calibration data from the new task. This limits the rapid and efficient deployment of RSVP-BCI systems for detecting different categories of targets in various scenarios. To overcome this limitation, this study aims to improve cross-task zero-calibration RSVP decoding performance. First, we design three distinct RSVP tasks for target image retrieval and build an open-source dataset containing EEG signals and the corresponding stimulus images. We then propose an EEG with Language-Image Prior fusion Transformer (ELIPformer) for cross-task zero-calibration RSVP decoding. Specifically, we propose a prompt encoder, based on a language-image pre-trained model, that extracts language-image features from task-specific prompts and stimulus images as prior knowledge for enhancing EEG decoding. A cross bidirectional attention mechanism is also adopted to facilitate effective feature fusion and alignment between the EEG and language-image features. Extensive experiments demonstrate that the proposed model achieves superior performance in cross-task zero-calibration RSVP decoding, helping move RSVP-BCI systems from research toward practical application.
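To make the idea of cross bidirectional attention more concrete, the following is a minimal NumPy sketch in which EEG tokens attend to language-image prior tokens and the prior tokens attend back to the EEG tokens. It is not the authors' implementation: the single-head formulation, the residual updates, the token counts, the feature width, the randomly initialized projection matrices, and all function and variable names are assumptions chosen for illustration. In a real system the prior tokens would come from a language-image pre-trained encoder rather than from a random generator.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from queries to a context."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


def cross_bi_attention(eeg_tokens, prior_tokens, params):
    """Update each modality with information attended from the other one."""
    eeg_update, eeg_to_prior = cross_attention(
        eeg_tokens, prior_tokens, *params["eeg_from_prior"]
    )
    prior_update, prior_to_eeg = cross_attention(
        prior_tokens, eeg_tokens, *params["prior_from_eeg"]
    )
    # Residual connections keep the original features alongside the update.
    fused_eeg = eeg_tokens + eeg_update
    fused_prior = prior_tokens + prior_update
    return fused_eeg, fused_prior, eeg_to_prior, prior_to_eeg


# Hypothetical sizes: 8 EEG tokens, 2 prior tokens (prompt and image), width 16.
rng = np.random.default_rng(0)
d = 16
eeg_tokens = rng.standard_normal((8, d))
prior_tokens = rng.standard_normal((2, d))  # stand-in for encoder outputs

# One (W_q, W_k, W_v) triple per attention direction, randomly initialized.
params = {
    name: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    for name in ("eeg_from_prior", "prior_from_eeg")
}

fused_eeg, fused_prior, eeg_to_prior, prior_to_eeg = cross_bi_attention(
    eeg_tokens, prior_tokens, params
)
```

Each direction uses its own projection matrices, so the two attention maps are learned independently; a trained model would typically add multiple heads, normalization, and a classification head on top of the fused features.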
