DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation
Yiqun Duan, Jinzhao Zhou, Zhen Wang, Yu-Kai Wang, Chin-Teng Lin
TL;DR
DeWave introduces a discrete codex encoding framework that maps EEG signals to open-vocabulary text by pairing a vector-quantized encoder with a pre-trained language model (BART). The approach supports both word-level EEG features and raw EEG waves without external markers, using self-supervised wave encoding and cross-modal contrastive alignment to bridge EEG dynamics and language semantics. It achieves state-of-the-art results on the ZuCo dataset for word-level translation (BLEU-1 ≈ 41.35; Rouge-F ≈ 30.69) and demonstrates raw-wave translation (BLEU-1 ≈ 20.51; Rouge-F ≈ 24.27), with robust cross-subject performance. Despite these advances, the work acknowledges limitations such as reliance on teacher forcing and dataset constraints, outlining future directions toward silent speech and neural-feedback-enabled decoding.
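The vector-quantized step that turns continuous EEG features into a discrete codex sequence can be illustrated with a minimal sketch. It assumes a standard VQ-VAE-style nearest-neighbour lookup under squared Euclidean distance; the `quantize` helper, the toy 2-D codebook, and the feature values are hypothetical and not taken from DeWave. Training details such as the codebook and commitment losses and the straight-through gradient estimator are omitted.

```python
def quantize(features, codebook):
    """Replace each continuous feature vector with its nearest codebook entry.

    Returns the discrete code indices (the "codex" sequence) and the
    corresponding codebook vectors that would be passed downstream.
    """
    indices = []
    for z in features:
        # Squared Euclidean distance from z to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        # Index of the closest entry becomes the discrete token for z.
        indices.append(min(range(len(codebook)), key=lambda k: dists[k]))
    quantized = [codebook[k] for k in indices]
    return indices, quantized


# Hypothetical toy example: three 2-D "EEG feature" vectors, three codes.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
features = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]]
idx, q = quantize(features, codebook)
print(idx)  # [0, 1, 2]
```

Because nearby feature vectors collapse onto the same code, small per-subject variations in the continuous signal can map to the same discrete token, which is the intuition behind using an invariant codex.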
Abstract
The translation of brain dynamics into natural language is pivotal for brain-computer interfaces (BCIs). With the swift advancement of large language models such as ChatGPT, the need to bridge the gap between the brain and language becomes increasingly pressing. Current methods, however, require eye-tracking fixations or event markers to segment brain dynamics into word-level features, which can restrict the practical application of these systems. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. DeWave uses a quantized variational encoder to derive a discrete codex encoding and aligns it with pre-trained language models. This discrete codex representation brings two advantages: 1) it enables translation on raw waves without markers by introducing text-EEG contrastive alignment training, and 2) it alleviates the interference caused by individual differences in EEG waves through an invariant discrete codex, with or without markers. Our model surpasses the previous baseline (40.1 BLEU-1 and 31.7 Rouge-F) by a relative 3.06% and 6.34%, respectively, achieving 41.35 BLEU-1 and 33.71 Rouge-F on the ZuCo Dataset. This work is the first to facilitate the translation of entire EEG signal periods without word-level order markers (e.g., eye fixations), scoring 20.5 BLEU-1 and 29.5 Rouge-1 on the ZuCo Dataset.
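The text-EEG contrastive alignment can be sketched as a CLIP-style symmetric InfoNCE loss over a batch of paired EEG and sentence embeddings. This is an assumption: the abstract does not specify the loss form, similarity function, or temperature, so the cosine similarity, the `temperature=0.07` default, and the `contrastive_loss` helper below are illustrative and hypothetical.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax_at(row, idx):
    # Numerically stable log-softmax of `row`, evaluated at position `idx`.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def contrastive_loss(eeg, text, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (EEG, text) embeddings.

    Pair i is the positive for row i; every other pair in the batch
    serves as a negative.
    """
    e = [_normalize(v) for v in eeg]
    t = [_normalize(v) for v in text]
    n = len(e)
    # logits[i][j] = cosine(eeg_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(e[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # EEG -> text direction: each row should pick its own sentence.
    eeg_to_text = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> EEG direction: each column should pick its own EEG segment.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_eeg = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return (eeg_to_text + text_to_eeg) / 2


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Matching pairs sit on the diagonal of the similarity matrix; minimizing this loss pulls each EEG embedding toward its own sentence embedding and away from the other sentences in the batch, which is one way to align EEG and language representations without word-level markers.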
