DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation

Yiqun Duan, Jinzhao Zhou, Zhen Wang, Yu-Kai Wang, Chin-Teng Lin

TL;DR

DeWave introduces a discrete codex encoding framework that maps EEG signals to open-vocabulary text by pairing a vector-quantized encoder with a pre-trained language model (BART). The approach supports both word-level EEG features and raw EEG waves without external markers, using self-supervised wave encoding and cross-modal contrastive alignment to bridge EEG dynamics and language semantics. It achieves state-of-the-art results on the ZuCo dataset for word-level translation (BLEU-1 ≈ 41.35; Rouge-F ≈ 30.69) and demonstrates raw-wave translation (BLEU-1 ≈ 20.51; Rouge-F ≈ 24.27), with robust cross-subject performance. Despite these advances, the work acknowledges limitations such as reliance on teacher forcing and dataset constraints, outlining future directions toward silent speech and neural-feedback-enabled decoding.

Abstract

The translation of brain dynamics into natural language is pivotal for brain-computer interfaces (BCIs). With the swift advancement of large language models, such as ChatGPT, the need to bridge the gap between the brain and language becomes increasingly pressing. Current methods, however, require eye-tracking fixations or event markers to segment brain dynamics into word-level features, which can restrict the practical application of these systems. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. DeWave uses a quantized variational encoder to derive a discrete codex encoding and align it with pre-trained language models. This discrete codex representation brings two advantages: 1) it realizes translation on raw waves without markers by introducing text-EEG contrastive alignment training, and 2) it alleviates the interference caused by individual differences in EEG waves through an invariant discrete codex, with or without markers. Our model surpasses the previous baseline (40.1 and 31.7) by 3.06% and 6.34%, respectively, achieving 41.35 BLEU-1 and 33.71 Rouge-F on the ZuCo Dataset. This work is the first to facilitate the translation of entire EEG signal periods without word-level order markers (e.g., eye fixations), scoring 20.5 BLEU-1 and 29.5 Rouge-1 on the ZuCo Dataset.

Paper Structure (25 sections, 4 equations, 7 figures, 13 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overall illustration of translating EEG waves into text through quantised encoding.
  • Figure 2: The DeWave model structure. Word-level EEG features or raw EEG waves are first vectorized into embeddings (see the paper's vectorization subsection). The vectorized features are then encoded into a latent variable $\mathbf{z}_c(\mathcal{X})$, which is converted into a discrete latent $\mathbf{z}_q(\mathcal{X})$ through codex indexing. Finally, a pre-trained BART model translates this discrete codex representation into text.
  • Figure 3: The image demonstrates the process of self-supervised pre-training for raw waves. The left subgraph details our strategy for directing the encoder, utilizing both self-reconstruction and text alignment through contrastive learning.
  • Figure 4: Cross-subject performance on the ZuCo dataset.
  • Figure 5: Ablation study on different codex sizes and perception fields (raw waves).
  • ...and 2 more figures
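
The codex indexing step named in the Figure 2 caption, which turns the continuous latent $\mathbf{z}_c(\mathcal{X})$ into the discrete latent $\mathbf{z}_q(\mathcal{X})$, can be illustrated with a minimal sketch. It assumes the lookup is a nearest-neighbor search over a codebook, as in standard vector quantization; the summary above does not confirm that DeWave uses exactly this rule. The function name `quantize`, the codebook size, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize(z_c, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    Assumption: nearest-neighbor lookup under squared Euclidean distance,
    the usual vector-quantization rule (not confirmed as DeWave's exact method).

    z_c:      (T, D) array of continuous latents, one row per time step
    codebook: (K, D) array of code vectors (the "codex")
    Returns (indices, z_q): indices has shape (T,), z_q has shape (T, D).
    """
    # Pairwise squared distances between latents and codes, shape (T, K)
    diff = z_c[:, None, :] - codebook[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Index of the closest code for every latent vector
    indices = np.argmin(dist, axis=1)
    # Discrete latent: replace each vector by its selected code
    z_q = codebook[indices]
    return indices, z_q


# Hypothetical toy codebook with K=3 codes of dimension D=2
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
# Hypothetical continuous latents for T=3 time steps
z_c = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])

indices, z_q = quantize(z_c, codebook)
print(indices)  # [0 1 2]
```

In this toy setup the sequence of integer indices is the discrete representation. A downstream language model such as BART would consume the embeddings of these codes rather than the raw continuous latents.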