DELTA: Language Diffusion-based EEG-to-Text Architecture
Mingyu Jeon, Hyobin Kim
TL;DR
DELTA tackles EEG-to-text translation under severe noise and subject variability by discretizing EEG with a Residual Vector Quantization (RVQ) tokenizer and replacing autoregressive decoding with a diffusion-based text generator. The two-stage approach first converts EEG into discrete tokens, then uses LLaDA to reconstruct text via non-autoregressive denoising, which is intended to keep generation robust when training data are limited. On the ZuCo dataset, DELTA outperforms autoregressive baselines on word-level metrics, achieving a BLEU-1 of 21.9 and a ROUGE-1 F of 17.2, indicating strong semantic reconstruction. The work suggests that diffusion-based multimodal models can scale brain-language interfaces, and it invites future extension to larger-scale pre-training and additional signals such as MEG.
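To make the tokenization stage concrete, the following is a minimal sketch of residual vector quantization in NumPy. It is an illustration under stated assumptions, not the authors' tokenizer: the feature dimension, the codebook sizes, and the names `rvq_encode`, `eeg_features`, and `codebooks` are hypothetical, and the codebooks are random rather than learned. Each layer quantizes whatever the previous layers left unexplained, so every input vector ends up with one discrete code per layer.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Quantize rows of x (n, d) with a list of codebooks, each of shape (k, d).

    Returns integer codes of shape (n, n_layers) and the reconstruction (n, d).
    """
    residual = x.astype(np.float64).copy()
    recon = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every code vector.
        d2 = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = d2.argmin(axis=1)      # nearest code vector per row
        chosen = cb[idx]
        codes.append(idx)
        recon += chosen              # accumulate the quantized approximation
        residual -= chosen           # the next layer sees what is left over
    return np.stack(codes, axis=1), recon


rng = np.random.default_rng(0)
# Hypothetical input: 8 EEG segments, each already embedded as a 16-dim vector.
eeg_features = rng.normal(size=(8, 16))
# Assumption: three untrained codebooks of 32 entries with shrinking scale.
codebooks = [rng.normal(size=(32, 16)) * s for s in (1.0, 0.5, 0.25)]

codes, recon = rvq_encode(eeg_features, codebooks)
print(codes.shape)  # one code per layer for each segment
```

In a trained tokenizer the codebooks would be learned jointly with an encoder; the random codebooks here only show the data flow from continuous vectors to multi-layer discrete tokens.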
Abstract
Electroencephalogram (EEG)-to-text decoding remains challenging due to high-dimensional noise, subject variability, and error accumulation in autoregressive decoding. We introduce DELTA, which pairs a Residual Vector Quantization (RVQ) EEG tokenizer with a masked language diffusion model (LLaDA). RVQ discretizes continuous EEG into multi-layer tokens to reduce noise and inter-subject differences, while LLaDA reconstructs sentences via non-sequential denoising. On ZuCo, DELTA improves semantic alignment by up to 5.37 points over autoregressive baselines, achieving a BLEU-1 of 21.9 and a ROUGE-1 F of 17.2 under word-level conditions. These results suggest that reliable text generation from small EEG-text datasets is feasible and point toward scalable multimodal EEG-language models.
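The non-sequential denoising idea can be illustrated with a toy iterative unmasking loop. This is a generic sketch, not LLaDA's sampler or the authors' decoding procedure: `MASK`, `toy_predictor`, and `iterative_unmask` are hypothetical names, the predictor returns random probabilities instead of a trained model's output, and conditioning on EEG tokens is omitted. The sequence starts fully masked, and at each step the most confident predictions among the still-masked positions are committed, in any order rather than left to right.

```python
import numpy as np

MASK = -1  # hypothetical id for a masked position


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for a trained mask predictor: random per-position probabilities."""
    logits = rng.normal(size=(len(tokens), vocab_size))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)


def iterative_unmask(length, vocab_size, steps, rng):
    """Fill a fully masked sequence over a fixed number of denoising steps."""
    tokens = np.full(length, MASK)
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        probs = toy_predictor(tokens, vocab_size, rng)
        pred = probs.argmax(axis=1)   # most likely token per position
        conf = probs.max(axis=1)      # confidence of that prediction
        # Commit an equal share of the remaining masked positions each step.
        n_commit = int(np.ceil(masked.size / (steps - step)))
        keep = masked[np.argsort(-conf[masked])[:n_commit]]
        tokens[keep] = pred[keep]
    return tokens


rng = np.random.default_rng(0)
out = iterative_unmask(length=10, vocab_size=50, steps=4, rng=rng)
print(out)
```

Because positions are filled by confidence rather than by index, an early mistake at one position does not force every later token to condition on it, which is the contrast with autoregressive decoding that the abstract draws.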
