DELTA: Language Diffusion-based EEG-to-Text Architecture

Mingyu Jeon, Hyobin Kim

TL;DR

DELTA tackles EEG-to-text translation under severe noise and subject variability by discretizing EEG with an RVQ tokenizer and replacing autoregressive decoding with a diffusion-based text generator. The two-stage approach first converts EEG into discrete tokens, then uses LLaDA to reconstruct text via non-autoregressive denoising, which supports robust generation from limited data. On the ZuCo dataset, DELTA outperforms autoregressive baselines on word-level metrics, reaching a BLEU-1 of 21.9 and a ROUGE-1 F of 17.2. This work suggests that diffusion-based multimodal models can scale brain-language interfaces, and it invites future extension to larger-scale pre-training and additional signals such as MEG.

Abstract

Electroencephalogram (EEG)-to-text translation remains challenging due to high-dimensional noise, subject variability, and error accumulation in autoregressive decoding. We introduce DELTA, which pairs a Residual Vector Quantization (RVQ) EEG tokenizer with a masked language diffusion model (LLaDA). RVQ discretizes continuous EEG into multi-layer tokens to reduce noise and individual differences, while LLaDA reconstructs sentences via non-sequential denoising. On ZuCo, DELTA improves semantic alignment by up to 5.37 points over autoregressive baselines, achieving a BLEU-1 of 21.9 and a ROUGE-1 F of 17.2 under word-level conditions. These results enable reliable text generation from small EEG-text datasets and point toward scalable multimodal EEG-language models.
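
To make the "multi-layer tokens" idea concrete, the following is a minimal sketch of generic residual vector quantization, not the authors' tokenizer. Each layer picks the nearest codebook entry for whatever the previous layers left unexplained, so one EEG feature vector becomes one token per layer. The function names (`nearest`, `rvq_encode`, `rvq_decode`), the two tiny hand-written codebooks, and the 2-dimensional input are all illustrative assumptions; a real tokenizer would learn its codebooks from EEG features.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the residual of the previous one."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen code so the next layer only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


def rvq_decode(tokens, codebooks):
    """Reconstruct an approximation of the input by summing the selected codes."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical codebooks: a coarse first layer and a finer second layer.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]

tokens, residual = rvq_encode([1.2, 0.8], CODEBOOKS)
approx = rvq_decode(tokens, CODEBOOKS)
```

In this toy case the first layer captures the coarse position and the second layer a small correction, which is the sense in which stacking layers trades a few extra tokens for a smaller reconstruction error.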

Paper Structure

This paper contains 15 sections, 4 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: The DELTA framework: (a) An RVQ-based tokenizer discretizes EEG signals. (b) A diffusion model is then pre-trained on EEG tokens, fine-tuned for EEG-to-Text generation, and used for inference.
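
The inference step in panel (b) can be pictured as iterative unmasking. The sketch below is a simplified illustration of confidence-based masked denoising under stated assumptions, not DELTA's or LLaDA's actual sampler: it starts from a fully masked sequence, asks a predictor for a token and a confidence at every position, commits the most confident predictions, and leaves the rest masked for the next step. The `denoise` function, the `MASK` symbol, and the `toy_predict` stand-in (which ignores its input and returns fixed guesses) are hypothetical; in the real system the predictor would be the diffusion model conditioned on EEG tokens.

```python
MASK = "<mask>"


def denoise(length, predict, steps):
    """Iteratively unmask a fully masked sequence.

    predict(seq) must return one (token, confidence) pair per position.
    """
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        preds = predict(seq)
        # Reveal an even share of the remaining masks per step (ceiling division),
        # so the final step always fills whatever is still masked.
        remaining_steps = steps - step
        n_reveal = max(1, -(-len(masked) // remaining_steps))
        # Commit the most confident predictions; the others stay masked.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_reveal]:
            seq[i] = preds[i][0]
    return seq


# Hypothetical stand-in for the model: fixed tokens with fixed confidences.
TARGET = ["the", "cat", "sat", "down"]
CONFIDENCE = [0.9, 0.4, 0.7, 0.2]


def toy_predict(seq):
    return list(zip(TARGET, CONFIDENCE))


one_step = denoise(4, toy_predict, steps=1)
two_steps = denoise(4, toy_predict, steps=2)
```

Because every position is predicted in parallel and committed in order of confidence rather than left to right, an early mistake does not have to propagate through the rest of the sentence, which is the motivation the abstract gives for moving away from autoregressive decoding.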