Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

Yuchen Wang, Haonan Wang, Yu Guo, Honglong Yang, Xiaomeng Li

TL;DR

SemKey is a multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives (sentiment, topic, length, and surprisal) and moves beyond standard translation metrics by adopting N-way Retrieval Accuracy and Fréchet Distance to assess diversity and alignment.

Abstract

Decoding natural language from non-invasive EEG signals is a promising yet challenging task. However, current state-of-the-art models remain constrained by three fundamental limitations: Semantic Bias (mode collapse into generic templates), Signal Neglect (hallucination based on linguistic priors rather than neural inputs), and the BLEU Trap, where evaluation metrics are artificially inflated by high-frequency stopwords, masking a lack of true semantic fidelity. To address these challenges, we propose SemKey, a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. We redesign the interaction between the neural encoder and the Large Language Model (LLM) by injecting semantic prompts as Queries and EEG embeddings as Key-Value pairs, strictly forcing the model to attend to neural inputs. Furthermore, we move beyond standard translation metrics by adopting N-way Retrieval Accuracy and Fréchet Distance to rigorously assess diversity and alignment. Extensive experiments demonstrate that our approach effectively eliminates hallucinations on noise inputs and achieves SOTA performance on these robust protocols. Code will be released upon acceptance at https://github.com/xmed-lab/SemKey.
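The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the cosine-similarity scoring, the random sampling of distractors, and the Gaussian form of the Fréchet Distance (as used in FID-style metrics) are all assumptions made for illustration. Inputs are assumed to be sentence embeddings of the generated and reference texts, one row per sentence.

```python
import numpy as np


def n_way_retrieval_accuracy(pred_emb, ref_emb, n_way=5, n_trials=20, seed=0):
    """Fraction of trials in which a prediction is closer (cosine) to its own
    reference than to n_way - 1 randomly drawn distractor references.

    Assumed protocol for illustration; the paper's exact setup may differ.
    """
    rng = np.random.default_rng(seed)
    pred = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = pred @ ref.T  # sim[i, j] = cosine(prediction i, reference j)
    num = sim.shape[0]
    hits, total = 0, 0
    for i in range(num):
        others = np.delete(np.arange(num), i)
        for _ in range(n_trials):
            distractors = rng.choice(others, size=n_way - 1, replace=False)
            hits += int(np.all(sim[i, i] > sim[i, distractors]))
            total += 1
    return hits / total


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to two embedding sets:
    ||mu_x - mu_y||^2 + Tr(C_x) + Tr(C_y) - 2 Tr((C_x C_y)^{1/2}).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((C_x C_y)^{1/2}) equals the sum of square roots of the eigenvalues
    # of C_x C_y, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)
```

Under these assumptions, a model that collapses onto a few generic templates would produce near-identical embeddings, so its retrieval accuracy would fall toward chance (1/N) and its Fréchet Distance to the reference distribution would grow, regardless of how many stopwords it matches.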

Paper Structure (21 sections, 5 equations, 5 figures, 13 tables)

Figures (5)

  • Figure 1: Top: Previous models (liu2025learning) exhibit semantic bias, overfitting to generic templates (e.g., "He was a...") to artificially inflate BLEU scores despite severe semantic misalignment (the "BLEU Trap"). Furthermore, the noise test (Row 3) exposes signal neglect: the model continues to hallucinate fluent text even from pure Gaussian noise, showing that generation is driven by linguistic priors rather than the input signal. Bottom: In contrast, SemKey (our method) demonstrates strict signal dependency. It generates diverse, semantically faithful text for valid inputs while correctly yielding disordered tokens for noise, verifying that decoding is genuinely grounded in neural signals.
  • Figure 2: The overall architecture of the SemKey framework following a "Guidance-Generation" paradigm. Stage 1 (Attribute Extraction): The EEG encoder is optimized via multi-task learning to explicitly decouple high-level semantic attributes (Sentiment, Topic, Length, Surprisal) alongside standard alignment. Stage 2 (Generative Decoding): These predicted attributes structure a semantic prompt to guide generation. Crucially, the Q-K-V Injection mechanism enforces signal dependency by using the text prompt as the Query and projected EEG embeddings as Keys and Values.
  • Figure 3: Holistic Performance Evaluation. The Radar Chart illustrates the relative performance of our model (Orange) versus GLIM (Blue) and EEG-to-Text (Green). SemKey demonstrates a balanced superiority across all alignment, diversity, and content quality metrics.
  • Figure 4: Showcase of Gaussian Noise input. We illustrate the text generated by GLIM versus our method when the input is pure Gaussian noise instead of EEG signals.
  • Figure 5: t-SNE Visualization of Semantic Distribution. We visualize the sentence embeddings of Ground Truth (Blue), SemKey (Orange), GLIM (Yellow), and EEG-to-Text (Purple). Our model shows better overlap with the ground truth distribution, while effectively avoiding semantic bias (highlighted by red circles).
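
The Q-K-V Injection described in the Figure 2 caption (text prompt as Query, projected EEG embeddings as Keys and Values) can be sketched as single-head scaled dot-product cross-attention. This is an illustrative sketch only: the function name `qkv_injection`, the projection matrices `w_q`, `w_k`, `w_v`, the single head, and the absence of a residual path are assumptions, and the paper's actual layer inside the LLM may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def qkv_injection(prompt_emb, eeg_emb, w_q, w_k, w_v):
    """Cross-attention with the semantic prompt as Query and EEG as Key/Value.

    prompt_emb: (T_prompt, d) embeddings of the semantic prompt tokens.
    eeg_emb:    (T_eeg, d) projected EEG embeddings.
    w_q, w_k, w_v: (d, d) hypothetical projection matrices.
    Returns the attended output (T_prompt, d) and attention weights
    (T_prompt, T_eeg).
    """
    q = prompt_emb @ w_q  # queries come only from the text prompt
    k = eeg_emb @ w_k     # keys come only from the EEG signal
    v = eeg_emb @ w_v     # values come only from the EEG signal
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Each output row is a convex combination of EEG-derived value vectors.
    return attn @ v, attn
```

In this sketch every output row is a weighted average of EEG-derived values, so the prompt only decides where to look, while the content passed forward comes from the signal. That is one way to read the caption's claim that the mechanism enforces signal dependency.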