Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation
Xiaozhao Liu, Dinggang Shen, Xihui Liu
TL;DR
This work tackles the reliability of EEG-to-text decoding by treating its hallucination problem as posterior collapse, recasting decoding as semantic summarization, and introducing the Generative Language Inspection Model (GLIM). GLIM combines a domain-adaptive EEG encoder, a frozen encoder-decoder language model (LM), and a cross-modal querying aligner that maps EEG representations into the LM latent space. Training uses multiple paraphrased targets and is regularized with a cross-modal contrastive objective. On the ZuCo dataset, GLIM generates fluent, EEG-grounded sentences without teacher forcing and supports semantic evaluation beyond text similarity, via EEG-text retrieval and zero-shot classification of sentiment, relation types, and topics, while generalizing across heterogeneous domains. The key contribution is a modular, interpretable framework that mitigates posterior collapse and prioritizes high-level semantic alignment over surface-level lexical reconstruction. It lays groundwork for reliable, scalable benchmarking in generative brain decoding and points toward practical non-invasive language BCI systems.
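To make the idea of a cross-modal contrastive objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes a symmetric InfoNCE (CLIP-style) loss over paired EEG and text embeddings, a form the summary above does not specify. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity_matrix(eeg_embs, text_embs):
    """Entry [i][j] is the cosine similarity of EEG sample i and text j."""
    eeg = [l2_normalize(v) for v in eeg_embs]
    txt = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(e, t)) for t in txt] for e in eeg]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(eeg_embs, text_embs, temperature=0.07):
    """Assumed CLIP-style loss: average of EEG-to-text and text-to-EEG terms."""
    sims = cosine_similarity_matrix(eeg_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text-to-EEG
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy check: matched pairs should give a lower loss than mismatched pairs.
eeg = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = symmetric_contrastive_loss(eeg, [[1.0, 0.0], [0.0, 1.0]])
loss_shuffled = symmetric_contrastive_loss(eeg, [[0.0, 1.0], [1.0, 0.0]])
print(loss_aligned < loss_shuffled)
```

Under this assumed form, the loss is small only when each EEG embedding is closer to its own text embedding than to the others in the batch, which is the property a retrieval-based evaluation would then measure.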
Abstract
Pretrained generative models have opened new frontiers in brain decoding by enabling the synthesis of realistic texts and images from non-invasive brain recordings. However, the reliability of such outputs remains questionable: it is unclear whether they truly reflect semantic activation in the brain or are merely hallucinated by the powerful generative models. In this paper, we focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse. Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings rather than the verbatim reconstruction of stimulus texts pursued in prior work. To this end, we propose the Generative Language Inspection Model (GLIM), which emphasizes learning informative and interpretable EEG representations to improve semantic grounding under heterogeneous and small-scale data conditions. Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences without teacher forcing. Moreover, it supports more robust evaluation beyond text similarity, through EEG-text retrieval and zero-shot semantic classification across sentiment categories, relation types, and corpus topics. Together, our architecture and evaluation protocols lay the foundation for reliable and scalable benchmarking in generative brain decoding.
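The two evaluation protocols named above, EEG-text retrieval and zero-shot classification, can be illustrated with a small sketch. It assumes that both reduce to cosine-similarity ranking in a shared embedding space, which is a common design but is not confirmed by the abstract. All function names, label names, and toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_candidates(eeg_emb, candidate_embs):
    """Return candidate indices sorted from most to least similar."""
    scores = [cosine(eeg_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def top_k_accuracy(eeg_embs, text_embs, k=1):
    """Retrieval: fraction of EEG samples whose paired text (same index) is in the top k."""
    hits = sum(
        1 for i, e in enumerate(eeg_embs) if i in rank_candidates(e, text_embs)[:k]
    )
    return hits / len(eeg_embs)

def zero_shot_classify(eeg_emb, label_embs):
    """Zero-shot: pick the label whose (assumed) text embedding is most similar."""
    return max(label_embs, key=lambda label: cosine(eeg_emb, label_embs[label]))

# Toy embeddings standing in for encoder outputs (hypothetical values).
eeg_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sentiment_labels = {
    "positive": [1.0, 0.0, 0.0],
    "neutral": [0.0, 1.0, 0.0],
    "negative": [0.0, 0.0, 1.0],
}

retrieval_acc = top_k_accuracy(eeg_embs, text_embs, k=1)
predicted = zero_shot_classify(eeg_embs[2], sentiment_labels)
```

Neither metric depends on the wording of a generated sentence, so a model cannot score well on them through fluent text that ignores the EEG input.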
