SENSE: Efficient EEG-to-Text via Privacy-Preserving Semantic Retrieval

Akshaj Murhekar, Christina Liu, Abhijit Mishra, Shounak Roychowdhury, Jacek Gwizdka

Abstract

Decoding brain activity into natural language is a major challenge in AI with important applications in assistive communication, neurotechnology, and human-computer interaction. Most existing Brain-Computer Interface (BCI) approaches rely on memory-intensive fine-tuning of Large Language Models (LLMs) or encoder-decoder models on raw EEG signals, resulting in expensive training pipelines, limited accessibility, and potential exposure of sensitive neural data. We introduce SENSE (SEmantic Neural Sparse Extraction), a lightweight and privacy-preserving framework that translates non-invasive electroencephalography (EEG) into text without LLM fine-tuning. SENSE decouples decoding into two stages: on-device semantic retrieval and prompt-based language generation. EEG signals are locally mapped to a discrete textual space to extract a non-sensitive Bag-of-Words (BoW), which conditions an off-the-shelf LLM to synthesize fluent text in a zero-shot manner. The EEG-to-keyword module contains only ~6M parameters and runs fully on-device, ensuring raw neural signals remain local while only abstract semantic cues interact with language models. Evaluated on a 128-channel EEG dataset across six subjects, SENSE matches or surpasses the generative quality of fully fine-tuned baselines such as Thought2Text while substantially reducing computational overhead. By localizing neural decoding and sharing only derived textual cues, SENSE provides a scalable and privacy-aware retrieval-augmented architecture for next-generation BCIs.

Paper Structure (21 sections, 5 equations, 6 figures, 2 tables)

This paper contains 21 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: SENSE pipeline overview: the ChannelNet EEG encoder (Mishra et al., 2025) extracts vector $\mathbf{x}$; a Similarity Refiner maps $\mathbf{x}\!\to\!\mathbf{z}$. $\mathbf{z}$ is matrix-multiplied with CLIP-space vocabulary embeddings $\mathbf{E}$ to produce logits. A top-$k$ selector ($k=15$) yields salient tokens forming a Bag-of-Words (BoW). The BoW and predicted object label $o_{pred}$ prompt an off-the-shelf LLM for caption reconstruction. The refiner is trained against ground-truth $N$-hot vectors with specialized losses addressing class imbalance.
  • Figure 2: Subject-wise BLEU-1 performance across optimization strategies and LLM decoders. The radar charts demonstrate that premier closed-source models (Gemini and ChatGPT) consistently form the outer performance boundaries across all six subjects, with ChatGPT exhibiting a slight advantage in the Naive Baseline task. Furthermore, the topological consistency of the polygons indicates that our framework generalizes effectively across diverse neural patterns without requiring per-subject fine-tuning.
  • Figure 3: Baseline
  • Figure 4: BCE Loss
  • Figure 5: Contrastive ML
  • ...and 1 more figure
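
The retrieval step in the Figure 1 caption (multiply $\mathbf{z}$ by the vocabulary embeddings $\mathbf{E}$, keep the top-$k$ tokens, then prompt an LLM with the BoW and $o_{pred}$) can be sketched roughly as below. This is a minimal illustration in plain Python, not the authors' implementation: the function names `top_k_bag_of_words` and `build_prompt`, the toy vocabulary, the 2-dimensional embeddings, and the prompt wording are all assumptions made for the example. Only the dot-product scoring, the top-$k$ selection (with $k=15$ as the default), and the BoW-plus-label prompt come from the caption.

```python
import heapq


def top_k_bag_of_words(z, E, vocab, k=15):
    """Score every vocabulary entry by its dot product with z, keep the k best.

    z     : refined EEG vector (list of floats), assumed to share a space with E
    E     : vocabulary embeddings, one row per token (hypothetical toy values here)
    vocab : token strings aligned with the rows of E
    """
    if len(E) != len(vocab):
        raise ValueError("E must have one row per vocabulary token")
    # logits = E @ z, written out as plain dot products
    logits = [sum(e_j * z_j for e_j, z_j in zip(row, z)) for row in E]
    # Indices of the k largest logits, highest first (all of them if k > len(vocab))
    top = heapq.nlargest(k, range(len(vocab)), key=lambda i: logits[i])
    return [vocab[i] for i in top]


def build_prompt(bow, o_pred):
    """Assemble a zero-shot prompt; the wording is illustrative only."""
    return (
        f"Object: {o_pred}\n"
        f"Keywords: {', '.join(bow)}\n"
        "Write one short caption that describes the scene using these cues."
    )


# Toy example with a 5-token vocabulary and 2-dimensional embeddings.
vocab = ["dog", "grass", "piano", "running", "keys"]
E = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
z = [1.0, 0.0]

bow = top_k_bag_of_words(z, E, vocab, k=3)
prompt = build_prompt(bow, o_pred="dog")
print(prompt)
```

In this sketch only the keyword list and the predicted label are passed to `build_prompt`, which mirrors the abstract's point that raw neural signals stay local and only derived textual cues reach the language model.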