Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations

Niall McGuire, Yashar Moshfeghi

TL;DR

The paper addresses the difficulty of translating visceral information needs into explicit queries by proposing Brain Passage Retrieval (BPR), which maps EEG signals directly to passage representations in a shared semantic space. It develops a dual-encoder architecture with a specialised EEG encoder and a frozen language-model-based passage encoder, trained with a contrastive objective augmented by a uniformity term and subject-aware negative sampling. On the ZuCo dataset, BPR achieves up to an 8.81% improvement in Precision@5 over EEG-to-text baselines and demonstrates cross-subject robustness, while revealing a persistent gap to text-based retrieval that may narrow with more data and larger models. The work highlights the viability of direct brain-to-passage retrieval, presents ablations that illuminate training dynamics, and outlines avenues for expanding dataset size, incorporating additional cognitive signals, and moving toward more natural, brain-driven IR interfaces.

Abstract

Information Retrieval (IR) systems primarily rely on users' ability to translate their internal information needs into (text) queries. However, this translation process is often uncertain and cognitively demanding, leading to queries that incompletely or inaccurately represent users' true needs. This challenge is particularly acute for users with ill-defined information needs or physical impairments that limit traditional text input, where the gap between cognitive intent and query expression becomes even more pronounced. Recent neuroscientific studies have explored Brain-Machine Interfaces (BMIs) as a potential solution, aiming to bridge the gap between users' cognitive semantics and their search intentions. However, current approaches attempting to decode explicit text queries from brain signals have shown limited effectiveness in learning robust brain-to-text representations, often failing to capture the nuanced semantic information present in brain patterns. To address these limitations, we propose BPR (Brain Passage Retrieval), a novel framework that eliminates the need for intermediate query translation by enabling direct retrieval of relevant passages from users' brain signals. Our approach leverages dense retrieval architectures to map EEG signals and text passages into a shared semantic space. Through comprehensive experiments on the ZuCo dataset, we demonstrate that BPR achieves up to 8.81% improvement in precision@5 over existing EEG-to-text baselines, while maintaining effectiveness across 30 participants. Our ablation studies reveal the critical role of hard negative sampling and specialised brain encoders in achieving robust cross-modal alignment. These results establish the viability of direct brain-to-passage retrieval and provide a foundation for developing more natural interfaces between users' cognitive states and IR systems.
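To make the retrieval setting in the abstract concrete, the following is a minimal sketch of scoring passages against a query embedding in a shared space and computing precision@5. It is not the authors' implementation: the toy three-dimensional vectors stand in for encoder outputs, cosine similarity is an assumed scoring function, and the names `cosine`, `rank_passages`, and `precision_at_k` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)

def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked passages that are relevant."""
    return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k

# Toy data (assumption): one "EEG query" embedding and six passage embeddings.
eeg_query = [0.9, 0.1, 0.0]
passages = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.1, 0.9, 0.2],
]
ranking = rank_passages(eeg_query, passages)
p5 = precision_at_k(ranking, relevant_ids={0, 2}, k=5)
```

In this toy case passages 0 and 2 are marked relevant and both land in the top five, so `p5` is 2/5. In a real system the query vector would come from the EEG encoder and the passage vectors from the passage encoder.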

Paper Structure

This paper contains 17 sections, 13 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: A) Traditional EEG-to-text pipeline requiring intermediate query decoding before retrieval. The approach first translates EEG signals into text queries before applying traditional (lexical/neural) text retrieval methods. B) Our proposed direct EEG query retrieval framework that eliminates the translation step by learning a shared embedding space between EEG signals and text passages, enabling direct relevance scoring between brain activity and documents.
  • Figure 2: Overview of the BPR architecture. A) EEG encoder processes EEG signals recorded during naturalistic reading, using eye-tracking fixations to align EEG data with individual words. The architecture consists of initial projection layers, positional encoding, and transformer layers with self-attention mechanisms to generate EEG query representations. B) Passage encoder leverages a frozen pre-trained language model with an additional lightweight adaptation layer to generate passage representations. C) Shared semantic space visualisation demonstrating how EEG queries (green) and passages (red) are mapped into a common embedding space, where $p^+$ indicates positive passage matches and other red cubes represent negatives $p^-$. Flame icons indicate trainable parameters while snowflakes indicate frozen model components.
  • Figure 3: Impact of query span masking probabilities ($p_{mask}$) on BPR lexical mismatch retrieval performance.
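The shared semantic space in Figure 2C, with one positive passage $p^+$ and several negatives $p^-$ per EEG query, can be illustrated with a contrastive loss over a batch. The sketch below is an assumption-laden illustration rather than the paper's objective: it uses a standard InfoNCE loss with in-batch negatives (not the subject-aware sampling the paper describes), a uniformity term in the style of Wang and Isola, and hypothetical values for `temperature`, `t`, and the weight `lam`.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def info_nce(queries, passages, temperature=0.1):
    """Mean InfoNCE loss: passages[i] is the positive for queries[i];
    every other passage in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, p) / temperature for p in passages]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)

def uniformity(vectors, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs.
    Lower values mean embeddings are spread more evenly.
    Requires at least two vectors."""
    vals = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            vals.append(math.exp(-t * d2))
    return math.log(sum(vals) / len(vals))

def contrastive_with_uniformity(queries, passages, temperature=0.1, lam=0.1):
    """InfoNCE on unit-normalised embeddings plus a weighted uniformity
    term on the query embeddings (the weighting is an assumption)."""
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in passages]
    return info_nce(q, p, temperature) + lam * uniformity(q)

# Toy batch (assumption): two queries with matched vs. swapped passages.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_with_uniformity(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_with_uniformity(queries, [[0.0, 1.0], [1.0, 0.0]])
```

With the toy batch, the loss is much lower when each query is paired with its matching passage than when the pairs are swapped, which is the behaviour a contrastive objective is meant to reward.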