From Word Embedding to Reading Embedding Using Large Language Model, EEG and Eye-tracking

Yuhong Zhang, Shilai Yang, Gert Cauwenberghs, Tzyy-Ping Jung

TL;DR

Addresses the challenge of predicting the word-level relevance of what a person reads to inference questions. Introduces a Reading Embedding that fuses BERT-based word embeddings with EEG and eye-gaze biomarkers via an attention-based transformer encoder, trained on LLM-derived labels. On ZuCo 1.0 Task 3 (task-specific reading, TSR) data from nine subjects, a fine-tuned BERT model using word embeddings alone reaches 92.7% accuracy, while the multi-modal reading embedding reaches a mean 5-fold cross-validation accuracy of 68.7% on balanced samples (71.2% for the best subject); prompts and cross-modal learning enable robust prediction despite bio-signal noise. This work demonstrates a feasible pathway toward LLM-guided, multi-modal reading-assistive tools and highlights the complementarity of language models and physiological signals in reading.
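To make the idea of fusing modalities with attention concrete, here is a minimal, dependency-free sketch of one plausible per-token fusion step. It is not the authors' model. Assumptions: the word, EEG, and eye-gaze vectors have already been projected to a shared dimension; attention runs across the three modality vectors of a single token; the query, key, and value projections are the identity (a trained transformer encoder would learn them); the attended vectors are mean-pooled; and a logistic read-out gives a relevance probability. The function names `self_attention`, `reading_embedding`, and `relevance_probability` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def reading_embedding(word: Vector, eeg: Vector, gaze: Vector) -> Vector:
    """Hypothetical fusion: attend across the three modality vectors, then mean-pool."""
    attended = self_attention([word, eeg, gaze])
    d = len(word)
    return [sum(t[i] for t in attended) / len(attended) for i in range(d)]


def relevance_probability(embedding: Vector, weights: Vector, bias: float) -> float:
    """Logistic read-out: probability that the token is relevant to the inference target."""
    return 1.0 / (1.0 + math.exp(-(dot(embedding, weights) + bias)))


if __name__ == "__main__":
    emb = reading_embedding([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
    print(emb, relevance_probability(emb, [1.0, -1.0], 0.0))
```

In a trained system the projections, the number of heads and layers, and the read-out weights would all be learned; the sketch only shows how attention lets each modality's representation draw on the others before classification.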

Abstract

Reading comprehension, a fundamental cognitive ability essential for knowledge acquisition, is a complex skill, and a notable number of learners lack proficiency in it. This study introduces new tasks for Brain-Computer Interfaces (BCI): predicting how relevant each word or token a person reads is to the target inference words. We use state-of-the-art Large Language Models (LLMs) to guide the training of a new reading-embedding representation. This representation, which integrates EEG and eye-tracking biomarkers through an attention-based transformer encoder, achieved a mean 5-fold cross-validation accuracy of 68.7% across nine subjects on a balanced sample, with the highest single-subject accuracy reaching 71.2%. This study pioneers the integration of LLMs, EEG, and eye-tracking for predicting human reading comprehension at the word level. We also fine-tune the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to produce word embeddings, without giving it any information about the reading tasks. Despite this absence of task-specific details, the model attains an accuracy of 92.7%, thereby validating our findings from LLMs. This work represents a preliminary step toward developing tools to assist reading.
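The evaluation protocol above (a balanced sample, 5-fold cross-validation, accuracy averaged over folds) can be sketched as follows. The abstract does not say how the sample was balanced, so random undersampling of the majority class is an assumption here, as is running the procedure separately per subject (suggested, but not stated, by the per-subject best result). The names `balance_by_undersampling`, `k_fold_splits`, and `mean_accuracy` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balance_by_undersampling(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return indices with equal numbers of class-0 and class-1 samples
    by randomly dropping examples from the majority class (assumed strategy)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    kept = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(kept)  # mix classes so contiguous folds are not class-sorted
    return kept


def k_fold_splits(indices: Sequence[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Partition indices into k contiguous folds; each fold is the test set exactly once."""
    idx = list(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [len(idx) // k + (1 if f < len(idx) % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, test))
        start += size
    return splits


def mean_accuracy(fold_accuracies: Sequence[float]) -> float:
    """Average the per-fold test accuracies into one cross-validation score."""
    return sum(fold_accuracies) / len(fold_accuracies)
```

Because each class makes up half of the balanced sample, chance accuracy is 50%, which is the natural baseline against which the reported 68.7% mean should be read.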

Paper Structure

This paper contains 9 sections, 2 equations, and 3 figures.

Figures (3)

  • Figure 1: The overall workflow. Subjects read a sentence and answer questions about it (a); then, for each word or token, its word embedding, eye-gaze embedding, and EEG embedding are computed and fed into a reading-embedding model (b). The model is trained under the guidance of an LLM, which produces fuzzy ground-truth labels (c).
  • Figure 2: Training data and the Reading-Embedding Model.
  • Figure 3: Classification results and t-SNE visualization.
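Figure 1(c) describes training under LLM-produced "fuzzy" ground-truth labels. One common way to train on such labels, shown here only as an illustrative assumption and not as the paper's stated loss, is to treat each LLM relevance score as a soft target in [0, 1] for binary cross-entropy, and to threshold the scores when hard labels are needed for accuracy. The functions `soft_bce` and `harden` and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def soft_bce(probs: Sequence[float], fuzzy_targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with soft targets: each target is a graded
    relevance score in [0, 1] (assumed to come from an LLM) instead of a hard 0/1 label."""
    total = 0.0
    for p, t in zip(probs, fuzzy_targets):
        p = min(max(p, eps), 1.0 - eps)  # clip predictions to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def harden(fuzzy_targets: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert soft scores to hard labels, e.g. for balancing classes or computing accuracy."""
    return [1 if t >= threshold else 0 for t in fuzzy_targets]
```

With soft targets the loss for a token is minimized when the predicted probability equals the LLM's score, so uncertain labels pull the model toward hedged predictions rather than forcing a confident 0 or 1.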