BrainLLM: Generative Language Decoding from Brain Recordings

Ziyi Ye, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Min Zhang, Christina Lioma, Tuukka Ruotsalo

TL;DR

This study presents a method that decodes representations from brain recordings and feeds them as input to a large language model, enabling the generation of language that reflects the semantic content a person perceived.

Abstract

Generating human language through non-invasive brain-computer interfaces (BCIs) has the potential to unlock many applications, such as serving disabled patients and improving communication. So far, however, generating language via BCIs has been successful only within a classification setup that selects, from pre-generated sentence continuation candidates, the one with the most likely cortical semantic representation. Inspired by recent research that revealed associations between the brain and large computational language models, we propose a generative language BCI that utilizes the capacity of a large language model (LLM) jointly with a semantic brain decoder to directly generate language from functional magnetic resonance imaging (fMRI) input. The proposed model can generate coherent language sequences aligned with the semantic content of perceived visual or auditory language stimuli, without prior knowledge of any pre-generated candidates. We compare the language generated by the proposed model with a random control, a pre-generated language selection approach, and a standard LLM, which generates common coherent text based solely on the next-word likelihood learned from statistical language training data. The proposed model is found to generate language that is more aligned with the semantic stimulus in response to which the brain input is sampled. Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.

Paper Structure (30 sections, 12 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Language generation with brain recordings (BrainLLM). (a) The generation process has four main stages. $S_1$: Brain recordings in response to the perceived continuation are collected. $S_2$: A brain adapter extracts features from brain recordings and transforms them into hidden vectors that match the shape of text embeddings in a standard LLM. $S_3$: Brain embeddings and text prompt embeddings are concatenated as a prompt input. $S_4$: The prompt input is fed into the LLM for language generation. BrainLLM generates content that either exactly matches the perceived continuation ("the cutting edge of") or is semantically similar to it, i.e. a gist match ("not for everyone"). (b) Examples of language generation with BrainLLM and its controls (PerBrainLLM). Text in blue and bold indicates that the generated content and the ground truth (perceived continuation) are manually annotated as semantically similar and an exact match, respectively.
  • Figure 2: Win rates of BrainLLM vs. PerBrainLLM measured by comparing the generation likelihood of the participant's perceived continuation. Error bars denote mean ± SEM. The center line, top, and bottom of the box plot represent the group median, 75th percentile, and 25th percentile, respectively. Whiskers extend to the most extreme data point that is no more than 1.5 $\times$ the interquartile range from the edge of the box. (a) The win rates were significantly higher than 0.5 with FDR $< 0.05$ (one-sided non-parametric test) across all datasets and participants. Each dot represents the win rate of a single participant in Pereira's dataset (5 participants), Huth's dataset (8 participants), and the Narratives dataset (28 participants). (b) The win rate increases as the surprise level increases. The surprise level quantifies the model's likelihood of generating the continuation stimuli, where a higher surprise indicates that the perceived continuation is harder for the LLM to generate. $*$ indicates that the win rate is significantly higher than 0.5 with FDR $< 0.05$ (one-sided non-parametric test). (c) Scatter plot of win rate versus surprise scores for 200 randomly selected tokens. A positive correlation is observed between win rate and surprise, indicating that tokens with higher surprise scores tend to have higher win rates. (d) The win rate when using brain signals from different cortical regions in a single participant (participant 1 in Huth's dataset). Brain data (colored regions) used as input for BrainLLM were partitioned into Broca's area, the precuneus (PrCu), the prefrontal cortex (PFC), the auditory cortex (AC), and the angular gyrus (AG). (e) The parameter sizes of LLMs exhibit a strong positive correlation with win rates, yielding Pearson's $r$ of 0.886 for Pereira's dataset, 0.953 for Huth's dataset, and 0.923 for the Narratives dataset. (f) The win rate is positively correlated with the size of the training data. $*$ indicates that the win rate is significantly higher than that of the control. For Huth's dataset and the Narratives dataset, which both use auditory stimuli, the win rate is notably consistent when the datasets are of equivalent size. The numbers of data samples in Pereira's dataset, Huth's dataset, and the Narratives dataset are 376, 1,039, and an average of 5,546 across participants, respectively.
  • Figure 3: Full-text reconstruction with BrainLLM. (a) Illustration of the full-text reconstruction task accomplished with BrainLLM. Each generation step can autoregressively provide the text prompt for the next step. (b) Examples of full-text reconstruction with BrainLLM and with a pre-construction and post-hoc selection method proposed by Tang et al. (2023). Text in blue indicates content that is semantically related to the subject's movement behavior and intention to move. Text in brown indicates content that is semantically related to the interaction between the subject and other individuals or objects.
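
The captions of Figures 1 and 2 describe two mechanical steps that a small example may make more concrete: mapping brain features to embedding-shaped vectors and concatenating them with the text prompt embeddings (stages $S_2$ and $S_3$), and computing a win rate by comparing the likelihood each model assigns to the perceived continuation. The Python sketch below is illustrative only and is not the authors' implementation. The function names, the single linear layer used as the adapter, the placement of brain embeddings before the text embeddings, and the treatment of ties as non-wins are all assumptions made for this example; the numbers are toy values, not results from the paper.

```python
def brain_adapter(brain_features, weights, bias):
    """Hypothetical adapter (assumption: one linear layer).

    Maps one brain feature vector to a vector with the same
    dimensionality as the LLM's text embeddings.
    """
    return [
        sum(w * x for w, x in zip(row, brain_features)) + b
        for row, b in zip(weights, bias)
    ]


def build_prompt_input(brain_embeddings, text_embeddings):
    """Concatenate brain and text prompt embeddings along the sequence axis.

    Assumption: brain embeddings come first. Every vector must share
    the text embedding dimensionality, as the Figure 1 caption requires.
    """
    dim = len(text_embeddings[0])
    if any(len(vec) != dim for vec in brain_embeddings + text_embeddings):
        raise ValueError("all embeddings must have the same dimensionality")
    return brain_embeddings + text_embeddings


def win_rate(loglik_brain, loglik_control):
    """Fraction of samples where the brain-conditioned model assigns a
    strictly higher log-likelihood to the perceived continuation than
    the control does. Assumption: ties count as non-wins.
    """
    if not loglik_brain or len(loglik_brain) != len(loglik_control):
        raise ValueError("need two non-empty lists of equal length")
    wins = sum(1 for b, c in zip(loglik_brain, loglik_control) if b > c)
    return wins / len(loglik_brain)


# Toy example: 3 brain features mapped to a 2-dimensional embedding space.
weights = [[0.5, 0.0, 0.5], [1.0, -1.0, 0.0]]
bias = [0.0, 0.1]
brain_vec = brain_adapter([2.0, 1.0, 0.0], weights, bias)

# Two text prompt token embeddings, also 2-dimensional.
text_embeddings = [[0.2, 0.3], [0.4, 0.5]]
prompt_input = build_prompt_input([brain_vec], text_embeddings)

# Toy per-sample log-likelihoods of the perceived continuation.
rate = win_rate([-1.2, -0.8, -2.0, -1.5], [-1.5, -0.9, -1.9, -1.5])
print(len(prompt_input), rate)
```

In this toy run the prompt input holds three vectors (one brain embedding followed by two text embeddings), and the brain-conditioned scores win on two of four samples. A win rate above 0.5 on real data is what Figure 2 reports as evidence that brain input helps.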