Language Reconstruction with Brain Predictive Coding from fMRI Data

Congchi Yin, Ziyi Ye, Piji Li

TL;DR

The paper addresses fMRI-to-text decoding by integrating brain predictive coding into a Transformer-based generation framework. It introduces PredFT, comprising a main decoding network for language reconstruction and a side network that encodes brain predictive coding from six ROIs, fused via cross-attention and trained end-to-end with a joint objective. On the Narratives dataset, PredFT achieves state-of-the-art decoding performance, notably a BLEU-1 of $27.8\%$ for 40-frame sequences, and shows that ROI selection and prediction distance critically influence results. The work demonstrates that incorporating predictive-coding signals improves decoding quality and provides a principled way to align neural representations with language-model predictions, with implications for neuroscience-inspired language interfaces and brain-computer interfaces.
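The fusion step mentioned above can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention, in which decoder states act as queries over the side network's outputs. This is only an illustration under stated assumptions: the function names, the tiny dimensions, and the absence of learned projection matrices and multiple heads are simplifications made here, not details taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: decoder hidden states, shape [n_q][d]       (hypothetical input)
    keys, values: side-network outputs, shape [n_k][d]   (hypothetical input)
    Returns one fused vector per query.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this decoder state to every side-network vector.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused
```

With identical keys the attention weights are uniform, so each fused vector is simply the mean of the value vectors. A real model would add learned query, key, and value projections and an attention mask.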

Abstract

Many recent studies have shown that the perception of speech can be decoded from brain signals and subsequently reconstructed as continuous language. However, there is a lack of neurological basis for how the semantic information embedded within brain signals can be used more effectively to guide language reconstruction. The theory of predictive coding suggests that the human brain naturally engages in continuously predicting future word representations that span multiple timescales. This implies that the decoding of brain signals could potentially be associated with a predictable future. To explore the predictive coding theory within the context of language reconstruction, this paper proposes a novel model, PredFT, for jointly modeling neural decoding and brain prediction. It consists of a main decoding network for language reconstruction and a side network for predictive coding. The side network obtains a brain predictive coding representation from related brain regions of interest with a multi-head self-attention module. This representation is fused into the main decoding network with cross-attention to facilitate the language model's generation process. Experiments are conducted on Narratives, the largest naturalistic language comprehension fMRI dataset. PredFT achieves current state-of-the-art decoding performance with a maximum BLEU-1 score of $27.8\%$.
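For readers unfamiliar with the reported metric, BLEU-1 is the clipped unigram precision of the decoded text against the reference, multiplied by a brevity penalty. The sketch below computes it for a single reference. Lowercasing and whitespace tokenization are assumptions made here for brevity, and the function name `bleu1` is hypothetical; the paper's actual evaluation pipeline is not described in this summary and may differ.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Clipped unigram precision times the brevity penalty (single reference)."""
    # Assumption: lowercase + whitespace tokenization.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Each candidate word is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = overlap / len(cand)
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * precision
```

Clipping prevents a degenerate output such as "the the the" from scoring well: against the reference "the cat sat", only one of its three tokens is credited.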

Paper Structure (21 sections, 7 equations, 6 figures, 4 tables)


Figures (6)

  • Figure 1: Example and results of the predictive coding verification experiment.
  • Figure 2: The general framework of PredFT. Italicized words in the input word sequence mark the first word heard during each fMRI image, while bold words mark the prediction words.
  • Figure 3: Illustration of attention masks. Grey color indicates mask. The cross-attention mask is transposed for simplicity of expression.
  • Figure 4: The influence of prediction distance on decoding performance.
  • Figure 5: An example of the experiment for decoding error analysis. PosID and PosPCT stand for a word's position index in the ground truth and that index expressed as a percentage, respectively.
  • ...and 1 more figure