Language Reconstruction with Brain Predictive Coding from fMRI Data
Congchi Yin, Ziyi Ye, Piji Li
TL;DR
The paper addresses fMRI-to-text decoding by integrating brain predictive coding into a Transformer-based generation framework. It introduces PredFT, comprising a main decoding network for language reconstruction and a side network that encodes brain predictive coding from six ROIs, fused via cross-attention and trained end-to-end with a joint objective. On the Narratives dataset, PredFT achieves state-of-the-art decoding performance, notably a BLEU-1 of $27.8\%$ for 40-frame sequences, and shows that ROI selection and prediction distance critically influence results. The work demonstrates that incorporating predictive-coding signals improves decoding quality and provides a principled way to align neural representations with language-model predictions, with implications for neuroscience-inspired language interfaces and brain-computer interfaces.
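Since the headline result is reported as BLEU-1, a minimal sketch of that metric may help in reading the number. The function below computes sentence-level BLEU-1 as clipped unigram precision multiplied by a brevity penalty. The whitespace tokenization, the lowercasing, and the function name `bleu1` are assumptions made for illustration; the paper's exact evaluation protocol is not given in this section.

```python
import math
from collections import Counter


def bleu1(reference: str, candidate: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: simple lowercased whitespace tokenization, single reference.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)

    # Clip each candidate unigram count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    precision = overlap / len(cand_tokens)

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref_tokens) / len(cand_tokens))

    return brevity_penalty * precision
```

A candidate identical to its reference scores 1.0 under this sketch, while a candidate that only repeats one reference word is limited by the clipping step.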
Abstract
Many recent studies have shown that speech perception can be decoded from brain signals and subsequently reconstructed as continuous language. However, these approaches lack a neurological basis for how the semantic information embedded in brain signals can be used more effectively to guide language reconstruction. Predictive coding theory suggests that the human brain naturally and continuously predicts future word representations spanning multiple timescales. This implies that brain signals may also carry information about upcoming words, which decoding could exploit. To explore predictive coding theory in the context of language reconstruction, this paper proposes \textsc{PredFT}, a novel model that jointly performs neural decoding and brain prediction. It consists of a main decoding network for language reconstruction and a side network for predictive coding. The side network obtains a brain predictive coding representation from related brain regions of interest using a multi-head self-attention module. This representation is fused into the main decoding network through cross-attention to facilitate the language model's generation process. Experiments are conducted on Narratives, the largest naturalistic language comprehension fMRI dataset. \textsc{PredFT} achieves state-of-the-art decoding performance, with a maximum BLEU-1 score of $27.8\%$.
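The data flow described above (self-attention over ROI features in the side network, then cross-attention fusion into the decoder) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It uses single-head attention without learned projections, whereas the paper describes a multi-head module, and all shapes, variable names, random inputs, and the residual addition are assumptions chosen only to show the mechanism.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
d_model = 16        # hypothetical hidden size
n_roi_tokens = 6    # assumption: one feature vector per ROI
n_text_tokens = 5   # hypothetical number of decoder positions

# Random stand-ins for ROI features and decoder hidden states.
roi_features = rng.normal(size=(n_roi_tokens, d_model))
decoder_states = rng.normal(size=(n_text_tokens, d_model))

# Side network (sketch): self-attention over the ROI features yields
# a predictive coding representation.
predictive_repr, _ = attention(roi_features, roi_features, roi_features)

# Fusion (sketch): decoder states act as queries over that representation.
fused, cross_weights = attention(decoder_states, predictive_repr, predictive_repr)

# Assumed residual connection, as in a standard Transformer decoder layer.
decoder_states_fused = decoder_states + fused
```

In this sketch each decoder position receives a weighted mixture of the ROI-derived vectors, with each row of `cross_weights` summing to one, which is the sense in which the side network's output can guide generation.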
