Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM

Xiaoyu Chen; Changde Du; Che Liu; Yizhe Wang; Huiguang He

TL;DR

This work tackles open-vocabulary auditory text decoding from fMRI by introducing BP-GPT, which uses brain-derived prompts to steer GPT-2 toward generating the stimulus text. It pairs a text-to-text baseline, which yields a reference text prompt, with an fMRI-to-text pathway, and uses a contrastive alignment objective to pull the fMRI prompt toward the text prompt and bridge the modality gap. BP-GPT improves on the prior state of the art by up to 4.61% on METEOR and 2.43% on BERTScore, supporting the viability of brain prompts for LLM-driven neural decoding and highlighting the role of prompt design and alignment. Because the language model is used through its prompt, the framework can in principle be paired with stronger LLMs and extended to other neural-decoding tasks.

Abstract

Decoding language information from brain signals is a vital research area within brain-computer interfaces, particularly for deciphering semantic information from fMRI signals. However, many existing efforts concentrate on decoding small vocabulary sets, leaving open-vocabulary continuous text decoding largely unexplored. In this paper, we introduce a novel method, Brain Prompt GPT (BP-GPT). By using the brain representation extracted from fMRI as a prompt, our method can use GPT-2 to decode fMRI signals into the stimulus text. Further, we introduce a text-to-text baseline and align the fMRI prompt to the text prompt. With this baseline, BP-GPT can extract a more robust brain prompt and improve decoding with the pre-trained LLM. We evaluate BP-GPT on an open-source auditory semantic decoding dataset and achieve a significant improvement of up to 4.61% on METEOR and 2.43% on BERTScore across all subjects compared to the state-of-the-art method. The experimental results demonstrate that using the brain representation as a prompt to drive an LLM for auditory neural decoding is feasible and effective.
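
The alignment between the fMRI prompt and the text prompt can be illustrated with a small contrastive loss. The sketch below is not the authors' implementation: it assumes a symmetric InfoNCE-style objective over a batch of paired prompt vectors, and the function name, the temperature value, and the toy vectors are all hypothetical. It uses only the Python standard library so that the arithmetic stays visible.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(
    fmri_prompts: Sequence[Sequence[float]],
    text_prompts: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Symmetric InfoNCE-style loss: pair i of each modality is the positive,
    every other item in the batch is a negative."""
    if not fmri_prompts or len(fmri_prompts) != len(text_prompts):
        raise ValueError("need two non-empty batches of equal size")
    f = [_normalize(v) for v in fmri_prompts]
    t = [_normalize(v) for v in text_prompts]
    # Cosine similarity between every fMRI prompt and every text prompt.
    logits = [
        [sum(a * b for a, b in zip(fi, tj)) / temperature for tj in t]
        for fi in f
    ]
    transposed = [list(col) for col in zip(*logits)]
    fmri_to_text = _diagonal_cross_entropy(logits)
    text_to_fmri = _diagonal_cross_entropy(transposed)
    return 0.5 * (fmri_to_text + text_to_fmri)


# Toy batch: two text prompts and fMRI prompts that roughly match them.
text = [[1.0, 0.0], [0.0, 1.0]]
fmri_aligned = [[0.9, 0.1], [0.1, 0.9]]
fmri_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_alignment_loss(fmri_aligned, text)
loss_swapped = contrastive_alignment_loss(fmri_swapped, text)
```

On this toy batch the aligned pairing gives a lower loss than the swapped one, which is the behavior an alignment objective of this kind relies on.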

Paper Structure (23 sections, 8 equations, 5 figures, 4 tables)

Introduction
Related Work
Large language models
Decoding the Brain Signals into Text
Method
A Text to Text Baseline
fMRI to Text Decoding
fMRI-prompted text decoding.
Align with the Optimal Prompt
Training
Inference
Mapping Network and fMRI Encoder Model
Experiment
Dataset
Implementing Details
...and 8 more sections

Figures (5)

Figure 1: We focus on decoding semantic information from fMRI in the auditory neural decoding scenario and use fMRI signals as prompts to guide a pre-trained GPT-2 to achieve decoding.
Figure 2: The training stages of our method. Upper part: we use BERT and GPT-2 as the encoder and decoder of our text-to-text baseline. In this baseline, the BERT representation is mapped into a text prompt, which GPT-2 uses to reconstruct the original text. Lower part: we use a transformer fMRI encoder to extract the fMRI prompt and add a contrastive loss to align the fMRI prompt with the text prompt. GPT-2 then decodes the text from the fMRI prompt.
Figure 3: An illustration of the inference stage. The fMRI prompt is treated as the preceding text for target text generation. GPT-2 then generates the text autoregressively, conditioning on both the fMRI prompt and the text generated so far. To decide the length of the decoded text, we compare two strategies. The first uses the word rate model to predict the length of the text. The second uses special tokens and a fine-tuned GPT-2: decoding ends when GPT-2 has generated enough special tokens ($ in our implementation).
Figure 4: Cortical flatmaps showing the auditory cortex (in red) for each of the subjects we used.
Figure 5: The performance under different prompt lengths.
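
The two length-control strategies described in the Figure 3 caption can be sketched as a single autoregressive loop with two stopping rules. This is a toy illustration under stated assumptions, not the paper's code: the prompt is represented by placeholder strings rather than continuous fMRI prompt vectors, `next_token` stands in for GPT-2, and the names `generate`, `max_words`, `n_special`, and `hard_limit` are hypothetical. The special token is passed in as a parameter, with `$` used only because the caption names it.

```python
from typing import Callable, List, Optional


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    max_words: Optional[int] = None,   # strategy 1: length from a word rate model
    n_special: Optional[int] = None,   # strategy 2: stop after this many special tokens
    special_token: str = "$",
    hard_limit: int = 200,             # safety cap so the loop always terminates
) -> List[str]:
    """Generate tokens one at a time, conditioning on the prompt plus the
    text generated so far, and stop according to the chosen strategy."""
    if max_words is None and n_special is None:
        raise ValueError("choose a stopping rule: max_words or n_special")
    generated: List[str] = []
    seen_special = 0
    while len(generated) < hard_limit:
        words_so_far = len(generated) - seen_special
        if max_words is not None and words_so_far >= max_words:
            break
        token = next_token(prompt + generated)
        generated.append(token)
        if token == special_token:
            seen_special += 1
            if n_special is not None and seen_special >= n_special:
                break
    # Special tokens only control decoding length; drop them from the output.
    return [tok for tok in generated if tok != special_token]


def scripted_model(script: List[str], special_token: str = "$") -> Callable[[List[str]], str]:
    """Stand-in for a language model: replays a fixed token script."""
    remaining = iter(script)
    return lambda context: next(remaining, special_token)


script = ["i", "went", "home", "$", "then", "slept", "$", "extra"]
fmri_prompt = ["<p0>", "<p1>"]  # placeholders for prompt vectors

by_word_rate = generate(fmri_prompt, scripted_model(script), max_words=3)
by_special = generate(fmri_prompt, scripted_model(script), n_special=2)
```

With the scripted stand-in, the word-rate rule stops after three words, while the special-token rule continues until the second `$` has been produced.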
