Decoding Continuous Character-based Language from Non-invasive Brain Recordings

Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

TL;DR

The paper proposes an approach to decoding continuous language from single-trial non-invasive fMRI recordings. A three-dimensional convolutional network augmented with an information bottleneck automatically identifies the voxels responsive to stimuli, and a character-based decoder semantically reconstructs continuous language characterized by inherent character structures.

Abstract

Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to decoding continuous language from single-trial non-invasive fMRI recordings, in which a three-dimensional convolutional network augmented with an information bottleneck is developed to automatically identify voxels responsive to stimuli, and a character-based decoder is designed for the semantic reconstruction of continuous language characterized by inherent character structures. The resulting decoder can produce intelligible textual sequences that faithfully capture the meaning of perceived speech both within and across subjects, while existing decoders exhibit significantly inferior performance in cross-subject contexts. The ability to decode continuous language from single trials across subjects demonstrates the promising applications of non-invasive language brain-computer interfaces in both healthcare and neuroscience.

Paper Structure (1 section, 8 equations, 8 figures, 6 tables)

This paper contains 1 section, 8 equations, 8 figures, 6 tables.

Table of Contents

  1. Extended Data

Figures (8)

  • Figure 1: Character-based language decoder. (a) BOLD fMRI responses were recorded from 20 subjects during 2.7 hours of listening to naturally spoken narratives. A neural network encoder was learned for each subject (in a within-subject setting) or for multiple subjects (in a cross-subject setting) to extract feature representations from fMRI brain images. Those feature representations are expected to match the semantic features of the stimulus character sequence. (b) To reconstruct continuous language from novel brain recordings, the decoder maintains a set of candidate character sequences. As new characters are detected, a large language model (LLM) proposes continuations for each sequence, and the encoder estimates the likelihood of each proposed sequence based on the recorded brain responses. The most likely continuations are retained until no further characters are detected. (c) A three-dimensional (3D) convolutional neural network enhanced with an information bottleneck (IB) was developed to extract semantic features from fMRI brain images. The 3D convolutions were used to enhance invariance to distortions in fMRI images, and the IB was introduced to maximize the predictive power of the feature representations extracted by the network while mitigating the inclusion of irrelevant and noisy information in BOLD fMRI recordings.
  • Figure 2: Decoders were evaluated on single-trial brain responses recorded while the subjects listened to the test articles, which were not used in model training. The comparisons between decoder predictions and the actual stimuli are shown for both within-subject and cross-subject settings. The examples were manually selected and annotated to demonstrate typical decoder behaviors. The decoder exactly reproduces some characters, words and phrases, and captures the meaning of many more. English translations of the actual stimuli and decoded texts are provided below them for reference.
  • Figure 3: Language similarity scores in a within-subject setting. (a) Decoder predictions for a test article (2,040 characters) were significantly more similar to the actual stimulus character sequence than both the baseline and chance level ($P < 0.05$ for all subjects, one-sided non-parametric test) across all language similarity metrics. To compare across metrics, results are shown as deviations from the mean of the null distribution (Methods). (b) The actual distribution of character rate for a test article (top-left corner) alongside three distributions of character rates predicted for different subjects. Each subject's character rate model was independently trained and evaluated by predicting the character rate of a test article and then assessing the linear correlation between the predicted and actual character rate distributions. The correlations for the predicted distributions were significantly higher than expected by chance ($P < 0.05$, one-sided non-parametric test). (c) Decoding scores were significantly higher than expected by chance ($P < 0.05$, one-sided non-parametric test) for most timepoints under the SBERT metric. (d) Circle size is proportional to the average window similarity between the feature vectors extracted by 3dC-IB from brain responses and the semantic representations of a test article. The degree of similarity closely aligns with both SBERT and BERT scores.
  • Figure 4: Language similarity scores in a cross-subject setting. (a) Decoder predictions for a test article (2,040 characters) were significantly more similar to the actual stimulus character sequence than both the baseline and chance level ($P < 0.05$ for all subjects, one-sided non-parametric test) across all language similarity metrics. To compare across metrics, results are shown as deviations from the mean of the null distribution (Methods). (b) The actual distribution of character rate for a test article (top-left corner) alongside three predicted distributions of character rates. Each subject's character rate model was independently trained and evaluated by predicting the character rate of a test article and assessing the linear correlation between the predicted and actual character rate distributions. The correlations for the predicted distributions were significantly higher than expected by chance ($P < 0.05$, one-sided non-parametric test). Cross-subject character rate models performed slightly worse than within-subject character rate models. (c) Decoding scores were significantly higher than expected by chance ($P < 0.05$, one-sided non-parametric test) for most timepoints under the SBERT metric. (d) With the 3dC-IB model, decoder predictions for a test article in a cross-subject setting were significantly more similar to the actual stimulus character sequence than expected by chance ($P < 0.05$ for all subjects, one-sided non-parametric test), a performance comparable to that observed in a within-subject setting.
  • Figure 5: Identified cortical regions. (a) The $10$ most significantly contributing cortical regions engaged in language and semantic processing, identified for each subject. The numeric codes indicate cortical regions defined in the Automated Anatomical Labeling (AAL) template. (b) Spearman’s correlation coefficients calculated for the brain contribution patterns between each pair of subjects, which reveal a notable uniformity in the patterns of brain contribution across subjects. (c) The top $10$ identified cortical regions and their relative importance for six subjects. The contribution value of the least contributing cortical region was used as a reference point and set to one unit. (d) A typical fMRI recording, with the identified regions highlighted in varying shades of color according to their contributions.
  • ...and 3 more figures
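
The candidate-sequence procedure described in the Figure 1(b) caption resembles a beam search, and a toy version may make the loop easier to follow. The sketch below is illustrative only and is not the authors' implementation. The names `beam_decode`, `toy_propose`, `toy_score`, the `beam_width` parameter, and the fixed `n_steps` count are all assumptions introduced here. In the paper, the proposals come from an LLM and the scores from the fMRI encoder; here both are replaced by trivial stand-in functions.

```python
from typing import Callable, List, Sequence, Tuple


def beam_decode(
    n_steps: int,
    propose: Callable[[str], Sequence[str]],
    score: Callable[[str, int], float],
    beam_width: int = 3,
) -> str:
    """Toy beam search: one iteration per detected character.

    propose(seq)     -> candidate next characters (stand-in for the LLM).
    score(seq, step) -> how well `seq` fits the recording up to `step`
                        (stand-in for the encoder's likelihood estimate).
    """
    beams: List[Tuple[float, str]] = [(0.0, "")]
    for step in range(n_steps):
        candidates: List[Tuple[float, str]] = []
        for _, seq in beams:
            for ch in propose(seq):
                extended = seq + ch
                candidates.append((score(extended, step), extended))
        if not candidates:
            break  # nothing was proposed, so stop early
        # Retain only the most likely continuations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


# Hypothetical stand-ins, used only to exercise the loop.
TARGET = "brain"


def toy_propose(seq: str) -> Sequence[str]:
    # A real system would ask a language model for likely continuations.
    return ["a", "b", "i", "n", "r"]


def toy_score(seq: str, step: int) -> float:
    # A real system would compare encoder features with the sequence's
    # semantic features; here we count positions that match TARGET.
    return float(sum(a == b for a, b in zip(seq, TARGET)))


result = beam_decode(len(TARGET), toy_propose, toy_score)
print(result)
```

In this sketch each candidate is rescored as a whole sequence at every step, which mirrors the caption's statement that the encoder estimates the likelihood of each proposed sequence. How the actual decoder detects characters and combines scores is not specified in the caption and is left open here.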