How Many Bytes Can You Take Out Of Brain-To-Text Decoding?

Richard Antonello; Nihita Sarma; Jerry Tang; Jiaru Song; Alexander Huth

TL;DR

The paper tackles non-invasive brain-to-text decoding from fMRI data by introducing an information-theoretic evaluation framework and two enhancement strategies for Bayesian decoders: Minimum Bayes Risk (MBR) decoding and encoding model scaling. Using a voxel-wise encoding model $\hat{R}(S)$ with Gaussian noise covariance $\Sigma$ and a language-model prior $P(S)$, the authors recover word sequences via beam search and quantify performance with BERTScore and LogRank. They also explore larger encoders (e.g., Llama-2/Llama-3) and ensemble approaches as ways to extract more information. They report substantial gains (approximately 40–50% in semantic similarity) when combining MBR with a strong encoding model, and show that the information these decoders extract follows Zipfian (power-law) dynamics. They also quantify a theoretical ceiling: ideal encoding could add about $2.7$ bits and improved noise modeling about $1.2$ bits. The work suggests that practical non-invasive brain-to-text decoders are within reach given further algorithmic advances, while highlighting compute demands and important mental-privacy considerations for real-world deployment.
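The Bayesian scoring step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a ridge-regression encoding model (one form of the linear regression the paper describes), a noise covariance $\Sigma$ estimated from training residuals, random arrays standing in for language-model features and BOLD responses, and hypothetical helper names (`fit_encoding_model`, `log_likelihood`, `rank_candidates`). Each candidate sequence $S$ is scored by $\log P(R|S) + \log P(S)$, with $P(R|S)$ a Gaussian centred on $\hat{R}(S)$.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_encoding_model(X, R, alpha=1.0):
    """Ridge regression from stimulus features X (time x features) to responses R (time x voxels).

    Returns the weight matrix W and a noise covariance estimated from training residuals.
    """
    n_features = X.shape[1]
    W = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ R)
    residuals = R - X @ W
    # Voxel-by-voxel noise covariance; a small diagonal term keeps it well conditioned.
    sigma = np.cov(residuals, rowvar=False) + 1e-3 * np.eye(R.shape[1])
    return W, sigma


def log_likelihood(features, responses, W, sigma):
    """log P(R | S): Gaussian log-density of each observed response around the prediction R_hat(S)."""
    predicted = features @ W
    return sum(
        multivariate_normal.logpdf(r, mean=mu, cov=sigma)
        for r, mu in zip(responses, predicted)
    )


def rank_candidates(candidates, responses, W, sigma, beam_width):
    """One beam-pruning step: keep the beam_width candidates with the highest log P(R|S) + log P(S).

    candidates: list of (name, features, lm_logprior) tuples, where features is a
    (time x features) array derived from the candidate word sequence.
    """
    scored = [
        (log_likelihood(feats, responses, W, sigma) + lm_logprior, name)
        for name, feats, lm_logprior in candidates
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:beam_width]]


# Toy data: random features stand in for language-model embeddings of the stimulus.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(5, 3))  # 5 features -> 3 "voxels"
X_train = rng.normal(size=(200, 5))
R_train = X_train @ W_true + 0.1 * rng.normal(size=(200, 3))
W, sigma = fit_encoding_model(X_train, R_train)

X_true = rng.normal(size=(10, 5))  # features of the sequence actually heard
R_test = X_true @ W_true + 0.1 * rng.normal(size=(10, 3))
candidates = [
    ("true", X_true, -20.0),
    ("distractor_a", rng.normal(size=(10, 5)), -20.0),
    ("distractor_b", rng.normal(size=(10, 5)), -20.0),
]
print(rank_candidates(candidates, R_test, W, sigma, beam_width=1))  # ['true']
```

In a full decoder this pruning step would be repeated as a language model proposes continuations for each surviving beam entry; the equal toy log-priors here mean the encoding model alone decides the ranking.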

Abstract

Brain-computer interfaces have promising medical and scientific applications for aiding speech and studying the brain. In this work, we propose an information-based evaluation metric for brain-to-text decoders. Using this metric, we examine two methods to augment existing state-of-the-art continuous text decoders. We show that these methods, in concert, can improve brain decoding performance by upwards of 40% when compared to a baseline model. We further examine the informatic properties of brain-to-text decoders and show empirically that they have Zipfian power law dynamics. Finally, we provide an estimate for the idealized performance of an fMRI-based text decoder. We compare this idealized model to our current model, and use our information-based metric to quantify the main sources of decoding error. We conclude that a practical brain-to-text decoder is likely possible given further algorithmic improvements.
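The information-based metric can be illustrated with a rough sketch. The Figure 2 caption states that the number of bits is estimated from how many distractor sequences are ranked better than the ground truth; the exact mapping used below, $\log_2((N+1)/(k+1))$ bits when $k$ of $N$ distractors score at least as high as the ground truth, is an assumption for illustration, and the paper's Pareto correction is not reproduced. The function name `bits_extracted` is hypothetical.

```python
import math


def bits_extracted(true_score, distractor_scores):
    """Bits implied by the ground truth's rank among distractor sequences.

    Assumed formula (illustrative): if k of N distractors score at least as high as
    the ground truth under P(R|S), the decoder has narrowed N + 1 candidates down
    to k + 1, which is log2((N + 1) / (k + 1)) bits.
    """
    n = len(distractor_scores)
    k = sum(score >= true_score for score in distractor_scores)
    return math.log2((n + 1) / (k + 1))


# Ground truth outscores all 1023 distractors: log2(1024 / 1) = 10 bits.
print(bits_extracted(5.0, [0.0] * 1023))  # 10.0
# Ground truth is beaten by 7 of the 1023 distractors: log2(1024 / 8) = 7 bits.
print(bits_extracted(5.0, [9.0] * 7 + [0.0] * 1016))  # 7.0
```

Under this reading, a gain of 2.7 bits means the fraction of distractors ranked above the ground truth shrinks by a factor of about $2^{2.7} \approx 6.5$, and a gain of 1.2 bits by a factor of about $2^{1.2} \approx 2.3$.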

Paper Structure (18 sections, 3 equations, 9 figures, 1 table)

Introduction
Methods
Bayesian Decoding
Evaluating Decoding Models
Techniques for Improving Decoding
Minimum Bayes Risk Decoding
Encoding Model Scaling
MRI data
Estimating an idealized semantic decoding model
Compute specifications
Results
Improving Decoding Performance
Estimating Idealized Decoding Models
Discussion
Appendix / supplemental material
...and 3 more sections

Figures (9)

Figure 1: Decoding methods. (a) Subjects listen to natural speech while blood-oxygen-level-dependent (BOLD) brain responses are recorded using fMRI. Encoding models use linear regression to predict BOLD responses from features extracted from the stimuli using Llama. (b) A Bayesian decoder uses the encoding model to reconstruct stimulus words from fMRI data. A beam search is performed over word sequences, with candidate beam continuations sampled from GPT-1. The probability of observing the brain responses given each proposed sequence is then evaluated using the encoding model, and the best candidate sequences are preserved for the next step. (c) Decoding performance can be improved by ensembling via minimum Bayes risk (MBR). An ensemble of encoding models is estimated by sampling the training data, and each model is then used to decode word sequences. A second beam search then finds a word sequence that is maximally similar to all of the ensemble-decoded sequences.
Figure 2: Comparison of Decoding Models. (a) Comparison of Llama-2 with MBR over GPT-1 Baseline: Semantic similarity of each model output with the ground truth is plotted on the y-axis over the timecourse of a held-out test story. The Llama-2 model with MBR outperforms GPT-1 in almost all cases. (b) Computational Costs of Decoding: A log-scale plot comparing computational cost against estimated performance, showing a clear cost-performance tradeoff. (c) Number of Bits Extracted: The estimated amount of information extracted using the $P(R|S)$ approximators built from each model. Llama-2 extracts about 2.8 more bits of information after Pareto correction. The number of bits is estimated from the number of distractors that are ranked better than the ground truth. (d) Qualitative Performance Comparison: Two well-decoded extended contexts from each decoding model (Llama-2 with a 50-run MBR ensemble, and GPT-1). The ground truth text is bordered in green. In each decoding, words are colored according to whether they are semantically correct (blue), gist-capturing (purple), or incorrect (red).
Figure 3: Informatic properties of Bayesian decoding. (a) Power law dynamics: Plotted is the proportion of distractors from the total set that are evaluated as having a higher $P(R|S)$ than the ground truth stimuli for S3. The relationship obeys a power law and can be modelled with a Pareto distribution (red line). (b) Idealized encoding performance: Idealized encoding models were computed by averaging responses across different repeats of the test story. With 9 repeats of the test story, identification using averaged responses led to an average improvement of 2.7 bits over the current state-of-the-art encoding model. (c) Idealized noise estimation: Idealized noise models were computed by averaging noise covariance matrices across different repeats of the test story. With 9 repeats of the test story, identification using idealized noise models led to an average improvement of 1.2 bits over the current state-of-the-art noise model. S1 is omitted from these analyses due to poor test story repeatability. These results suggest further room for improvement in $P(R|S)$ estimation.
Figure A1: GPT-1 vs. Llama-2 + MBR (50 run) comparison for S01
Figure A2: GPT-1 vs. Llama-2 + MBR (50 run) comparison for S02
...and 4 more figures
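The MBR ensembling in Figure 1c can be sketched as picking the output that agrees most, on average, with all ensemble decodings. This simplified version selects from a fixed candidate set instead of running the second beam search the caption describes, and it uses a toy unigram-overlap F1 as a stand-in for a semantic similarity such as BERTScore. `overlap_f1` and `mbr_select` are hypothetical names, and the resampling of training data that produces the ensemble is not shown.

```python
from collections import Counter


def overlap_f1(a, b):
    """Toy similarity: F1 of overlapping word counts (a stand-in for BERTScore)."""
    ta, tb = a.split(), b.split()
    common = sum((Counter(ta) & Counter(tb)).values())
    if common == 0:
        return 0.0
    precision = common / len(ta)
    recall = common / len(tb)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, ensemble_outputs, similarity=overlap_f1):
    """Return the candidate with the highest mean similarity to the ensemble outputs.

    Maximizing expected similarity is the same as minimizing expected risk
    when risk is defined as 1 - similarity.
    """
    def expected_utility(candidate):
        return sum(similarity(candidate, out) for out in ensemble_outputs) / len(ensemble_outputs)

    return max(candidates, key=expected_utility)


ensemble = [
    "i walked home in the rain",
    "she walked home through the rain",
    "i drove home in the snow",
]
# Here the candidate pool is the ensemble itself; a beam search could propose new sequences instead.
print(mbr_select(ensemble, ensemble))  # i walked home in the rain
```

The first sentence wins because it shares four words with each of the other two decodings, whereas the other two share only two words with each other.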
