How Many Bytes Can You Take Out Of Brain-To-Text Decoding?
Richard Antonello, Nihita Sarma, Jerry Tang, Jiaru Song, Alexander Huth
TL;DR
The paper tackles non-invasive brain-to-text decoding from fMRI data by introducing an information-theoretic evaluation framework and two strategies for improving Bayesian decoders: Minimum Bayes Risk (MBR) decoding and encoding-model scaling. The decoder combines a voxel-wise encoding model $\hat{R}(S)$, a Gaussian noise model with covariance $\Sigma$, and a language-model prior $P(S)$. It recovers word sequences via beam search, and performance is quantified with BERTScore and LogRank. The authors also explore larger encoders (e.g., Llama-2/Llama-3) and ensemble approaches as ways to extract more information. They report substantial gains (approximately 40–50% in semantic similarity) when MBR is combined with a strong encoding model, and show that the information the decoder extracts follows Zipfian power-law dynamics. They also quantify a theoretical ceiling: ideal encoding could add about $2.7$ bits, and improved noise modeling about $1.2$ bits. The work suggests that practical non-invasive brain-to-text decoders are within reach with further algorithmic advances, while highlighting their compute demands and important mental-privacy considerations for real-world deployment.
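A Bayesian decoder of the kind summarized above can be read as ranking a candidate word sequence $S$ by $\log P(R \mid S) + \log P(S)$, assuming the likelihood takes the form $P(R \mid S) = \mathcal{N}(R;\, \hat{R}(S), \Sigma)$. The sketch below is a minimal illustration under that assumption, not the authors' implementation. It scores a small, fixed list of complete candidates (no beam search) and then applies MBR selection, which picks the candidate with the highest posterior-weighted expected similarity to all candidates rather than the single most probable one. The function names are hypothetical, responses are plain NumPy vectors, and `token_f1` (unigram-overlap F1) is a stand-in for BERTScore.

```python
from collections import Counter

import numpy as np


def gaussian_log_likelihood(response, predicted, cov):
    """log N(response; predicted, cov) for one multivariate response vector."""
    diff = response - predicted
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("noise covariance must be positive definite")
    k = response.shape[0]
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (mahalanobis + logdet + k * np.log(2 * np.pi))


def posterior_log_scores(response, predicted_responses, cov, lm_log_priors):
    """Unnormalized log P(S | R) = log P(R | S) + log P(S) for each candidate S."""
    return np.array([
        gaussian_log_likelihood(response, pred, cov) + prior
        for pred, prior in zip(predicted_responses, lm_log_priors)
    ])


def token_f1(a, b):
    """Hypothetical stand-in for BERTScore: unigram-overlap F1 of two word sequences."""
    ta, tb = a.split(), b.split()
    if not ta or not tb:
        return 0.0
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(ta), overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, log_scores, similarity=token_f1):
    """Return the candidate maximizing posterior-weighted expected similarity."""
    log_scores = np.asarray(log_scores, dtype=float)
    # Softmax over candidates turns log scores into posterior weights.
    weights = np.exp(log_scores - log_scores.max())
    weights /= weights.sum()
    expected = [
        sum(w * similarity(c, other) for other, w in zip(candidates, weights))
        for c in candidates
    ]
    return candidates[int(np.argmax(expected))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = ["a cat sat", "the dog ran home", "the dog walked home"]
    # Hypothetical encoding-model predictions R_hat(S) for 5 voxels per candidate.
    predicted = rng.normal(size=(len(candidates), 5))
    # Simulated response drawn near the second candidate's prediction.
    response = predicted[1] + 0.5 * rng.normal(size=5)
    cov = 0.25 * np.eye(5)  # assumed isotropic noise covariance Sigma
    lm_log_priors = np.log([0.5, 0.3, 0.2])  # hypothetical log P(S) values
    scores = posterior_log_scores(response, predicted, cov, lm_log_priors)
    print("MAP:", candidates[int(np.argmax(scores))])
    print("MBR:", mbr_select(candidates, scores))
```

The toy run uses random predictions, so its printed choices carry no meaning. The design point is that MBR can prefer a sequence that several probable candidates agree on over an isolated MAP winner: with weights $0.4, 0.35, 0.25$ on "a cat sat", "the dog ran home", and "the dog walked home", the MAP choice is "a cat sat", but MBR selects "the dog ran home".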
Abstract
Brain-computer interfaces have promising medical and scientific applications for aiding speech and studying the brain. In this work, we propose an information-based evaluation metric for brain-to-text decoders. Using this metric, we examine two methods to augment existing state-of-the-art continuous text decoders. We show that these methods, in concert, can improve brain decoding performance by upwards of 40% when compared to a baseline model. We further examine the informatic properties of brain-to-text decoders and show empirically that they have Zipfian power law dynamics. Finally, we provide an estimate for the idealized performance of an fMRI-based text decoder. We compare this idealized model to our current model, and use our information-based metric to quantify the main sources of decoding error. We conclude that a practical brain-to-text decoder is likely possible given further algorithmic improvements.
