Decoding individual words from non-invasive brain recordings across 723 participants

Stéphane d'Ascoli, Corentin Bel, Jérémy Rapin, Hubert Banville, Yohann Benchetrit, Christophe Pallier, Jean-Rémi King

TL;DR

The study tackles decoding individual words from non-invasive brain recordings (EEG/MEG) at scale, using a large, multilingual dataset. It introduces a deep learning pipeline that maps brain activity to semantic word representations via pretrained language-model embeddings, trained with CLIP-style objectives and a deduplicated SigLIP variant. On 723 participants across diverse devices and languages, the approach achieves robust word decoding, with MEG and reading conditions yielding the strongest performance and consistent gains from more training data and from averaging predictions at test time. Analyses reveal that decoded words reflect semantic content as well as syntactic and surface properties such as part-of-speech and word length, informing theories of neural language representation and highlighting practical directions for non-invasive brain-to-text interfaces. The work outlines concrete paths and remaining challenges for translating non-invasive word decoding into real-time, natural-language BCIs, while contributing to our understanding of how semantic and syntactic information is represented in the brain.
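
The CLIP-style objective mentioned above can be illustrated with a minimal NumPy sketch. It assumes a batch in which row i of `brain_emb` (the output of a hypothetical brain encoder) is paired with row i of `word_emb` (language-model embeddings). The function names, the temperature value, and the symmetric form are illustrative assumptions; the deduplicated SigLIP variant is not reproduced here.

```python
import numpy as np

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(brain_emb, word_emb, temperature=0.1):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired embeddings.

    Assumption: row i of brain_emb corresponds to row i of word_emb.
    The temperature of 0.1 is an arbitrary illustrative value.
    """
    # L2-normalize so that dot products are cosine similarities.
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    logits = b @ w.T / temperature
    diag = np.arange(len(b))
    # Brain-to-word direction: each brain sample must pick its own word.
    loss_b2w = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Word-to-brain direction: each word must pick its own brain sample.
    loss_w2b = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_b2w + loss_w2b)
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the signal that pushes an encoder to align brain activity with word embeddings.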

Abstract

Deep learning has recently enabled the decoding of language from the neural activity of a few participants with electrodes implanted inside their brain. However, reliably decoding words from non-invasive recordings remains an open challenge. To tackle this issue, we introduce a novel deep learning pipeline to decode individual words from non-invasive electro- (EEG) and magneto-encephalography (MEG) signals. We train and evaluate our approach on an unprecedentedly large number of participants (723) exposed to five million words either written or spoken in English, French or Dutch. Our model outperforms existing methods consistently across participants, devices, languages, and tasks, and can decode words absent from the training set. Our analyses highlight the importance of the recording device and experimental protocol: MEG and reading are easier to decode than EEG and listening, respectively, and it is preferable to collect a large amount of data per participant rather than to repeat stimuli across a large number of participants. Furthermore, decoding performance consistently increases with the amount of (i) data used for training and (ii) data used for averaging during testing. Finally, single-word predictions show that our model effectively relies on word semantics but also captures syntactic and surface properties such as part-of-speech, word length and even individual letters, especially in the reading condition. Overall, our findings delineate the path and remaining challenges towards building non-invasive brain decoders for natural language.
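
To make the effect of averaging during testing concrete, the following sketch simulates noisy predicted embeddings and compares top-10 retrieval accuracy for a single trial against the average of several repetitions of the same word. Everything here is an assumption for illustration: the data are synthetic Gaussian vectors, the noise level and sizes are arbitrary, and the metric is a plain top-k accuracy rather than the balanced top-10 accuracy reported in the paper.

```python
import numpy as np

def top_k_accuracy(pred, vocab, targets, k=10):
    """Fraction of predictions whose target word is among the k vocabulary
    embeddings with the highest cosine similarity to the prediction."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    v = vocab / np.linalg.norm(vocab, axis=1, keepdims=True)
    sims = p @ v.T                                   # (n_pred, n_words)
    target_sim = sims[np.arange(len(pred)), targets][:, None]
    rank = (sims > target_sim).sum(axis=1)           # words scoring above the target
    return float((rank < k).mean())

rng = np.random.default_rng(0)
n_words, dim, n_repeats, noise = 100, 32, 16, 3.0    # hypothetical sizes
vocab = rng.normal(size=(n_words, dim))              # stand-in word embeddings
targets = np.arange(n_words)

# Simulated decoder outputs: true embedding plus independent noise per repetition.
trials = vocab[:, None, :] + noise * rng.normal(size=(n_words, n_repeats, dim))

single = top_k_accuracy(trials[:, 0, :], vocab, targets)
averaged = top_k_accuracy(trials.mean(axis=1), vocab, targets)
print(f"single trial: {single:.2f}, averaged over {n_repeats}: {averaged:.2f}")
```

Averaging independent noisy predictions shrinks the noise while leaving the shared signal intact, which is why retrieval improves with the number of repetitions in this toy setting.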

Paper Structure

This paper contains 40 sections, 3 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Approach. (A) Each colored disk represents one subject (size represents recording time). Our dataset encompasses both public and original M/EEG data of participants reading or listening to Dutch, English or French sentences (Table \ref{tab:datasets}). (B) Our deep learning pipeline consists of training, with contrastive learning, an architecture that decodes the semantic representations of words from brain activity, as identified by a pretrained multilingual language model.
  • Figure 2: Decoding performance across model architectures and datasets. A. A linear ridge regression is trained to predict word embeddings from a single slice of M/EEG data at each time sample relative to word onset. Decoding is evaluated with the average Pearson correlation on the test set. We fit a different model for each subject, and report the average over all subjects. Each curve is normalized to its peak value, which is explicitly indicated above. B. We compare the accuracy of classic decoding models for each dataset (colors). Horizontal black lines denote the average across datasets for a given model. Stars highlight above-chance decoding across participants ($p<0.005$). C. Accuracy of our model for each subject of each dataset, with the average over subjects denoted as horizontal lines. D. Accuracy averaged by recording device, with error bars denoting SEM across subjects. E. Accuracy averaged by task, with error bars denoting SEM across subjects. We focus on MEG datasets that had the same sentences in the reading and listening conditions. F. Accuracy compared to the total recording duration of each dataset. G. Accuracy compared to the average recording duration per subject. The log-linear fit yields $p<0.05$.
  • Figure 3: Scaling laws for decoding performance. A. Balanced top-10 accuracy as a function of the number of subjects used for training. Shaded regions indicate SEM across the subjects. B. Balanced top-10 accuracy as a function of the number of occurrences of each word averaged before scoring. Shaded regions indicate standard deviation over the sampling of the occurrences. C. Comparison of averages within a given context and across $N$ subjects versus averages within a given subject and across $N$ contexts for that subject, as illustrated by the sketch on the right. As this requires many repetitions across both subjects and contexts, we focus on the Accou and LittlePrince datasets, which meet these constraints, and consider the 50 most frequent words. Shaded regions indicate standard deviation over the sampling of the occurrences.
  • Figure 4: Examples of top-10 predictions for two MEG datasets. The y-axis indicates the 10 most likely words given MEG activity. Horizontal bars represent the words $Y_j$ with highest cosine similarity to the decoding prediction $\hat{Y}_i$: $\mathbb E \left[ \measuredangle(\hat{Y}_i, Y_j) \right]$, where $\mathbb E$ denotes an average over all the recordings in the test set corresponding to word $i$. The colorscale indicates the true cosine similarity, i.e. $\measuredangle(Y_i, Y_j)$. Overall, many decoding predictions seem semantically similar to the words actually presented to the subjects. More examples are displayed in \ref{fig:full_predictions}.
  • Figure 5: Impact of sublexical and syntactic features on decoding. For each incorrectly predicted word in the test set, we evaluate whether various properties of the top-1 prediction match those of the target word. A-B. Results for length and part-of-speech, where the stimuli are identical between the listening and reading tasks. Error bars denote SEM across subjects. C. Results for all properties (first and last letter, length and part-of-speech) across all datasets, with error bars denoting SEM across subjects. Stars indicate significantly above-chance classification ($p<0.005$, one-sided t-test).
  • ...and 5 more figures
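
The linear baseline described in the Figure 2A caption (ridge regression from M/EEG data to word embeddings, scored with Pearson correlation) can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, a single model is fit for one hypothetical subject and one time sample, and the regularization strength, array sizes, and function names are placeholders rather than the paper's settings.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def mean_pearson(Y_true, Y_pred):
    """Pearson correlation per embedding dimension, averaged over dimensions."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / (
        np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    )
    return float(r.mean())

rng = np.random.default_rng(0)
n_words, n_sensors, dim = 200, 20, 8                 # hypothetical sizes

# Synthetic stand-ins: X is one time slice of sensor data per word,
# Y is the word embedding, generated from X by a noisy linear map.
X = rng.normal(size=(n_words, n_sensors))
W_true = rng.normal(size=(n_sensors, dim))
Y = X @ W_true + 0.1 * rng.normal(size=(n_words, dim))

# Hold out the last 50 words for testing.
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

W = fit_ridge(X_train, Y_train)
score = mean_pearson(Y_test, X_test @ W)
print(f"mean Pearson r on held-out words: {score:.3f}")
```

Repeating this fit at each time sample relative to word onset, and for each subject separately, would yield a time course of decoding scores of the kind the caption describes.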