BrainBERT: Self-supervised representation learning for intracranial recordings

Christopher Wang, Vighnesh Subramaniam, Adam Uri Yaari, Gabriel Kreiman, Boris Katz, Ignacio Cases, Andrei Barbu

TL;DR

The paper tackles the data-hungry nature of decoding intracranial neural signals by introducing BrainBERT, a self-supervised Transformer trained on unannotated SEEG data to learn reusable neural representations. By pretraining with masked spectrogram reconstruction on time-frequency neural representations (STFT or superlets) and a content-aware loss, BrainBERT yields contextual embeddings that improve linear decoding across diverse tasks with far fewer labeled examples. Crucially, BrainBERT generalizes to unseen subjects and electrode configurations, supporting zero-shot or minimal-fine-tuning deployment, and enables novel analyses such as intrinsic dimensionality mapping of brain regions. These findings suggest a path toward more data-efficient, interpretable neural decoding and open opportunities for large-scale, cross-subject brain modeling akin to language-model advances in NLP.

Abstract

We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and with much less data by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects with electrodes in new positions and to unrelated tasks, showing that the representations robustly disentangle the neural signal. Just as in NLP, where one can study language by investigating what a language model learns, this approach opens the door to investigating the brain by studying what a model of the brain learns. As a first step along this path, we demonstrate a new analysis of the intrinsic dimensionality of the computations in different areas of the brain. To construct these representations, we combine a technique for producing super-resolution spectrograms of neural data with an approach designed for generating contextual representations of audio by masking. In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain the way language models unlocked language.
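
The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain NumPy STFT rather than superlets, a synthetic signal in place of SEEG data, and a simple L1 error on masked positions rather than the paper's content-aware loss. The function names (`stft_spectrogram`, `mask_bands`, `masked_l1_loss`), the sampling rate, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def stft_spectrogram(signal, n_fft=256, hop=64):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft over each frame, then transpose to (freq_bins, time_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def mask_bands(spec, rng, n_time_bands=2, n_freq_bands=2, max_width=8):
    """Zero out random contiguous time and frequency bands.

    Returns the masked copy and a boolean mask (True = hidden from the model).
    """
    n_freq, n_time = spec.shape
    mask = np.zeros(spec.shape, dtype=bool)
    for _ in range(n_time_bands):
        width = int(rng.integers(1, max_width + 1))
        start = int(rng.integers(0, n_time - width + 1))
        mask[:, start : start + width] = True
    for _ in range(n_freq_bands):
        width = int(rng.integers(1, max_width + 1))
        start = int(rng.integers(0, n_freq - width + 1))
        mask[start : start + width, :] = True
    masked = spec.copy()
    masked[mask] = 0.0
    return masked, mask

def masked_l1_loss(pred, target, mask):
    """Mean absolute reconstruction error over the masked positions only."""
    return float(np.abs(pred - target)[mask].mean())

# Synthetic stand-in for one electrode: 5 s at an assumed 2048 Hz sampling rate.
rng = np.random.default_rng(0)
fs = 2048
t = np.arange(5 * fs) / fs
signal = np.sin(2 * np.pi * 40 * t) + 0.5 * rng.standard_normal(t.size)

spec = stft_spectrogram(signal)
masked, mask = mask_bands(spec, rng)
# A model would map `masked` to a reconstruction; here the masked input itself
# serves as a trivial baseline prediction.
baseline_loss = masked_l1_loss(masked, spec, mask)
print(spec.shape, int(mask.sum()), baseline_loss)
```

During pretraining, a model would receive `masked` and be scored only on how well it reconstructs the hidden entries of `spec`, which is what forces it to infer activity from the surrounding context.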

Paper Structure (44 sections, 11 equations, 13 figures, 6 tables, 1 algorithm)

This paper contains 44 sections, 11 equations, 13 figures, 6 tables, 1 algorithm.

Figures (13)

  • Figure 1: (a) Locations of intracranial electrodes (yellow dots) projected onto the surface of the brain across all subjects for each hemisphere. (b) Subjects watched movies while neural data was recorded (bottom, example electrode trace). (c) Neural recordings were converted to spectrograms which are embedded with BrainBERT. The resulting spectrograms are useful for many downstream tasks, like sample-efficient classification. BrainBERT can be used off-the-shelf, zero-shot, or if data is available, by fine-tuning for each subject and/or task. (d) During pretraining, BrainBERT is optimized to produce embeddings that enable reconstruction of a masked spectrogram, for which it must learn to infer the masked neural activity from the surrounding context.
  • Figure 2: BrainBERT can be trained to use spectrograms computed either by a traditional method, such as the short-time Fourier transform (top left), or by modern methods designed for neural data, such as the superlet transform (bottom left). Shown above are spectrograms from a single electrode over a 5s interval. Superlets provide super-resolution by compositing together Morlet wavelet transforms across a range of orders. As in Liu et al. (2021), we mask multiple continuous bands of random frequencies and time intervals (top right, red horizontal and vertical rectangles). Since the temporal resolution of superlets falls off as the inverse function of frequency (bottom right), we adopt a masking strategy that reflects this.
  • Figure 3: Using a linear decoder for classifying sentence onsets either (left) directly with the neural recordings or (right) with BrainBERT (superlet input) embeddings. Each circle denotes a different electrode. The color shows the classification performance (see color map on right). Electrodes are shown on the left or right hemispheres. Chance has AUC of 0.5. Only the 947 held-out electrodes are shown. Using BrainBERT highlights far more relevant electrodes, provides much better decoding accuracy, and more convincingly identifies language-related regions in the superior temporal and frontal regions.
  • Figure 4: BrainBERT can be used off-the-shelf for new experiments with new subjects that have new electrode locations. The performance of BrainBERT does not depend on the subject data being seen during pretraining. We show AUC averaged across the four decoding tasks (see the results for all models), in each case fine-tuning BrainBERT's weights and training a linear decoder. Ten held-out electrodes were chosen from the held-out subject's data. As before, these electrodes have the highest linear decoding accuracy on the original data without BrainBERT. The first two columns in each group show BrainBERT decoding results when a given subject is included in the pretraining set (blue), and when that subject is held out (orange). The performance difference between the two is negligible, and both significantly outperform the linear decoding baseline (green), showing that BrainBERT is robust and can be used off the shelf. Error bars show a 95% confidence interval over the ten electrodes.
  • Figure 5: BrainBERT not only improves decoding accuracy, but it does so with far less data than other approaches. Performance on sentence onset classification is shown for an electrode in the superior temporal gyrus (red dot in brain inset). Error bars show standard deviation over 3 random seeds. Linear decoders (blue) saturate quickly; deep neural networks (green, 5 FF layers, details in text) perform much better but lose explainability. BrainBERT without fine-tuning matches the performance of deep networks, without needing to learn new non-linearities. With fine-tuning, BrainBERT significantly outperforms them, and it does so with 1/5th as many examples (the deep NN peak at 1,000 examples is exceeded with only 150 examples). This is a critical enabling step for other analyses where subjects may participate in only a few dozen trials, as well as for BCI.
  • ...and 8 more figures
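
The captions above evaluate a linear decoder by AUC, with 0.5 as chance. The sketch below shows one way such an evaluation could look. It is an illustration under stated assumptions, not the paper's pipeline: the features are synthetic stand-ins for per-electrode embeddings, the decoder is a logistic regression trained by plain gradient descent, and the names `fit_linear_decoder` and `roc_auc` are hypothetical.

```python
import numpy as np

def fit_linear_decoder(X, y, lr=0.1, n_steps=500):
    """Logistic regression trained by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                            # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Synthetic stand-in for embeddings from one electrode: 400 trials, 16 features.
# Positive trials (e.g. "sentence onset") are shifted by +1 in every feature.
rng = np.random.default_rng(0)
n_trials, n_features = 400, 16
y = np.tile([0, 1], n_trials // 2).astype(float)
X = rng.standard_normal((n_trials, n_features)) + y[:, None]

# Train on the first 300 trials and score the remaining 100.
w, b = fit_linear_decoder(X[:300], y[:300])
test_auc = roc_auc(y[300:], X[300:] @ w + b)
print(f"held-out AUC: {test_auc:.3f}")
```

Repeating this per electrode gives the kind of per-electrode AUC map described for Figure 3. Keeping the decoder linear means any gain in AUC has to come from the input representation rather than from non-linearities learned by the decoder.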