Neural Encoding and Decoding at Scale

Yizi Zhang, Yanchen Wang, Mehdi Azabou, Alexandre Andre, Zixuan Wang, Hanrui Lyu, The International Brain Laboratory, Eva Dyer, Liam Paninski, Cole Hurwitz

TL;DR

Neural Encoding and Decoding at Scale (NEDS) introduces a unified multimodal, multi-task transformer that learns bidirectional relations between neural activity and behavior by applying a multi-task-masking strategy. Trained on the International Brain Laboratory’s trial-aligned Neuropixels dataset across 83 mice, NEDS demonstrates state-of-the-art encoding and decoding when pretrained on multi-animal data and fine-tuned on new animals, while revealing emergent neuron embeddings that predict brain regions without explicit supervision. The work advances a foundation-model-like framework for brain data, showing scalable improvements with cross-animal pretraining and highlighting the potential for translating neural activity to behavior and vice versa. Limitations include reliance on trial-aligned data and substantial compute requirements, with future work targeting unaligned data and additional modalities to further generalize the approach.

Abstract

Recent work has demonstrated that large-scale, multi-animal models are powerful tools for characterizing the relationship between neural activity and behavior. Current large-scale approaches, however, focus exclusively on either predicting neural activity from behavior (encoding) or predicting behavior from neural activity (decoding), limiting their ability to capture the bidirectional relationship between neural activity and behavior. To bridge this gap, we introduce a multimodal, multi-task model that enables simultaneous Neural Encoding and Decoding at Scale (NEDS). Central to our approach is a novel multi-task-masking strategy, which alternates between neural, behavioral, within-modality, and cross-modality masking. We pretrain our method on the International Brain Laboratory (IBL) repeated site dataset, which includes recordings from 83 animals performing the same visual decision-making task. In comparison to other large-scale models, we demonstrate that NEDS achieves state-of-the-art performance for both encoding and decoding when pretrained on multi-animal data and then fine-tuned on new animals. Surprisingly, NEDS's learned embeddings exhibit emergent properties: even without explicit training, they are highly predictive of the brain regions in each recording. Altogether, our approach is a step towards a foundation model of the brain that enables seamless translation between neural activity and behavior.
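The alternation between the four masking schemes named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `make_mask`, the modality labels, the 30% mask ratio, the round-robin schedule, and the exact way random tokens are chosen are all assumptions made for illustration. The sketch only shows which tokens would be hidden under each scheme, not how the model predicts them.

```python
import random

# The four schemes named in the abstract; the ordering here is arbitrary.
SCHEMES = ("neural", "behavioral", "within_modality", "cross_modality")


def make_mask(modalities, scheme, mask_ratio=0.3, rng=random):
    """Return one boolean per token (True = masked, to be predicted).

    modalities: list of "neural" / "behavior" labels, one per token.
    mask_ratio: hypothetical fraction of tokens hidden by the random schemes.
    """
    if scheme == "neural":
        # Hide all neural tokens: predict neural activity from behavior (encoding).
        return [m == "neural" for m in modalities]
    if scheme == "behavioral":
        # Hide all behavior tokens: predict behavior from neural activity (decoding).
        return [m == "behavior" for m in modalities]
    if scheme == "within_modality":
        # Assumption: hide the same random fraction inside each modality separately.
        mask = [False] * len(modalities)
        for name in ("neural", "behavior"):
            idx = [i for i, m in enumerate(modalities) if m == name]
            for i in rng.sample(idx, round(mask_ratio * len(idx))):
                mask[i] = True
        return mask
    if scheme == "cross_modality":
        # Assumption: hide a random fraction of the joint token sequence.
        k = round(mask_ratio * len(modalities))
        chosen = set(rng.sample(range(len(modalities)), k))
        return [i in chosen for i in range(len(modalities))]
    raise ValueError(f"unknown scheme: {scheme}")


# Usage: cycle through the schemes, one per (hypothetical) training step.
rng = random.Random(0)
modalities = ["neural"] * 10 + ["behavior"] * 10
for step in range(4):
    scheme = SCHEMES[step % len(SCHEMES)]
    mask = make_mask(modalities, scheme, rng=rng)
    print(scheme, sum(mask), "of", len(mask), "tokens masked")
```

A round-robin schedule is only one way to alternate; sampling a scheme at random per batch would fit the same description.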

Paper Structure

This paper contains 27 sections, 2 equations, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Schematic illustration of NEDS. (A) Neural encoding and decoding can be interpreted as modeling the conditional probability distributions between neural activity and behavior [schulz2025modeling]. In NEDS, we utilize a multi-task-masking approach [tay2022ul2, zhang2024exploiting] to model the conditional expectations of these distributions as well as to encourage cross-modal and within-modality representation learning. This is achieved by alternating between neural, behavioral, within-modality, and cross-modal masking during training. (B) We implement NEDS using a multimodal transformer-based architecture. We utilize modality-specific tokenizers that convert spike counts and continuous behaviors into 20 ms temporal tokens and discrete behaviors into sequences of repeated tokens, aligning with the temporal resolution of the continuous data. We then add temporal, modality, and session embeddings to the tokens. We train NEDS by masking out tokens according to the masking schemes from (A) and then predicting them with modality-specific decoders. Our multimodal architecture builds on work from other domains [he2022masked, mizrahi20234m, fang2024promoting].
  • Figure 2: Quantitative and qualitative evaluation of single-session and multi-session NEDS. (A) We evaluate multi-session NEDS and single-session NEDS models against our linear baselines and the single-session, unimodal variant of NEDS. Our results show that multi-session NEDS consistently outperforms all baselines across all tasks, while single-session NEDS outperforms all baselines except in block decoding. These findings demonstrate the advantages of multimodal training and cross-animal pretraining for neural encoding and decoding. Among the baseline models, RRR has the fewest parameters (approximately 1,000 to 20,000 on average). Linear models contain approximately 40,000 to 70,000 parameters on average. The single-session unimodal and multimodal NEDS share the same transformer encoder size ($\sim$ 3 million parameters). The multi-session NEDS is the largest model, with $\sim$ 12 million parameters in its transformer encoder. (B) A scatterplot comparison of multi-session NEDS pretrained on 74 sessions vs. single-session NEDS. Each dot corresponds to an individual session. The green value in the bottom right of each subplot displays the relative improvement of the 74-session NEDS over single-session NEDS. (C) A comparison of the predicted trial-averaged firing rates for single-session and multi-session NEDS against the ground truth trial-averaged spike counts for selected neurons. Predictions from multi-session NEDS more closely match the ground truth. (D) Each row compares single-session and multi-session NEDS predictions of single-trial variability for a neuron against the ground truth. Single-trial variability is obtained by subtracting the neuron's peristimulus time histogram (PSTH) from its activity in each trial. Only selected trials are shown for visualization purposes. (E, F) The predicted wheel speed and whisker motion energy from both the single-session and multi-session NEDS are shown alongside ground truth behaviors for each trial.
  • Figure 3: Comparing NEDS to POYO+ and NDT2. We compare multi-session NEDS to POYO+ and NDT2 after pretraining on 74 sessions, evaluating all models on neural decoding tasks across 10 held-out sessions. We measure choice and block decoding performance with accuracy, and wheel speed and whisker motion energy decoding performance with single-trial $R^2$. Each dot corresponds to an individual session. The green value in the bottom right of each subplot displays the relative improvement of NEDS over POYO+ and NDT2.
  • Figure 4: Brain region classification with neuron embeddings from NEDS. (A) A UMAP projection of NEDS neuron embeddings (detailed in Section \ref{sec:neuron_emb}), color-coded by distinct brain regions. (B) Classification accuracy of brain regions using neuron embeddings obtained from single-session unimodal NEDS, single-session multimodal NEDS, and multi-session multimodal NEDS. (C) Confusion matrix showing the brain region classification performance of the neuron embeddings from multi-session NEDS.
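
The single-trial variability described in Figure 2 (D) and the single-trial $R^2$ used in Figure 3 can be made concrete with a short sketch. The helper names `psth`, `single_trial_variability`, and `r2` are hypothetical, and the toy data is made up. The sketch assumes the PSTH is a plain average over all trials and that $R^2$ is the standard coefficient of determination computed on one trial; the paper may condition the PSTH on trial type or aggregate $R^2$ differently.

```python
def psth(trials):
    """Trial-averaged activity per time bin.

    trials: list of equal-length lists, one list of binned activity per trial.
    """
    n_trials = len(trials)
    return [sum(bin_values) / n_trials for bin_values in zip(*trials)]


def single_trial_variability(trials):
    """Subtract the PSTH from each trial, leaving trial-specific fluctuations."""
    mean = psth(trials)
    return [[x - m for x, m in zip(trial, mean)] for trial in trials]


def r2(y_true, y_pred):
    """Coefficient of determination for one trial (assumes y_true is not constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Usage with toy spike counts: two trials, three time bins each.
trials = [[1, 2, 3], [3, 4, 5]]
print(psth(trials))
print(single_trial_variability(trials))
print(r2([1, 2, 3], [1, 2, 3]))
```

Removing the PSTH before comparing predictions isolates the trial-to-trial fluctuations, so a model cannot score well on this comparison by reproducing the average response alone.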