
Generalizable, real-time neural decoding with hybrid state-space models

Avery Hee-Woon Ryoo, Nanda H. Krishna, Ximeng Mao, Mehdi Azabou, Eva L. Dyer, Matthew G. Perich, Guillaume Lajoie

TL;DR

POSSM is a hybrid, real-time neural decoder that pairs spike tokenization and input-output cross-attention with a recurrent state-space model (SSM) backbone, enabling fast, causal online prediction and robust generalization across sessions, subjects, and tasks. After pretraining on diverse non-human primate (NHP) datasets and adapting to new data through unit identification or full fine-tuning, POSSM achieves accuracy competitive with state-of-the-art Transformers at substantially lower inference cost, with millisecond-scale latency. The approach also transfers across species to human handwriting decoding and effectively decodes long-context human speech, highlighting the potential of abundant animal data to improve clinical brain-computer interfaces (BCIs). Overall, POSSM offers a scalable, generalizable neural foundation model for real-time neurotechnology, with practical implications for closed-loop systems and neuroprosthetics.

Abstract

Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.
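The pipeline described in the abstract (tokenize individual spikes, compress each short time chunk with cross-attention, then update a recurrent state once per chunk) can be sketched in NumPy as below. This is a minimal, hypothetical illustration, not the authors' implementation: the sizes, the sinusoidal spike-time encoding (a stand-in for whatever time embedding the model actually uses), the single latent query per chunk, the bias-free GRU cell used as the recurrent backbone, and the plain linear readout in place of the output cross-attention are all assumptions. Only the 50 ms chunk length follows the Figure 2 caption, and the Figure 4 caption notes that a GRU backbone was used for those results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not the paper's hyperparameters).
NUM_UNITS = 16   # recorded units in one session
D = 32           # spike-token / latent dimension
H = 64           # recurrent hidden size
OUT_DIM = 2      # e.g. 2-D hand or cursor velocity
T_C = 0.050      # chunk length in seconds (50 ms, as in Figure 2)

# Randomly initialised parameters; in practice these are learned.
unit_emb = rng.normal(0, 0.1, (NUM_UNITS, D))   # one embedding per unit
latent_query = rng.normal(0, 0.1, (1, D))       # single query per chunk (assumption)
W_q, W_k, W_v = (rng.normal(0, D ** -0.5, (D, D)) for _ in range(3))
W_z, W_r, W_h = (rng.normal(0, 0.1, (D + H, H)) for _ in range(3))  # GRU, no biases
W_out = rng.normal(0, 0.1, (H, OUT_DIM))        # linear readout (simplification)


def time_encoding(t_ms):
    """Sinusoidal encoding of spike times (ms since chunk start); a stand-in time embedding."""
    freqs = 1.0 / (100.0 ** (np.arange(0, D, 2) / D))
    angles = np.outer(t_ms, freqs)                                     # (n, D/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (n, D)


def encode_chunk(unit_ids, t_ms):
    """Cross-attend from one latent query to all spike tokens in a chunk -> (D,)."""
    if len(unit_ids) == 0:
        return np.zeros(D)  # hypothetical choice for chunks with no spikes
    tokens = unit_emb[unit_ids] + time_encoding(t_ms)  # one token per spike
    q, k, v = latent_query @ W_q, tokens @ W_k, tokens @ W_v
    scores = (q @ k.T) / np.sqrt(D)                     # (1, n)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # softmax over spikes
    return (weights @ v)[0]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(h, x):
    """One GRU update of the hidden state h given the chunk summary x."""
    xh = np.concatenate([x, h])
    z = sigmoid(xh @ W_z)                               # update gate
    r = sigmoid(xh @ W_r)                               # reset gate
    h_cand = np.tanh(np.concatenate([x, r * h]) @ W_h)
    return (1.0 - z) * h + z * h_cand


def decode_stream(unit_ids, spike_times, duration):
    """Causally emit one prediction per chunk; chunk c only sees spikes up to its end."""
    n_chunks = int(np.ceil(duration / T_C))
    chunk_idx = np.floor(spike_times / T_C).astype(int)
    h = np.zeros(H)
    outputs = []
    for c in range(n_chunks):
        in_chunk = chunk_idx == c
        t_ms = (spike_times[in_chunk] - c * T_C) * 1000.0
        h = gru_step(h, encode_chunk(unit_ids[in_chunk], t_ms))
        outputs.append(h @ W_out)
    return np.stack(outputs)


# Synthetic example: 1 s of random spikes from NUM_UNITS units.
spike_times = np.sort(rng.uniform(0.0, 1.0, 400))
unit_ids = rng.integers(0, NUM_UNITS, 400)
preds = decode_stream(unit_ids, spike_times, duration=1.0)
print(preds.shape)  # (20, 2)
```

Because each chunk only updates the fixed-size state `h`, the cost of each 50 ms step stays constant no matter how long the recording runs, unlike a model that re-attends over the entire history at every step.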

Paper Structure

This paper contains 64 sections, 9 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Existing deep learning models for neural decoding. (a) Recurrent neural networks (RNNs). (b) Attention-based models such as Transformers.
  • Figure 2: An architecture for generalizable, real-time neural decoding. POSSM combines individual spike tokenization azabou2023unified and input-output cross-attention jaegle2022perceiver with a recurrent SSM backbone. In this paper, we typically consider $k = 3$ and $T_c = 50$ ms.
  • Figure 3: Task schematics and outputs. (a) Centre-out (CO) task with a manipulandum. (b) Random target (RT) task with a manipulandum. (c) RT task with a touchscreen. (d) Maze task with a touchscreen. (e) Ground truth vs. predicted behaviour outputs from a held-out CO session. (f) Same as (e) but for an RT session. (g) Human handwriting decoding task. (h) Human speech decoding task.
  • Figure 4: Sample and compute efficiency benchmarking. (a) Results on a held-out CO session from Monkey C perich_miller_2018_dataset. On the left, we show the sample efficiency of adapting a pretrained model versus training from scratch. On the right, we compare training compute efficiency between these two approaches. (b) Same as (a) but for a held-out RT session from Monkey T perich_miller_2018_dataset (a new subject not seen during training). (c) Comparing model performance and compute efficiency to baseline models. Inference times are computed on a workstation-class GPU (NVIDIA RTX 8000). For all of these results, we used a GRU backbone for POSSM.
  • Figure 5: Session-wise data splits for training, validation, and testing. Per session (across all datasets), 10% of the trials were used for validation and 20% were used for testing. The remaining data, including inter-trial segments, was used for training.
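The per-session split described in the Figure 5 caption (10% of trials for validation, 20% for testing, and everything else, including inter-trial segments, for training) can be sketched as below. This is a hypothetical helper, not code from the paper: the function name `split_session`, the random assignment of trials with a fixed seed, the rounding of split sizes, and the assumption that trials arrive as sorted, non-overlapping (start, end) times in seconds are all illustrative choices.

```python
import numpy as np


def split_session(trial_starts, trial_ends, session_start, session_end,
                  val_frac=0.1, test_frac=0.2, seed=0):
    """Hypothetical split: hold out whole trials for val/test; all remaining
    time in the session (other trials plus inter-trial gaps) is training data.
    Assumes trials are sorted by start time and do not overlap."""
    rng = np.random.default_rng(seed)
    n = len(trial_starts)
    order = rng.permutation(n)                 # random trial assignment (assumption)
    n_val = int(round(val_frac * n))
    n_test = int(round(test_frac * n))
    val_idx = np.sort(order[:n_val])
    test_idx = np.sort(order[n_val:n_val + n_test])
    held_out = np.sort(np.concatenate([val_idx, test_idx]))

    # Training intervals = session timeline minus the held-out trials.
    train, cursor = [], session_start
    for i in held_out:
        if trial_starts[i] > cursor:
            train.append((cursor, trial_starts[i]))
        cursor = max(cursor, trial_ends[i])
    if session_end > cursor:
        train.append((cursor, session_end))

    val = [(trial_starts[i], trial_ends[i]) for i in val_idx]
    test = [(trial_starts[i], trial_ends[i]) for i in test_idx]
    return train, val, test


# Example: 10 trials of 2 s separated by 1 s gaps in a 30 s session.
starts = 0.5 + 3.0 * np.arange(10)
ends = starts + 2.0
train, val, test = split_session(starts, ends, 0.0, 30.0)
print(len(val), len(test))  # 1 2
```

In this toy session the three held-out trials cover 6 s, so the training intervals cover the remaining 24 s, including every inter-trial gap.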