SimSort: A Data-Driven Framework for Spike Sorting by Large-Scale Electrophysiology Simulation

Yimu Zhang, Dongqi Han, Yansen Wang, Zhenning Lv, Yu Gu, Dongsheng Li

TL;DR

SimSort tackles the ground-truth deficit in spike sorting by pretraining a fully automated pipeline on a large-scale, biophysically realistic simulated dataset. A transformer-based spike detector paired with a contrastive-learning–driven spike-identification module learns robust, transferable representations that generalize to real neural recordings without fine-tuning. The approach achieves strong zero-shot performance across multiple benchmarks and real-world data, with further gains from limited fine-tuning and clear scaling behavior as data size grows. This simulation-driven pretraining paradigm offers a scalable, plug-and-play solution for spike sorting in diverse electrophysiology settings.

Abstract

Spike sorting is an essential process in neural recording, which identifies and separates electrical signals from individual neurons recorded by electrodes in the brain, enabling researchers to study how specific neurons communicate and process information. Although there exist a number of spike sorting methods which have contributed to significant neuroscientific breakthroughs, many are heuristically designed, making it challenging to verify their correctness due to the difficulty of obtaining ground truth labels from real-world neural recordings. In this work, we explore a data-driven, deep learning-based approach. We begin by creating a large-scale dataset through electrophysiology simulations using biologically realistic computational models. We then present SimSort, a pretraining framework for spike sorting. Trained solely on simulated data, SimSort demonstrates zero-shot generalizability to real-world spike sorting tasks, yielding consistent improvements over existing methods across multiple benchmarks. These results highlight the potential of simulation-driven pretraining to enhance the robustness and scalability of spike sorting in experimental neuroscience.

Paper Structure

This paper contains 37 sections, 17 equations, 24 figures, 9 tables.

Figures (24)

  • Figure 1: Pipeline of spike sorting, which consists of two main steps: spike detection and spike identification. Typical spike sorting algorithms (left) use a threshold-based detector that relies on fixed voltage thresholds on each channel for spike detection, and non-learning methods such as PCA-based clustering on concatenated waveforms for spike identification. These approaches are sensitive to noise and require manual parameter tuning. In contrast, our proposed framework SimSort (right) replaces the threshold method with a neural network-based detector for spike detection, enhancing robustness and generalization. For spike identification, feature embeddings of multi-channel waveforms learned through contrastive learning are used to improve clustering accuracy.
  • Figure 2: Overview of the large-scale simulated extracellular dataset generation process. (a) Multi-compartment neuron models simulate detailed neuronal morphologies and electrophysiological properties, incorporating realistic ion channel dynamics. (b) Noise generated by a stochastic process is injected into the somatic compartment to induce stochastic intracellular action potential firing, and extracellular signals are recorded using virtual electrodes placed near the neurons in a simulated environment. (c) The resulting multi-channel extracellular recordings capture diverse and realistic neural activity.
  • Figure 3: Spike detection model in SimSort.
  • Figure 4: Spike identification model in SimSort.
  • Figure 5: Evaluation of SimSort on real experimental data (see Fig. \ref{fig: realdata_sorting_visualization} for additional recording results). (a) Experimental setup: extracellular recordings from the mouse primary visual cortex during drifting gratings in eight directions. (b) Visualization of SimSort's sorting results. (c) Autocorrelograms and average waveform of each unit. (d) Orientation tuning curves and global orientation selectivity index (gOSI) for each unit.
  • ...and 19 more figures
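
To make the conventional baseline in the Figure 1 caption concrete, the following is a minimal sketch of a per-channel threshold-based spike detector. It is not SimSort's detector and not code from the paper. The function names, the multiplier `k`, the median-based noise estimate, and the `refractory` window are all illustrative assumptions. The hand-picked `k` and `refractory` values are the kind of manual parameter tuning the caption refers to.

```python
from statistics import median


def detect_spikes(trace, k=4.0, refractory=30):
    """Return sample indices of negative threshold crossings in one channel.

    Assumptions (not from the paper): the threshold is k times a
    median-based noise estimate, and a crossing that falls within
    `refractory` samples of the previous detection is ignored.
    """
    # Robust noise estimate: median absolute value scaled to a Gaussian sigma.
    sigma = median(abs(v) for v in trace) / 0.6745
    threshold = -k * sigma

    spikes = []
    last = -refractory  # allows a detection at the very start of the trace
    for i in range(1, len(trace)):
        # A crossing: the previous sample is above threshold, this one is not.
        crossed = trace[i - 1] > threshold >= trace[i]
        if crossed and i - last >= refractory:
            spikes.append(i)
            last = i
    return spikes


def detect_spikes_multichannel(recording, k=4.0, refractory=30):
    """Apply the same detector independently to each channel."""
    return [detect_spikes(channel, k, refractory) for channel in recording]
```

Because each channel is thresholded independently with hand-chosen parameters, a change in noise level or spike amplitude can require retuning, which is the sensitivity that a learned detector is intended to reduce.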