Latent Diffusion for Neural Spiking Data
Jaivardhan Kapoor, Auguste Schulz, Julius Vetter, Felix Pei, Richard Gao, Jakob H. Macke
TL;DR
LDNS addresses two needs at once: revealing low-dimensional neural population structure and generating realistic, behaviorally conditioned spiking data. It uses a two-stage approach. First, a regularized autoencoder with structured state-space (S4) layers produces time-aligned latent trajectories $\mathbf{z} \in \mathbb{R}^{d\times T}$. Second, a conditional diffusion model operating in this latent space samples new trajectories $\mathbf{z}^*$ of variable length, optionally conditioned on covariates. An expressive autoregressive spike-history observation model augments the Poisson likelihood to capture single-neuron dynamics that are not mediated by the latent state, improving the realism of generated spikes. The method is validated on synthetic Lorenz dynamics and on real recordings from human cortex during attempted speech and from monkey reach tasks; generated samples reproduce spike-count distributions, inter-spike-interval statistics, and population correlations, and can be conditioned on reach direction and velocity profiles. LDNS thus provides a practical, modular framework for joint latent inference and high-fidelity generative modeling of neural spiking data, with potential for closed-loop in silico experiments and hypothesis testing. The authors acknowledge latent-dimension selection and the privacy of synthetic data as considerations.
Abstract
Modern datasets in neuroscience enable unprecedented inquiries into the relationship between complex behaviors and the activity of many simultaneously recorded neurons. While latent variable models can successfully extract low-dimensional embeddings from such recordings, using them to generate realistic spiking data, especially in a behavior-dependent manner, still poses a challenge. Here, we present Latent Diffusion for Neural Spiking data (LDNS), a diffusion-based generative model with a low-dimensional latent space: LDNS employs an autoencoder with structured state-space (S4) layers to project discrete high-dimensional spiking data into continuous time-aligned latents. On these inferred latents, we train expressive (conditional) diffusion models, enabling us to sample neural activity with realistic single-neuron and population spiking statistics. We validate LDNS on synthetic data, accurately recovering latent structure, firing rates, and spiking statistics. Next, we demonstrate its flexibility by generating variable-length data that mimics human cortical activity during attempted speech. We show how to equip LDNS with an expressive observation model that accounts for single-neuron dynamics not mediated by the latent state, further increasing the realism of generated samples. Finally, conditional LDNS trained on motor cortical activity during diverse reaching behaviors can generate realistic spiking data given reach direction or unseen reach trajectories. In summary, LDNS simultaneously enables inference of low-dimensional latents and realistic conditional generation of neural spiking datasets, opening up further possibilities for simulating experimentally testable hypotheses.
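To make the idea of a spike-history observation model concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each neuron's Poisson rate in a time bin is the exponential of a latent-driven log-rate plus a weighted sum of that neuron's own recent spike counts. The function name `sample_spikes_with_history`, the argument names, and the exponential link are illustrative assumptions; the paper's exact parameterization may differ.

```python
import numpy as np

def sample_spikes_with_history(log_rates, history_filters, rng):
    """Sample spike counts bin by bin from a Poisson model with self-history.

    Hypothetical helper for illustration only.
    log_rates:       (n_neurons, T) latent-driven log firing rate per bin.
    history_filters: (n_neurons, H) weights on each neuron's own past H bins,
                     ordered most recent bin first.
    """
    n_neurons, T = log_rates.shape
    H = history_filters.shape[1]
    spikes = np.zeros((n_neurons, T), dtype=np.int64)
    for t in range(T):
        k = min(H, t)                       # number of past bins available
        past = spikes[:, t - k:t][:, ::-1]  # most recent bin first
        history_term = np.sum(history_filters[:, :k] * past, axis=1)
        # Assumed exponential link: history shifts the log-rate additively.
        rate = np.exp(log_rates[:, t] + history_term)
        spikes[:, t] = rng.poisson(rate)
    return spikes

# Usage: a strongly negative weight on the previous bin acts like a
# refractory period, suppressing spikes right after a spike.
rng = np.random.default_rng(0)
log_rates = np.zeros((3, 200))           # baseline rate of 1 spike per bin
filters = np.full((3, 1), -50.0)
spikes = sample_spikes_with_history(log_rates, filters, rng)
```

Because the history term depends only on each neuron's own past spikes, it can shape single-neuron statistics such as inter-spike intervals while the latent-driven log-rates are left unchanged.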
