Modeling Dynamic Neural Activity by combining Naturalistic Video Stimuli and Stimulus-independent Latent Factors
Finn Schmidt, Polina Turishcheva, Suhas Shrinivasan, Fabian H. Sinz
TL;DR
This work addresses the challenge of predicting dynamic neural activity by jointly modeling external video stimuli and internal brain states. It introduces a probabilistic latent-state model that infers a stimulus-independent latent variable z from a subset of neurons and combines it with a video-driven core to predict time-varying neural responses through a Zero-Inflated Gamma distribution. On SENSORIUM mouse V1 data, the latent model outperforms video-only approaches in log-likelihood. The learned latent factors correlate strongly with behavior and exhibit topographic cortical organization, even though no behavioral or anatomical inputs were used during training. The results demonstrate that unsupervised latent factor learning can reveal meaningful structure linking sensory processing and behavior, with the potential to generalize to unseen neurons using cortical coordinates and to other core architectures.
Abstract
Neural activity during visual processing is influenced by both external stimuli and internal brain states. Ideally, a neural predictive model should account for both. Currently, there are no dynamic encoding models that explicitly model a latent state and the entire neuronal response distribution. We address this gap by proposing a probabilistic model that predicts the joint distribution of the neuronal responses from video stimuli and stimulus-independent latent factors. After training and testing our model on mouse V1 neuronal responses, we find that it outperforms video-only models in terms of log-likelihood and achieves further improvements in likelihood and correlation when conditioned on responses from other neurons. Furthermore, we find that the learned latent factors strongly correlate with mouse behavior and that they exhibit patterns related to the neurons' position in the visual cortex, although the model was trained without behavioral data or cortical coordinates. Our findings demonstrate that unsupervised learning of latent factors from population responses can reveal biologically meaningful structure that bridges sensory processing and behavior, without requiring explicit behavioral annotations during training.
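
To make the notion of a zero-inflated Gamma response distribution (named in the TL;DR) concrete, the following is a minimal sketch of its log-density for a single response. It assumes one common parameterization, which this section does not confirm as the authors' exact choice: with probability 1 - q the response falls in a small "zero" interval [0, loc] with uniform density, and with probability q it follows a Gamma distribution shifted by loc. The function name and the parameters `q`, `shape`, `scale`, and `loc` are hypothetical and introduced only for illustration.

```python
import math


def zig_log_likelihood(y, q, shape, scale, loc=0.005):
    """Log-density of one non-negative response y under a zero-inflated Gamma.

    Assumed (illustrative) parameterization:
      q     -- probability that the response is non-zero, 0 < q < 1
      shape -- Gamma shape parameter (> 0)
      scale -- Gamma scale parameter (> 0)
      loc   -- upper edge of the "zero" interval [0, loc] (> 0)
    """
    if y <= loc:
        # Zero component: probability mass (1 - q) spread uniformly over [0, loc].
        return math.log(1.0 - q) - math.log(loc)
    # Non-zero component: Gamma density evaluated at the shifted response.
    x = y - loc
    log_gamma_pdf = (
        (shape - 1.0) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.log(q) + log_gamma_pdf


# Example: log-likelihood of a short response trace with fixed parameters.
responses = [0.0, 0.0, 0.8, 2.3, 0.0]
total = sum(zig_log_likelihood(y, q=0.4, shape=2.0, scale=1.0) for y in responses)
```

In a model of the kind described above, the per-neuron parameters would be predicted at each time step from the video and the latent state rather than fixed, and the summed log-likelihood would serve as the evaluation metric.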
