Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data

Rabia Gondur; Usama Bin Sikandar; Evan Schaffer; Mikio Christian Aoi; Stephen L Keeley

TL;DR

The paper addresses the challenge of jointly modeling high-dimensional neural activity and concurrent behavioral data by learning temporally evolving latent variables that are either shared across modalities or specific to a single modality. It introduces MM-GPVAE, which combines GPFA-like neural decoding with GP-prior latent dynamics and adds a Fourier-domain representation of the latents to improve the identifiability of latent structure. The approach is validated on simulated data (MNIST digits that rotate and scale smoothly over time, paired with Poisson neural counts) and on real multi-modal datasets from Drosophila and Manduca sexta. These experiments demonstrate accurate recovery of latent structure, improved cross-modal reconstructions, and interpretable loadings of neurons onto shared versus independent components. Overall, MM-GPVAE offers a flexible, end-to-end framework for multi-modal neuroscience time-series data, with potential applications in other domains with smoothly evolving latent dynamics.
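To make the shared/independent generative structure concrete, here is a minimal NumPy sketch of one simulated trial. It rests on assumptions of our own, not on the authors' code: each latent is drawn from an RBF-kernel GP in the time domain; neural spike counts are Poisson with log-rates linear in the shared and neural-only latents (a GPFA-like readout with an exponential link, assumed here); and the second modality comes from a small random two-layer network standing in for the deep decoder. Latent sizes, length scales, and all names (`sample_gp`, `C_shared`, `C_neural`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_neurons, n_pixels = 100, 12, 64

def sample_gp(T, length_scale, n_latents, rng):
    """Draw n_latents independent RBF-kernel GP paths on t = 0..T-1."""
    t = np.arange(T)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / length_scale) ** 2)
    L = np.linalg.cholesky(K + 1e-5 * np.eye(T))  # jitter for numerical stability
    return L @ rng.normal(size=(T, n_latents))

# Latents (sizes hypothetical): one shared, one neural-only, two behavior-only
z_shared = sample_gp(T, 10.0, 1, rng)
z_neural = sample_gp(T, 5.0, 1, rng)
z_behav = sample_gp(T, 15.0, 2, rng)

# Neural modality: linear loadings on [shared, neural-only] latents,
# exponential link, Poisson spike counts (assumed GPFA-like readout)
C_shared = rng.normal(scale=0.5, size=(n_neurons, 1))
C_neural = rng.normal(scale=0.5, size=(n_neurons, 1))
d = np.full(n_neurons, -1.0)  # baseline log-rate per neuron
log_rates = z_shared @ C_shared.T + z_neural @ C_neural.T + d
rates = np.exp(log_rates)
spikes = rng.poisson(rates)

# Second modality: a random two-layer network on [shared, behavior-only]
# latents stands in for the deep decoder (images or limb positions)
W1 = rng.normal(scale=0.5, size=(3, 32))
W2 = rng.normal(scale=0.5, size=(32, n_pixels))
hidden = np.tanh(np.hstack([z_shared, z_behav]) @ W1)
obs_mean = hidden @ W2
obs = obs_mean + 0.1 * rng.normal(size=obs_mean.shape)

# Poisson log-likelihood of the spikes, dropping the log(y!) term,
# which does not depend on the latents
poisson_ll = np.sum(spikes * log_rates - rates)
print(f"Poisson log-likelihood (up to a constant): {poisson_ll:.1f}")
```

In a fitted model of this form, the latents would be inferred from both modalities rather than sampled, and the columns of `C_shared` and `C_neural` would play the role of per-neuron loadings onto shared and neural-only components.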

Abstract

Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically designed for only a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM that extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with a temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which likewise use a GP prior to characterize correlations in a latent space but gain rich expressivity from a deep neural network that maps latents to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between modalities or independent, i.e., specific to a single modality. We parameterize the latents of our model in the Fourier domain and show that this approach improves latent identification over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) not only identifies the shared and independent latent structure across modalities accurately but also provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real-world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
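One reason a Fourier-domain parameterization of GP latents is convenient can be checked numerically. Under an assumption made here for illustration (the time axis is treated as periodic, e.g. after zero-padding a trial), the covariance of a stationary kernel such as the RBF is circulant, so the discrete Fourier transform diagonalizes it: the GP prior becomes independent across frequencies, with per-frequency variances given by the DFT of the kernel's first row. Frequencies whose prior variance is negligible can then be pruned. The grid size, length scale, pruning tolerance, and the helper `circulant_rbf_cov` are arbitrary choices for this sketch, which is not the authors' implementation.

```python
import numpy as np

def circulant_rbf_cov(n, length_scale, variance=1.0):
    """RBF covariance on a periodic grid of n points (wrap-around distances)."""
    lag = np.arange(n)
    circ_dist = np.minimum(lag, n - lag)
    c = variance * np.exp(-0.5 * (circ_dist / length_scale) ** 2)
    # Circulant structure: K[i, j] = c[(j - i) mod n]
    K = np.stack([np.roll(c, i) for i in range(n)])
    return K, c

n, length_scale = 64, 3.0
K, c = circulant_rbf_cov(n, length_scale)

# DFT matrix: F @ x equals np.fft.fft(x)
F = np.fft.fft(np.eye(n))
K_fourier = F @ K @ F.conj().T / n

# Prior variance per frequency = DFT of the kernel's first row
# (real-valued because c is symmetric: c[k] == c[n - k])
prior_var = np.fft.fft(c).real
print("diagonal in Fourier basis:", np.allclose(K_fourier, np.diag(prior_var), atol=1e-8))

# Prune frequencies whose prior variance is negligible relative to the largest
keep = prior_var > 1e-6 * prior_var.max()
print(f"kept {keep.sum()} of {n} Fourier coefficients")
```

With the prior diagonal in this basis, a diagonal Gaussian posterior over the retained coefficients yields a KL term that is a simple sum over frequencies, avoiding the inversion of a T × T kernel matrix.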

Paper Structure

This paper contains 11 sections, 13 equations, and 21 figures.

Introduction
The Gaussian Process Variational Autoencoder
Fourier-domain representation of the GP-VAE
The Multi-Modal Gaussian Process Variational Autoencoder
Experiments
Simulated data
Application to fly experimental data
Application to moth experimental data
Conclusion
Reproducibility Statement
Ethics Statement

Figures (21)

Figure 1: (a) Schematic of the Fourier-domain GP-VAE. All images at all timepoints of a given trial are encoded by a deep neural network into the variational parameters of a pruned Fourier representation of the latent space. This Fourier representation is then mapped back into the time domain and passed through a decoder network to reconstruct the image at each timepoint. (b) Image reconstructions from the standard VAE, the GP-VAE, and the Fourier-domain GP-VAE. (c) Estimated latent for each model alongside the true underlying latent angle. (d) Mean squared error (MSE) between estimated and true latents for 60 held-out trials. Error bars indicate standard error.
Figure 2: (a) Graphical model of the multi-modal GP-VAE. A set of Fourier frequencies describes the Fourier representation of the shared and independent latents across modalities, with a GP prior over each latent. Latents are transformed to the time domain and combined to generate the data for each modality. (b) Schematic of the MM-GPVAE.
Figure 3: (a) True and estimated latents for the MM-GPVAE trained on simulated neural spiking data together with a smoothly scaling and rotating MNIST digit. (b) Estimated neural rates on an example trial for 3 example neurons, and 4 example reconstructions of the MNIST digit at different angles and scales. (c) Left: image reconstruction accuracy for a model trained on the images alone (GP-VAE) compared with training on both modalities simultaneously (MM-GPVAE). Middle: accuracy of estimated neural rates for a model trained on neural activity alone (Poisson GPFA) compared with MM-GPVAE. Right: accuracy of the shared latent estimated by MM-GPVAE compared with single-modality model variants. Error bars are standard error.
Figure 4: (a) Limb position tracking in Drosophila. (b) (top) Contribution of the shared and independent subspaces to the across-trial variability of the neural (left) and behavioral (right) modalities. (c) (top) Limb position estimates and true values; (bottom) six randomly selected calcium trace estimates and true values for a given trial. (d) Visualization of the average latent value across time in the neural, behavioral and independent subspaces for 3 behavioral categories.
Figure 5: (a) Experimental set-up for Manduca sexta. Spikes from 10 muscle groups are recorded as the animal tracks a flower stimulus moving at 1 Hz. (b) Top: spikes from 3 example motor neurons. Middle: estimated Poisson rates. Bottom: the neural latent alongside the torque measurement from the hawkmoth; the ~22 Hz modulation reflects wing flapping. (c) Visual stimulus and reconstructions from MM-GPVAE. (d) Weights of the hawkmoth spike decoder for the neural-only and shared latents. (e) A one-dimensional latent from the visual-modality subspace closely tracks the stimulus position. (f) The shared latent between modalities plotted alongside the torque measurement.
...and 16 more figures
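The encoding pipeline described for Figure 1(a) can be sketched for a single latent dimension as follows. This is a hedged illustration under our own assumptions: an orthonormal real (cosine/sine) Fourier basis truncated at `max_freq`; prior variances given by the RBF spectral density (which approximates the periodic-grid prior when the length scale spans a few samples); a fixed random linear map standing in for the deep encoder; and a diagonal Gaussian variational posterior over the retained coefficients. All names are hypothetical, and the decoder network is omitted.

```python
import numpy as np

def real_fourier_basis(n_time, max_freq):
    """Orthonormal cosine/sine basis (n_time x n_coef); requires max_freq < n_time / 2."""
    t = np.arange(n_time)
    cols, freqs = [np.ones(n_time) / np.sqrt(n_time)], [0.0]
    for m in range(1, max_freq + 1):
        angle = 2 * np.pi * m * t / n_time
        cols += [np.sqrt(2 / n_time) * np.cos(angle), np.sqrt(2 / n_time) * np.sin(angle)]
        freqs += [m / n_time, m / n_time]
    return np.stack(cols, axis=1), np.array(freqs)

def rbf_spectral_density(freqs, length_scale, variance=1.0):
    """RBF kernel spectral density at frequencies in cycles per sample."""
    return variance * np.sqrt(2 * np.pi) * length_scale * np.exp(
        -2 * np.pi**2 * length_scale**2 * freqs**2)

def kl_diag_gaussians(mu, logvar, prior_var):
    """KL( N(mu, diag(exp(logvar))) || N(0, diag(prior_var)) ), summed over coefficients."""
    var = np.exp(logvar)
    return 0.5 * np.sum((var + mu**2) / prior_var - 1.0 + np.log(prior_var) - logvar)

rng = np.random.default_rng(0)
n_time, n_pixels, max_freq, length_scale = 50, 16, 8, 4.0
B, freqs = real_fourier_basis(n_time, max_freq)  # pruned basis: 2 * max_freq + 1 columns
prior_var = rbf_spectral_density(freqs, length_scale)

# Stand-in encoder: all frames of one trial -> mean and log-variance per coefficient
frames = rng.normal(size=(n_time, n_pixels))
W_mu = rng.normal(scale=0.01, size=(n_time * n_pixels, B.shape[1]))
W_logvar = rng.normal(scale=0.01, size=(n_time * n_pixels, B.shape[1]))
mu = frames.reshape(-1) @ W_mu
logvar = frames.reshape(-1) @ W_logvar - 2.0

# Reparameterized sample in the Fourier domain, mapped back to the time domain
coef = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
latent_t = B @ coef  # shape (n_time,); would be fed to the decoder at each timepoint

kl = kl_diag_gaussians(mu, logvar, prior_var)
print(f"latent shape {latent_t.shape}, KL = {kl:.2f}")
```

Truncating at `max_freq` both shrinks the number of variational parameters and restricts the time-domain latent to smooth, band-limited paths; in practice the cutoff would presumably be chosen where the prior variance becomes negligible.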
