Deep probabilistic model synthesis enables unified modeling of whole-brain neural activity across individual subjects

William E. Bishop; Luuk W. Hesselink; Bernhard Englitz; Misha B. Ahrens; James E. Fitzgerald

Abstract

Many disciplines need quantitative models that synthesize experimental data across multiple instances of the same general system. For example, neuroscientists must combine data from the brains of many individual animals to understand the species' brain in general. However, typical machine learning models treat one system instance at a time. Here we introduce a machine learning framework, deep probabilistic model synthesis (DPMS), that leverages system properties auxiliary to the model to combine data across system instances. DPMS specifically uses variational inference to learn a conditional prior distribution and instance-specific posterior distributions over model parameters that respectively tie together the system instances and capture their unique structure. DPMS can synthesize a wide variety of model classes, such as those for regression, classification, and dimensionality reduction, and we demonstrate its ability to improve upon single-instance models on synthetic data and whole-brain neural activity data from larval zebrafish.
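As a rough sketch of the kind of objective the abstract describes, and assuming the standard variational form, the evidence lower bound summed over $S$ system instances can be written as below. The symbols $\mathcal{D}^s$ (the data from instance $s$) and $m^s$ (its auxiliary properties) are introduced here for illustration and may differ from the paper's notation and exact objective.

$$
\mathcal{L} = \sum_{s=1}^{S} \Big( \mathbb{E}_{q^s(\theta^s)}\big[\log p(\mathcal{D}^s \mid \theta^s)\big] - \text{KL}\big[q^s(\theta^s) \,\|\, p(\theta^s \mid m^s)\big] \Big)
$$

In this form, the expectation term rewards instance-specific posteriors $q^s$ that explain each instance's own data, while the KL term pulls every posterior toward the shared conditional prior, which is what ties the system instances together.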

Paper Structure (54 sections, 3 theorems, 55 equations, 10 figures, 3 tables)

Introduction
Results
Theoretical framework
Illustrating DPMS through a synthetic example
Synthesizing regression models for decoding behavior from neural population activity
Synthesizing dimensionality reduction models to find a shared latent space across individuals and experimental conditions
Discussion
Methods
Accommodating different dimensionalities across system instances
Approximating theoretical expectations with numerical sampling
Synthesizing dimensionality reduction models
Sum of hyperrectangular basis functions (SHBF) functions
Relating only some model parameters through system properties
Application details for DPMS
Application details for Section \ref{sec:syn_example}
...and 39 more sections

Key Result

Lemma 1

Let $q^s(\theta^s)$ for $s=1, \ldots, S$ be a finite set of probability density functions with finite entropy over the continuous random variables $\theta^s \in \mathbb{R}^{d_\theta \times m}$. Then $\sum_{s=1}^S\text{KL}\left[q^s(\theta^s) || p(\theta^s)\right]$ is minimized with respect to $p(\the

Figures (10)

Figure 1: Probabilistic Model Synthesis enables structure learned from one system instance to be transferred to another. (a) DPMS synthesizes models across system instances (e.g., individual animals) by predicting model parameters for each instance from auxiliary properties (e.g., neuron positions) through a shared Conditional Prior Distribution (CPD). The CPD, generally implemented as a deep neural network, models a distribution over model parameters conditioned on these auxiliary properties. This biologically-informed prior for each system instance is then refined using data (e.g., neural activity and behavior) collected from that instance, yielding system-specific posteriors. Both the CPD and the model posteriors are optimized using the data from all individual animals through the Evidence Lower-Bound (ELBO). This enables the CPD to capture a common mapping from a neuron’s properties to its role in shaping dynamics and behavior, while the posteriors account for individual variability. (b) Illustration of probabilistic model synthesis in a scenario where two system instances (blue and red) have the same measurable properties and true model parameters (black star). All probability distributions are assumed to be multivariate normal, with level sets plotted as ellipses. We suppose that data are collected from the two system instances under different conditions, such that the data only constrain one of the two model parameters well for each system (blue and red dashed lines). Learning synthesizes information so that both parameters become well constrained in the CPD (black solid line) and optimal approximate posteriors (blue and red solid lines). Note that each approximate posterior is tighter than the CPD along the dimension well constrained by the data. (c) As described in the text, the optimal CPD (black) is the average of approximate posteriors (red and blue) for system instances with the same measurable properties. 
Here, $\theta^s$ is one-dimensional, and we show the approximate posteriors (red and blue) for two system instances with measurable properties equal to $m$, and the optimal CPD conditioned on $m$ (black).
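The statement in panel (c), that the optimal CPD is the average of the approximate posteriors, can be checked numerically in a toy setting. The following is a minimal sketch, not the paper's code: the two Gaussian posteriors, the grid, and the alternative candidate priors are all hypothetical choices made for illustration. It compares the summed KL divergence $\sum_s \text{KL}[q^s \| p]$ for several candidate priors $p$.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a 1D normal distribution evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def kl_on_grid(q, p, dx):
    """Riemann-sum approximation of KL[q || p] for densities sampled on a grid."""
    return float(np.sum(q * (np.log(q) - np.log(p))) * dx)

# Hypothetical approximate posteriors for two instances sharing the same properties m.
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
posteriors = [normal_pdf(x, -1.5, 0.5), normal_pdf(x, 1.5, 0.7)]

# Candidate priors: the average of the posteriors, each posterior alone,
# and a single Gaussian moment-matched to that average.
average = np.mean(posteriors, axis=0)
mix_mean = np.sum(x * average) * dx
mix_std = np.sqrt(np.sum((x - mix_mean) ** 2 * average) * dx)
candidates = {
    "average of posteriors": average,
    "posterior 1 only": posteriors[0],
    "posterior 2 only": posteriors[1],
    "moment-matched Gaussian": normal_pdf(x, mix_mean, mix_std),
}

# Summed KL divergence from every posterior to each candidate prior.
total_kl = {
    name: sum(kl_on_grid(q, p, dx) for q in posteriors)
    for name, p in candidates.items()
}
for name, value in total_kl.items():
    print(f"{name:>24s}: summed KL = {value:.4f}")
```

The average comes out lowest because the gap between any candidate $p$ and the average $\bar{q}$ equals $S \cdot \text{KL}[\bar{q} \| p]$, which is non-negative and zero only when $p = \bar{q}$.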
Figure 2: Caption on next page.
Figure 3: DPMS applied to a simulated scenario. (a) The structure of the ground-truth simulated brain models generated for each individual. The neuronal activity, $x^s$, is projected into a 1D subspace to form $l^s$, representing the conserved computational quantity the model brain uses to drive behavior. Projection weights vary across neurons and system instances in a way that depends on properties (see panel b). A function, $f$, that is shared across individuals and represents the high-level algorithm the brain uses to drive behavior transforms $l^s$ into behavior. Recorded behavior, $y^s$, is formed by adding recording noise to $f(l^s)$ with a standard deviation $\nu^s$, selected independently from a Gamma distribution across individuals. Only a pseudo-randomly selected portion of $f$ is explored in the training data for each individual. (b) The ground-truth functions, $\mu$ and $\sigma$, specifying the mean and standard deviation of the ground-truth CPD throughout the 2D property space. (c) Ground-truth weights for neurons visualized in property space for two system instances. Neurons in a pseudo-randomly selected half of property space (dashed regions) are silent in the training data for each individual. For visualization, only $10\%$ of the neurons are shown for each individual. (d) The functions $\hat{\mu}$ and $\hat{\sigma}$ learned for the CPD by DPMS. (e) The true shared function, $f$, and its estimate from DPMS, $\hat{f}$, shown over the entirety of their domain. The portion of $l^s$ explored for individual 2 is denoted in gray. (f) The posterior mean over weights for active and silent neurons estimated by DPMS for individual 2. Note that $w^2$ in the figure denotes weights for individual 2 and not squared weights. (g-i) Same as panels d-f but fit to individual 2 in isolation. For visualization purposes, estimated standard deviation values have been clipped at $0.40$. (j) Performance, as quantified with the ELBO, of models for all 100 individuals synthesized with DPMS or fit to data from individuals in isolation. The large circles denote performances for individual 2. (k,l) Same as panel j but with performances quantified by R-squared or the correlation between predicted and recorded values of $y^s$. R-squared values have been clipped at $-1$. Notably, out-of-distribution R-squared and correlation values can be higher than in-distribution values because the out-of-distribution data cover the whole range of $f$, whereas the ground-truth in-distribution data for any single individual cover only a limited portion. Since both R-squared and correlation measure performance normalized with respect to the range of ground-truth values, errors of similar magnitude will more detrimentally affect these measures for the in-distribution data.
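The closing remark of the Figure 3 caption, that errors of similar magnitude hurt R-squared and correlation more when the ground truth spans a limited range, can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration rather than taken from the paper: `np.tanh` stands in for the shared function $f$, and the latent ranges, noise level, and sample size are arbitrary. The same error vector is added to the ground truth in both cases, so only the range of the ground-truth values differs.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def correlation(y_true, y_pred):
    """Pearson correlation between ground truth and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

rng = np.random.default_rng(0)
n = 2000
errors = rng.normal(0.0, 0.2, size=n)  # identical prediction errors in both cases

f = np.tanh  # hypothetical stand-in for the shared function f

# Latent values covering the whole domain versus a limited portion of it.
latents = {
    "full range": np.linspace(-3.0, 3.0, n),
    "narrow range": np.linspace(0.5, 1.0, n),
}

results = {}
for name, latent in latents.items():
    y_true = f(latent)
    y_pred = y_true + errors
    r2 = r_squared(y_true, y_pred)
    results[name] = {
        "r2": r2,
        "r2_clipped": max(r2, -1.0),  # clipping at -1, as done for the figure
        "corr": correlation(y_true, y_pred),
    }

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Because the residual sum of squares is essentially the same in both cases while the total sum of squares shrinks with the range of the ground truth, both metrics drop sharply on the narrow range even though the predictions are equally accurate in absolute terms.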
Figure 4: DPMS applied to neural recordings in zebrafish larvae when synthesizing regression models uncovers common structure across individuals. (a) The brain-wide activity of ${\sim}80,000$ neurons per fish in a virtual reality setup was recorded along with the voltage of motor nerves on both sides of a fish's tail, which served as fictive swim signals. (b) Example neural activity and fictive swim signals for one fish. (c) We study the fish under phototaxis, in which they turn towards the brighter half of an arena; the brighter half alternated throughout an experiment. (d) The data used for and the results of DPMS. Each base fish has more data than the target fish. Model synthesis produces both a CPD and posteriors for each fish. We focus on posteriors for the target fish. (e-g) Cross-validated results, measured with three metrics, for models for the target fish when models were synthesized with the base fish (solid lines) and fit to the target fish alone (dashed lines). Averages across folds for each individual target fish are shown as light, colored lines. Averages across fish are shown as thicker, black lines. (h) Max projections of the means of an example CPD, approximate posterior, and the difference between the two for the weights of neurons projecting to one of the dimensions of the low-d space. Weights are shown for target fish 1 when $100\%$ of the available training data in a fold was used.
Figure 5: Caption on next page.
...and 5 more figures

Theorems & Definitions (6)

Lemma 1
proof
Theorem 1
proof
Corollary 1
proof
