Gradient-flow SDEs have unique transient population dynamics

Vincent Guan, Joseph Janssen, Nicolas Lanzetti, Antonio Terpin, Geoffrey Schiebinger, Elina Robeva

TL;DR

This work tackles the hard problem of identifying both drift and diffusion in gradient-flow SDEs from observed marginals. It proves that joint identifiability is possible if and only if the process is observed away from equilibrium, and further shows that three distinct marginals suffice to identify the true SDE from any countable candidate set. Building on this theory, it introduces nn-APPEX, a tri-level Schrödinger Bridge-based algorithm that jointly learns the gradient-flow drift $-\nabla \Psi$ and diffusivity $\sigma^2$ by iteratively inferring trajectories, updating drift via neural networks, and re-estimating diffusion from the inferred paths. Empirical results across multiple potentials demonstrate that learning diffusion is critical to unbiased drift estimation, with nn-APPEX outperforming prior SB methods and closely approaching the true SDE from marginals, offering a principled framework for population dynamics inference when diffusion is unknown.

Abstract

Identifying the drift and diffusion of an SDE from its population dynamics is a notoriously challenging task. Researchers in machine learning and single cell biology have only been able to prove a partial identifiability result: for potential-driven SDEs, the gradient-flow drift can be identified from temporal marginals if the Brownian diffusivity is already known. Existing methods therefore assume that the diffusivity is known a priori, despite it being unknown in practice. We dispel the need for this assumption by providing a complete characterization of identifiability: the gradient-flow drift and Brownian diffusivity are jointly identifiable from temporal marginals if and only if the process is observed outside of equilibrium. Given this fundamental result, we propose nn-APPEX, the first Schrödinger Bridge-based inference method that can simultaneously learn the drift and diffusion of gradient-flow SDEs solely from observed marginals. Extensive numerical experiments show that nn-APPEX's ability to adjust its diffusion estimate enables accurate inference, while previous Schrödinger Bridge methods obtain biased drift estimates due to their assumed, and likely incorrect, diffusion.

Paper Structure

This paper contains 31 sections, 8 theorems, 47 equations, 6 figures, 1 table.

Key Result

Proposition 4.1

If $p_\mathrm{eq}$ is a stationary distribution for the overdamped Langevin SDE, then it is also a stationary distribution for the "rescaled" SDE for any $\alpha>0$.
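
A short worked check of why such a rescaling leaves stationary distributions unchanged. This is a sketch under an assumed form that the summary does not spell out: the SDE is taken to be $\mathrm{d}X_t = -\nabla\Psi(X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t$, and the "rescaled" SDE is taken to multiply both the drift and the diffusivity $\sigma^2$ by $\alpha$. Under these assumptions the Fokker–Planck equations read

$$
\partial_t p = \nabla\cdot\left(p\,\nabla\Psi\right) + \frac{\sigma^2}{2}\,\Delta p
\qquad\text{and}\qquad
\partial_t p = \alpha\left[\nabla\cdot\left(p\,\nabla\Psi\right) + \frac{\sigma^2}{2}\,\Delta p\right].
$$

The right-hand side of the second equation is $\alpha$ times that of the first, so any $p_\mathrm{eq}$ that makes the first vanish also makes the second vanish. For the Gibbs density $p_\mathrm{eq} \propto \exp(-2\Psi/\sigma^2)$ this can be seen directly, since $\nabla p_\mathrm{eq} = -\frac{2}{\sigma^2}\,p_\mathrm{eq}\,\nabla\Psi$ gives $p_\mathrm{eq}\,\nabla\Psi + \frac{\sigma^2}{2}\,\nabla p_\mathrm{eq} = 0$. The factor $\alpha$ only changes how fast the process relaxes, which is invisible once the process is already at equilibrium.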

Figures (6)

  • Figure 1: The true drift field (a) and estimated drift fields (b)-(d) are shown for the simple example of a Brownian motion, $\mathrm{d}X_t = \sqrt{0.2}\,\mathrm{d}W_t$. The current state-of-the-art Schrödinger Bridge method SBIRR [shen2025multi, zhang2024joint] presumes prior knowledge ($\hat{\sigma}^2$) of the diffusivity $\sigma^2$ instead of inferring it from data. Figure 1(b) shows that it may wrongly infer a compressive drift force if $\hat{\sigma}^2 > \sigma^2$, while Figure 1(c) shows that it may wrongly infer an expanding drift force if $\hat{\sigma}^2 < \sigma^2$. Figure 1(d) shows that by iteratively learning the diffusion as well as the drift, our method nn-APPEX can accurately infer drift without knowing diffusion a priori.
  • Figure 2: We simulate gradient-flow SDEs from a variety of potentials and provide inference methods with samples from three distinct marginals, initialized from a random Gaussian mixture model. Data for one seed is plotted for the Oakley–O'Hagan potential in (a)–(c), along with the true and estimated landscapes in (d).
  • Figure 3: The ability of different Schrödinger Bridge methods to infer the gradient-flow drift is evaluated across five different potentials using (a) normalized absolute error (lower is better) and (b) cosine similarity (higher is better). Methods are given samples from three distinct marginals, such that the initial distribution is a Gaussian mixture model with randomly initialized components. The box-and-whisker plots aggregated from $10$ seeds show that our method, nn-APPEX, performs the best across all potentials.
  • Figure 4: The ability of different Schrödinger Bridge methods to infer the gradient-flow drift is evaluated across five potentials using (a) normalized absolute error and (b) cosine similarity. Here, methods observe three marginals with a uniform initial distribution in the region of interest, i.e. $p_0 \sim \mathrm{Unif}[-4,4]^2$. Box-and-whisker plots over $10$ seeds show nn-APPEX performs best across all potentials.
  • Figure 5: The ability of different Schrödinger Bridge methods to infer the gradient-flow drift is evaluated across five potentials using (a) normalized absolute error (lower is better) and (b) cosine similarity (higher is better). Methods observe three marginals with the initial distribution equal to the SDE's stationary Gibbs distribution, $p_0 \sim p_{\mathrm{eq}}$. Aggregated box-and-whisker plots over $10$ seeds show that all methods perform similarly poorly, corroborating Proposition 4.1: the true gradient-flow drift is not identifiable without knowing the diffusivity.
  • ...and 1 more figure
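
The contrast between the transient settings (Figures 3 and 4) and the stationary setting (Figure 5) can be illustrated with a closed-form toy case. The sketch below is not the paper's experiment: it assumes a one-dimensional quadratic potential $\Psi(x) = \theta x^2 / 2$ with zero-mean Gaussian marginals, and assumes the "rescaled" family multiplies both drift and diffusivity by $\alpha$. The names `ou_variance`, `marginal_variances`, `THETA`, `SIGMA2`, `TIMES`, and `ALPHAS` are hypothetical and chosen only for illustration.

```python
import math


def ou_variance(t, v0, theta, sigma2, alpha=1.0):
    """Variance at time t of dX = -alpha*theta*X dt + sqrt(alpha*sigma2) dW,
    started from X_0 ~ N(0, v0).

    The variance solves dV/dt = -2*alpha*theta*V + alpha*sigma2, so it relaxes
    exponentially to the equilibrium value sigma2 / (2*theta), which does not
    depend on alpha.
    """
    v_eq = sigma2 / (2.0 * theta)
    return v_eq + (v0 - v_eq) * math.exp(-2.0 * alpha * theta * t)


# Assumed toy parameters (sigma2 = 0.2 echoes Figure 1; theta is arbitrary).
THETA, SIGMA2 = 1.0, 0.2
TIMES = (0.0, 0.5, 1.0)    # three observation times
ALPHAS = (0.5, 1.0, 2.0)   # candidate rescalings of the same potential


def marginal_variances(v0):
    """Map each alpha to the variances of its three Gaussian marginals."""
    return {a: [ou_variance(t, v0, THETA, SIGMA2, a) for t in TIMES] for a in ALPHAS}


v_eq = SIGMA2 / (2.0 * THETA)
at_equilibrium = marginal_variances(v0=v_eq)   # start from the stationary law
off_equilibrium = marginal_variances(v0=1.0)   # start away from it

for alpha in ALPHAS:
    print(alpha, at_equilibrium[alpha], off_equilibrium[alpha])
```

Because the marginals here are zero-mean Gaussians, they are fully described by their variances. Started at equilibrium, every candidate $\alpha$ produces identical marginals, so no method can tell the candidates apart from marginals alone. Started away from equilibrium, the variances relax at different rates, and the marginals at later times distinguish the candidates.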

Theorems & Definitions (24)

  • Definition 3.1: Identifiability
  • Example 3.2: Non-identifiability at the stationary distribution
  • Proposition 4.1
  • proof
  • Theorem 4.2: Identifiability of gradient-flow SDEs
  • proof
  • Corollary 4.3: Identifiability from three marginals
  • proof
  • Lemma A.1: Finite-time smoothing of marginals
  • proof
  • ...and 14 more