MIOFlow 2.0: A unified framework for inferring cellular stochastic dynamics from single cell and spatial transcriptomics data

Xingzhi Sun, João Felipe Rocha, Brett Phelan, Dhananjay Bhaskar, Guillaume Huguet, Yanlei Zhang, D. S. Magruder, Alexander Tong, Ke Xu, Oluwadamilola Fasina, Mark Gerstein, Guy Wolf, Natalia Ivanova, Christine L. Chaffer, Smita Krishnaswamy

Abstract

Understanding cellular trajectories via time-resolved single-cell transcriptomics is vital for studying development, regeneration, and disease. A key challenge is inferring continuous trajectories from discrete snapshots. Biological complexity stems from stochastic cell fate decisions, temporal proliferation changes, and spatial environmental influences. Current methods often use deterministic interpolations treating cells in isolation, failing to capture the probabilistic branching, population shifts, and niche-dependent signaling driving real biological processes. We introduce Manifold Interpolating Optimal-Transport Flow (MIOFlow) 2.0. This framework learns biologically informed cellular trajectories by integrating manifold learning, optimal transport, and neural differential equations. It models three core processes: (1) stochasticity and branching via Neural Stochastic Differential Equations; (2) non-conservative population changes using a learned growth-rate model initialized with unbalanced optimal transport; and (3) environmental influence through a joint latent space unifying gene expression with spatial features like local cell type composition and signaling. By operating in a PHATE-distance matching autoencoder latent space, MIOFlow 2.0 ensures trajectories respect the data's intrinsic geometry. Empirical comparisons show expressive trajectory learning via neural differential equations outperforms existing generative models, including simulation-free flow matching. Validated on synthetic datasets, embryoid body differentiation, and spatially resolved axolotl brain regeneration, MIOFlow 2.0 improves trajectory accuracy and reveals hidden drivers of cellular transitions, like specific signaling niches. MIOFlow 2.0 thus bridges single-cell and spatial transcriptomics to uncover tissue-scale trajectories.

Paper Structure

This paper contains 37 sections, 2 theorems, 28 equations, 5 figures, 1 table, 4 algorithms.

Key Result

Theorem 1

Consider a time-varying vector field $f(z,t)$ defining latent cellular trajectories $dZ_{u,t} = f(Z_{u,t},t)\,dt$ with instantaneous density $\rho_t$, and a dissimilarity metric $D(\mu,\nu)$ such that $D(\mu,\nu)=0$ iff $\mu=\nu$. Given these assumptions, there exists a sufficiently large regularizati… Moreover, because the process $Z_{u,t}$ is defined on the embedded manifold space $\mathcal{Z}$ lea…
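
The trajectory equation $dZ_{u,t} = f(Z_{u,t},t)\,dt$ in the theorem can be illustrated with a small numerical sketch. The code below is not the authors' implementation: `integrate` and `toy_field` are hypothetical names, the vector field is a hand-written stand-in for a learned drift network, and the optional `sigma` noise term is an assumed constant diffusion added only to show how an Euler-Maruyama step extends the deterministic case.

```python
import numpy as np

def integrate(f, z0, t0=0.0, t1=1.0, n_steps=1000, sigma=0.0, seed=0):
    """Integrate dZ = f(Z, t) dt + sigma dW with fixed-step Euler-Maruyama.

    With sigma = 0 this reduces to the plain Euler method for the ODE.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    dt = (t1 - t0) / n_steps
    path = [z.copy()]
    for i in range(n_steps):
        t = t0 + i * dt
        # Brownian increment scaled by sqrt(dt); identically zero when sigma == 0.
        noise = sigma * np.sqrt(dt) * rng.standard_normal(z.shape)
        z = z + f(z, t) * dt + noise
        path.append(z.copy())
    return np.stack(path)  # shape: (n_steps + 1, *z0.shape)

def toy_field(z, t):
    """Hypothetical drift: linear contraction toward the origin."""
    return -z

# Five particles in a 2-D latent space, all starting at (1, 1).
path = integrate(toy_field, z0=np.ones((5, 2)))
print(path.shape)
```

For this toy field the exact solution is $z(t) = z_0 e^{-t}$, so the deterministic end point can be checked against $e^{-1}$; setting `sigma > 0` makes particles that start from the same point diverge, which is the behaviour the drift and diffusion networks of Figure 1 are meant to capture.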

Figures (5)

  • Figure 1: Overview of the MIOFlow model. A. We initialize with scRNA-seq data and spatial transcriptomics, then concatenate both feature sets into a jointly embedded latent space. Each data point in this latent space represents a cell embedding informed by its neighbors. B. The embedding serves as input to three networks: a proliferation network predicting the proliferation rate, and drift and diffusion networks comprising the SDE/ODE model. C. The resulting trajectories can be visualized over the latent space.
  • Figure 2: Feature extraction for MIOFlow 2.0. A. Build the neighborhood graph using a kNN graph or Voronoi polygons. B. Compute local cell type frequency from the neighborhood. Colors indicate cell types. C. Compute ligand-receptor signalling strength from neighbors to the target cell. D. Local expression niche: the mean PCA embedding vector of neighboring cells. E. Concatenate cell features with their spatial neighbors' information.
  • Figure 3: Comparison of trajectory inference methods on synthetic SERGIO datasets. (Top) Trifurcation dataset with three terminal fates. (Bottom) S-shaped dataset with cyclic progression and bifurcation. Cells are colored by ground truth timepoint. Predicted mean branch trajectories for each method are shown as colored curves. MIOFlow 2.0 maintains close adherence to the data manifold, while baseline methods deviate into unpopulated regions or fail to capture fine-scale curvature.
  • Figure 4: We evaluate MIOFlow 2.0 across three simulated datasets designed to mimic key biological characteristics: branching, population decline (dying), and proliferation (growing). By comparing the base MIOFlow 2.0 framework with variants incorporating the growth-rate model and Neural SDEs, we visualize the resulting trajectories. The results demonstrate that these integrated biological priors more accurately capture the geometry of the data manifold and the underlying dynamics compared to the base interpolation.
  • Figure 5: Gene and spatial feature embeddings colored by (A) cell type annotation, (B) pseudotime point. (C) NCAN:SDC3 ligand-receptor signalling spatial feature. (D) Trajectories traced on top of gene embedding (left) and joint projection of gene and spatial embeddings (right) colored by decoded NCAN:SDC3 signalling feature. (E) Spatial organization of cells in Stereo-seq dataset colored by NCAN:SDC3 signalling feature. Note that not all cells shown are included in the trajectories, but all are used for computing spatial features. (F) Decoded NCAN:SDC3 signalling feature over time, averaged across trajectories with +/- one standard deviation. (G) Decoded gene trajectories for a selection of highly variable genes.
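
The spatial feature extraction described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function name `spatial_neighborhood_features`, the brute-force kNN search, the choice `k=5`, and the random input data are all hypothetical, and the ligand-receptor signalling step (panel C) is omitted because the caption does not specify how it is computed.

```python
import numpy as np

def spatial_neighborhood_features(coords, cell_types, pcs, n_types, k=5):
    """Hypothetical helper: build per-cell features from spatial neighbors.

    coords:     (n, 2) spatial positions
    cell_types: (n,) integer cell type labels in [0, n_types)
    pcs:        (n, p) PCA embedding of gene expression
    """
    # Panel A: kNN graph from pairwise Euclidean distances (brute force).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]    # (n, k) neighbor indices

    # Panel B: local cell type frequency among the k neighbors.
    onehot = np.eye(n_types)[cell_types]      # (n, n_types)
    type_freq = onehot[nbrs].mean(axis=1)     # (n, n_types), rows sum to 1

    # Panel D: local expression niche = mean PCA vector of the neighbors.
    niche = pcs[nbrs].mean(axis=1)            # (n, p)

    # Panel E: concatenate each cell's own features with neighbor summaries.
    return np.concatenate([pcs, type_freq, niche], axis=1)

# Synthetic stand-in data: 50 cells, 3 cell types, 4 principal components.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
cell_types = rng.integers(0, 3, size=50)
pcs = rng.normal(size=(50, 4))

features = spatial_neighborhood_features(coords, cell_types, pcs, n_types=3)
print(features.shape)  # (50, 11): 4 own PCs + 3 type frequencies + 4 niche PCs
```

A brute-force distance matrix keeps the sketch dependency-free but scales quadratically with the number of cells; a spatial index or a Voronoi-based neighborhood, as the caption mentions, would be the natural replacement for larger tissues.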

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • proof