Learning Explicit Single-Cell Dynamics Using ODE Representations
Jan-Philipp von Bassewitz, Adeel Pervez, Marco Fumero, Matthew Robinson, Theofanis Karaletsos, Francesco Locatello
TL;DR
Cell-MNN introduces a locally linear latent ODE within an end-to-end encoder-decoder framework to model single-cell differentiation from snapshot data. By learning a state- and time-conditioned linear operator in a PCA-embedded latent space, it achieves competitive interpolation performance without optimal transport (OT) preprocessing and allows gene interactions to be extracted directly in a form that is interpretable in the original gene space. The approach supports scalable amortized training across multiple datasets, is robust to noise, outperforms baselines when scaling to larger datasets, and yields biologically plausible interactions validated against the TRRUST database. This combination of predictive accuracy and mechanistic interpretability advances trajectory reconstruction and gene regulatory network (GRN) discovery in single-cell genomics, with potential implications for hypothesis generation and perturbation design.
Abstract
Modeling the dynamics of cellular differentiation is fundamental to advancing the understanding and treatment of diseases associated with this process, such as cancer. With the rapid growth of single-cell datasets, this has also become a particularly promising and active domain for machine learning. Current state-of-the-art models, however, rely on computationally expensive optimal transport preprocessing and multi-stage training, and they do not discover explicit gene interactions. To address these challenges, we propose Cell-Mechanistic Neural Networks (Cell-MNN), an encoder-decoder architecture whose latent representation is a locally linearized ODE governing the dynamics of cellular evolution from stem to tissue cells. Cell-MNN is fully end-to-end (besides a standard PCA preprocessing step), and its ODE representation explicitly learns biologically consistent and interpretable gene interactions. Empirically, we show that Cell-MNN achieves competitive performance on single-cell benchmarks, surpasses state-of-the-art baselines in scaling to larger datasets and in joint training across multiple datasets, and learns interpretable gene interactions that we validate against the TRRUST database of gene interactions.
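To make the idea of a locally linearized latent ODE more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the latent dynamics take the form dz/dt = A(z, t) z + b(z, t), freezes A and b over each short step, and solves that step exactly with a matrix exponential. All names here (toy_operator, locally_linear_step, expm_taylor, gene_space_interactions) are hypothetical, and the toy operator merely stands in for a learned network. The last function shows one plausible way a latent operator could be read in gene space, assuming PCA loadings W with orthonormal columns so that x ≈ W z + mean; the paper's actual extraction procedure may differ.

```python
import numpy as np

def expm_taylor(M, terms=20, squarings=8):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    S = M / (2 ** squarings)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ S / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def locally_linear_step(z, A, b, dt):
    """Exact solution of dz/dt = A z + b over a step dt, with A and b held fixed."""
    k = z.shape[0]
    # Augment the state with a constant 1 so the affine ODE becomes linear.
    M = np.zeros((k + 1, k + 1))
    M[:k, :k] = A
    M[:k, k] = b
    z_aug = np.append(z, 1.0)
    return (expm_taylor(M * dt) @ z_aug)[:k]

def toy_operator(z, t, params):
    """Hypothetical stand-in for a learned state- and time-conditioned operator."""
    A0, A1, b0 = params
    scale = np.tanh(z.mean() + t)
    return A0 + scale * A1, b0

def gene_space_interactions(A, W):
    """Map a latent operator to gene space, assuming orthonormal PCA loadings W."""
    return W @ A @ W.T

# Toy setup: 6 "genes", 2 latent dimensions (sizes chosen only for illustration).
rng = np.random.default_rng(0)
n_genes, k = 6, 2
W, _ = np.linalg.qr(rng.normal(size=(n_genes, k)))  # orthonormal columns
params = (
    np.array([[-0.5, 0.2], [0.0, -0.3]]),
    0.1 * rng.normal(size=(k, k)),
    np.array([0.1, 0.0]),
)

# Roll the latent state forward, re-linearizing at every step.
z, t, dt = np.array([1.0, -1.0]), 0.0, 0.1
for _ in range(10):
    A_t, b_t = toy_operator(z, t, params)
    z = locally_linear_step(z, A_t, b_t, dt)
    t += dt

# Gene-by-gene interaction matrix implied by the operator at the final state.
A, _ = toy_operator(z, t, params)
G = gene_space_interactions(A, W)
```

Under these assumptions, entry G[i, j] can be read as the local linear effect of gene j on the rate of change of gene i, which is the kind of quantity one would compare against a curated interaction database.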
