Foundation Inference Models for Markov Jump Processes

David Berghaus, Kostadin Cvejoski, Patrick Seifner, Cesar Ojeda, Ramses J. Sanchez

TL;DR

This work introduces Foundation Inference Models (FIM) for zero-shot inference of Markov jump processes (MJPs) on bounded state spaces from noisy observations. By pairing a Gillespie-based synthetic data generator with an attention-enabled recognition network, FIM jointly infers the initial distribution $\boldsymbol{\pi}_0$ and the rate matrix $\mathbf{F}$ in a way that transfers across state-space dimensionalities. Across several domains, including discrete flashing ratchets, ion-channel data, and molecular-dynamics-derived benchmarks, FIM performs competitively in the zero-shot setting against finetuned baselines, and it supports zero-shot generation and thermodynamic analyses such as entropy production. A key strength is that long-time properties (stationary distributions, mean first-passage times (MFPTs)) and time-dependent moments can be computed directly from the inferred parameters, which is useful for studying metastable systems when labeled data are limited. The main limitations stem from the synthetic prior; extending it to richer rate distributions and birth–death processes would broaden applicability to more complex MJPs.
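As an illustration of how long-time properties follow from an inferred rate matrix, the sketch below computes a stationary distribution and mean first-passage times with NumPy. It is not the authors' code. It assumes the convention that `Q[i, j]` is the rate from state `i` to state `j` with rows summing to zero (the paper's $\mathbf{F}$ may use a different convention), that the chain is irreducible, and the 3-state matrix is a hypothetical example rather than a result from the paper.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi @ Q = 0 with sum(pi) = 1 for an irreducible rate matrix Q."""
    n = Q.shape[0]
    # Stack the normalization constraint onto the balance equations.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def mean_first_passage_times(Q, target):
    """Expected time to first reach `target` from every state."""
    n = Q.shape[0]
    others = [i for i in range(n) if i != target]
    # Restricted to the non-target states, the MFPT vector m solves Q_sub @ m = -1.
    Q_sub = Q[np.ix_(others, others)]
    m = np.linalg.solve(Q_sub, -np.ones(n - 1))
    mfpt = np.zeros(n)
    mfpt[others] = m
    return mfpt

# Hypothetical 3-state rate matrix (rows sum to zero); not taken from the paper.
Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 2.0, -2.0]])
pi = stationary_distribution(Q)
mfpt_to_2 = mean_first_passage_times(Q, target=2)
```

In practice `Q` would be replaced by the model's estimate of the rate matrix, so the accuracy of both quantities is limited by the accuracy of that estimate.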

Abstract

Markov jump processes are continuous-time stochastic processes which describe dynamical systems evolving in discrete state spaces. These processes find wide application in the natural sciences and machine learning, but their inference is known to be far from trivial. In this work we introduce a methodology for zero-shot inference of Markov jump processes (MJPs), on bounded state spaces, from noisy and sparse observations, which consists of two components. First, a broad probability distribution over families of MJPs, as well as over possible observation times and noise mechanisms, with which we simulate a synthetic dataset of hidden MJPs and their noisy observation process. Second, a neural network model that processes subsets of the simulated observations, and that is trained to output the initial condition and rate matrix of the target MJP in a supervised way. We empirically demonstrate that one and the same (pretrained) model can infer, in a zero-shot fashion, hidden MJPs evolving in state spaces of different dimensionalities. Specifically, we infer MJPs which describe (i) discrete flashing ratchet systems, which are a type of Brownian motor, and the conformational dynamics in (ii) molecular simulations, (iii) experimental ion channel data and (iv) simple protein folding models. What is more, we show that our model performs on par with state-of-the-art models which are finetuned to the target datasets.
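The first component, simulating hidden MJPs together with a noisy and sparse observation process, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's generator: the function names `gillespie` and `observe`, the uniform random observation grid, the label-flip noise model, and the rate matrix are all hypothetical choices, and `Q[i, j]` is taken to be the rate from state `i` to state `j`.

```python
import numpy as np

def gillespie(Q, x0, t_end, rng):
    """Simulate one MJP path; returns jump times and the states entered."""
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        exit_rate = -Q[x, x]
        if exit_rate <= 0.0:          # absorbing state: no further jumps
            break
        t += rng.exponential(1.0 / exit_rate)   # holding time ~ Exp(exit_rate)
        if t >= t_end:
            break
        probs = Q[x].copy()
        probs[x] = 0.0                # jump probabilities are the off-diagonal rates
        x = int(rng.choice(len(probs), p=probs / probs.sum()))
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

def observe(times, states, obs_times, n_states, flip_prob, rng):
    """Read the path at sparse times and replace labels at random (assumed noise model)."""
    # Index of the last jump at or before each observation time.
    idx = np.searchsorted(times, obs_times, side="right") - 1
    clean = states[idx]
    flip = rng.random(len(obs_times)) < flip_prob
    random_labels = rng.integers(0, n_states, len(obs_times))
    noisy = np.where(flip, random_labels, clean)
    return clean, noisy

# Hypothetical 3-state ring with a preferred direction; not taken from the paper.
rng = np.random.default_rng(0)
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -3.0, 2.0],
              [2.0, 1.0, -3.0]])
times, states = gillespie(Q, x0=0, t_end=10.0, rng=rng)
obs_times = np.sort(rng.uniform(0.0, 10.0, size=20))
clean, noisy = observe(times, states, obs_times, n_states=3, flip_prob=0.1, rng=rng)
```

Repeating this for many rate matrices drawn from a prior would yield pairs of noisy observations and ground-truth parameters, which is the kind of supervised training signal the abstract describes.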

Paper Structure (41 sections, 23 equations, 14 figures, 17 tables, 1 algorithm)

Figures (14)

  • Figure 1: Processes of very different nature (seem to) feature similar jump processes. Left: State values (blue circles) recorded from the discrete flashing ratchet process (black line). Right: Current signal (blue line) recorded from the viral potassium channel $\text{Kcv}_{\text{MT35}}$, together with one possible coarse-grained representation (black line).
  • Figure 2: Foundation Inference Model (FIM) for MJP. Left: Graphical model of the FIM (synthetic) data generation mechanism. Filled (empty) circles represent observed (unobserved) random variables. The light-blue rectangle represents the continuous-time MJP trajectory, which is observed discretely in time. See main text for details regarding notation. Right: Inference model. The network $\psi_1$ is called $K$ times to process $K$ different time series. Their outputs are first processed by the attention network $\Omega_1$ and then by the FNNs $\phi_1$, $\phi_2$ and $\phi_3$ to obtain the estimates $\mathbf{\hat{F}}$, $\log \text{Var} \, \mathbf{\hat{F}}$ and $\boldsymbol{\hat{\pi}}_0$, respectively.
  • Figure 3: Illustration of the six-state discrete flashing ratchet model. The potential $V$ is switched on and off at rate $r$. The transition rates $f_{ij}^{\text{\tiny on}}, f_{ij}^{\text{\tiny off}}$ allow the particle to propagate through the ring.
  • Figure 4: Inference of the discrete flashing ratchet process. The FIM results correspond to FIM evaluations with context number $c(300, 50)$, averaged over 15 batches.
  • Figure 5: Zero-shot inference of the DFR process. Left: Solution of the master equation $p_{\text{\tiny MJP}}(x, t)$ over time, computed with the (averaged) FIM-inferred rate matrix (black), together with the ground-truth solution (blue). Right: Total entropy production computed from FIM (over a time horizon $T=2.5 \, [a.u.]$). The model works remarkably well for a continuous range of potential values.
  • ...and 9 more figures
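
The quantities named in the Figure 5 caption, the master-equation solution and the entropy production, can both be computed from a rate matrix and an initial distribution. The sketch below is one way to do so and is not the authors' procedure: it assumes `Q[i, j]` is the rate from state `i` to state `j`, evaluates $\exp(Qt)$ by uniformization, and uses the standard Schnakenberg expression $\tfrac{1}{2}\sum_{i \neq j} (p_i q_{ij} - p_j q_{ji}) \ln \frac{p_i q_{ij}}{p_j q_{ji}}$ for the entropy production rate. The ring matrix and initial distribution are hypothetical; only the horizon $T = 2.5$ mirrors the caption.

```python
import numpy as np

def propagator(Q, t, tol=1e-12, max_terms=10_000):
    """exp(Q t) by uniformization; adequate for small state spaces and moderate t."""
    n = Q.shape[0]
    lam = max(float(np.max(-np.diag(Q))), 1e-12)   # uniformization rate
    P = np.eye(n) + Q / lam                        # stochastic matrix of the uniformized chain
    weight = np.exp(-lam * t)                      # Poisson weight for k = 0
    term = np.eye(n)
    result = weight * term
    total = weight
    for k in range(1, max_terms):
        if total >= 1.0 - tol:
            break
        weight *= lam * t / k
        term = term @ P
        result += weight * term
        total += weight
    return result

def entropy_production_rate(p, Q):
    """Schnakenberg entropy production rate for distribution p and rates Q."""
    J = p[:, None] * Q                 # J[i, j] = p_i * Q_ij, flux from i to j
    # Pairs with zero flux in either direction are skipped (an assumption of
    # this sketch; a strictly one-way flux would formally give a divergent term).
    mask = ~np.eye(len(p), dtype=bool) & (J > 0) & (J.T > 0)
    return 0.5 * np.sum((J - J.T)[mask] * np.log(J[mask] / J.T[mask]))

# Hypothetical driven 3-state ring and initial distribution; not from the paper.
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -3.0, 2.0],
              [2.0, 1.0, -3.0]])
p0 = np.array([0.8, 0.1, 0.1])
ts = np.linspace(0.0, 2.5, 251)
step = propagator(Q, ts[1] - ts[0])
ps = [p0]
for _ in ts[1:]:
    ps.append(ps[-1] @ step)           # p(t + dt) = p(t) exp(Q dt)
ps = np.array(ps)
rates = np.array([entropy_production_rate(p, Q) for p in ps])
# Trapezoidal integral of the rate over the time horizon.
total_entropy = float(np.sum(0.5 * (rates[1:] + rates[:-1]) * np.diff(ts)))
```

For this ring the stationary distribution is uniform and the rate settles at $\ln 2$, so the accumulated entropy production grows linearly at long times, as expected for a driven system out of equilibrium.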