A Selective Review of Modern Stochastic Modeling: SDE/SPDE Numerics, Data-Driven Identification, and Generative Methods with Applications in Biomathematics

Yassine Sabbar, Kottakkaran Sooppy Nisar

TL;DR

This article surveys recent advances (2020–2025) in stochastic modeling with a focus on biology and epidemiology, organizing theory, inference, and computation into four strands: SDE/SPDE numerics, data-driven system identification, generative stochastic dynamics, and uncertainty-aware simulations. It foregrounds neural SDEs for learning drift and diffusion from data, operator-learning methods for SPDEs, and diffusion-based generative models that invert forward stochastic processes, all while emphasizing structure-preserving numerics, identifiability, and long-time accuracy. The authors synthesize methodological novelties with practical workflows for estimating time-varying rates, modeling spatial spread, and reporting uncertainty for decision making, and they articulate concrete open problems such as discretization-aware training, robust UQ under irregular data, and controlled reverse-time dynamics. By linking theoretical foundations to applications in biology and epidemiology, the review provides a pragmatic roadmap for credible, scalable, and policy-relevant stochastic modeling in complex systems.

Abstract

This review maps developments in stochastic modeling, highlighting non-standard approaches and their applications to biology and epidemiology. It brings together four strands: (1) core models for systems that evolve with randomness; (2) learning key parts of those models directly from data; (3) methods that can generate realistic synthetic data in continuous time; and (4) numerical techniques that keep simulations stable, accurate, and faithful over long runs. The objective is practical: help researchers quickly see what is new, how the pieces fit together, and where important gaps remain. We summarize tools for estimating changing infection or reaction rates under noisy and incomplete observations, modeling spatial spread, accounting for sudden jumps and heavy tails, and reporting uncertainty in a way that is useful for decisions. We also highlight open problems that deserve near-term attention: separating true dynamics from noise when data are irregular; learning spatial dynamics under random influences with guarantees of stability; aligning training with the numerical method used in applications; preserving positivity and conservation in all simulations; reducing cost while controlling error for large studies; estimating rare but important events; and adopting clear, comparable reporting standards. By organizing the field around these aims, the review offers a concise guide to current methods, their practical use, and the most promising directions for future work in biology and epidemiology.

Paper Structure

This paper contains 13 sections, 20 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Forward–reverse SDE framework and its deterministic probability-flow ODE counterpart. Panel (A) shows the forward SDE transporting $p_{\mathrm{data}}$ to a reference Gaussian $p_{\mathrm{ref}}$. Panel (B) shows the reverse SDE mapping $p_{\mathrm{ref}}$ back to $p_{\mathrm{data}}$ via the learned score function $\nabla \log p_t(x)$. Panel (C) depicts the equivalent probability-flow ODE sharing the same marginals as the forward and reverse SDEs but evolving deterministically. Vector fields represent drift components; colored trajectories indicate sample paths.
  • Figure 2: Strong and weak convergence on a log–log scale for representative schemes. The strong error exhibits the expected order one half for an Euler–Maruyama discretization with multiplicative noise, whereas the weak error attains order one for smooth test functionals. Dashed guides indicate the asymptotic slopes used to assess rates. The same methodology extends to semidiscrete SPDEs, where temporal and spatial errors interact through the smoothing properties of the linear part and the regularity of the stochastic forcing.
  • Figure 3: Convergence of time averages to the invariant expectation for an ergodic observable $\mathcal{Q}$. Curves show $\frac{1}{t}\int_0^t \mathcal{Q}(X_s)\,\mathrm{d}s$ for an explicit Euler–Maruyama discretization and an implicit method at the same step size. The dotted line marks $\mathbb{E}_\pi[\mathcal{Q}]$, and the vertical segment at the final time indicates the residual bias. The implicit scheme approaches the stationary value more rapidly and with smaller long–time bias, consistent with theoretical predictions based on generator perturbations and Poisson–equation error analysis.
  • Figure 4: Schematic representation of the MLMC pipeline augmented with rare–event variance–reduction strategies. On the left, a hierarchy of discretization levels $\ell=0,\dots,L$ (light blue) is constructed, with the coarsest level $\ell=0$ at the top and the finest resolution $\ell=L$ at the bottom. Each level generates either a direct expectation $\mathbb{E}[\varphi_0]$ or a level difference $\mathbb{E}[\varphi_\ell - \varphi_{\ell-1}]$, which collectively form a telescoping sum. This telescoping identity ensures that the computational effort is concentrated where variance is largest, allowing coarser levels to provide inexpensive variance correction to finer ones. The sum feeds into the MLMC estimator, which delivers an unbiased approximation to $\mathbb{E}[\varphi_L]$ at reduced cost compared to single–level Monte Carlo for the same accuracy. Finally, near the estimator stage, rare–event techniques (including importance sampling, splitting/subset simulation, and adaptive multilevel splitting) can be incorporated to target probabilities of rare events or tail–dependent statistics, further improving estimator efficiency and robustness.
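
The telescoping sum described in the Figure 4 caption, $\mathbb{E}[\varphi_L] = \mathbb{E}[\varphi_0] + \sum_{\ell=1}^{L} \mathbb{E}[\varphi_\ell - \varphi_{\ell-1}]$, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the test problem (geometric Brownian motion with $\varphi = X_T$), the parameter values, the refinement factor of 2, the fixed per-level sample counts, and the hypothetical function names `coupled_euler_gbm` and `mlmc_estimate`. The rare–event stage of the pipeline is not included.

```python
import numpy as np


def coupled_euler_gbm(level, n_samples, rng, mu=0.05, sigma=0.2,
                      x0=1.0, T=1.0, m=2):
    """Return (fine, coarse) Euler-Maruyama endpoints for dX = mu X dt + sigma X dW.

    Both paths are driven by the same Brownian increments, which is what
    keeps the variance of the level difference small.
    """
    n_fine = m ** level
    h_fine = T / n_fine
    # Brownian increments on the fine grid, one row per sample path.
    dw = rng.normal(0.0, np.sqrt(h_fine), size=(n_samples, n_fine))

    x_fine = np.full(n_samples, x0)
    for k in range(n_fine):
        x_fine = x_fine + mu * x_fine * h_fine + sigma * x_fine * dw[:, k]

    if level == 0:
        # The coarsest level has no correction term: it estimates E[phi_0].
        return x_fine, np.zeros(n_samples)

    # Coarse increments are sums of m consecutive fine increments.
    n_coarse = n_fine // m
    h_coarse = T / n_coarse
    dw_coarse = dw.reshape(n_samples, n_coarse, m).sum(axis=2)

    x_coarse = np.full(n_samples, x0)
    for k in range(n_coarse):
        x_coarse = (x_coarse + mu * x_coarse * h_coarse
                    + sigma * x_coarse * dw_coarse[:, k])
    return x_fine, x_coarse


def mlmc_estimate(max_level, samples_per_level, seed=0):
    """Sum the per-level sample means of phi_l - phi_{l-1} (telescoping sum)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for level, n in zip(range(max_level + 1), samples_per_level):
        fine, coarse = coupled_euler_gbm(level, n, rng)
        total += np.mean(fine - coarse)
    return total


if __name__ == "__main__":
    # Many cheap coarse samples, few expensive fine ones.
    estimate = mlmc_estimate(4, [40000, 20000, 10000, 5000, 2500])
    print("MLMC estimate of E[X_T] at the finest level:", estimate)
```

In this sketch the sample counts simply halve with each level. A practical implementation would usually choose them from estimated per-level variances and costs, and would add a check on the remaining discretization bias at level $L$.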