Symbolic Discovery of Stochastic Differential Equations with Genetic Programming

Sigur de Vries, Sander W. Keemink, Marcel A. J. van Gerven

TL;DR

This work introduces a method for symbolic discovery of stochastic differential equations based on genetic programming, jointly optimizing drift and diffusion functions via the maximum likelihood estimate, contributing to the automation of science in a noisy and dynamic world.

Abstract

Automated scientific discovery aims to improve scientific understanding through machine learning. A central approach in this field is symbolic regression, which uses genetic programming or sparse regression to learn interpretable mathematical expressions to explain observed data. Conventionally, the focus of symbolic regression is on identifying ordinary differential equations. The general view is that noise only complicates the recovery of deterministic dynamics. However, explicitly learning a symbolic function of the noise component in stochastic differential equations enhances modelling capacity, increases knowledge gain and enables generative sampling. We introduce a method for symbolic discovery of stochastic differential equations based on genetic programming, jointly optimizing drift and diffusion functions via the maximum likelihood estimate. Our results demonstrate accurate recovery of governing equations, efficient scaling to higher-dimensional systems, robustness to sparsely sampled problems and generalization to stochastic partial differential equations. This work extends symbolic regression toward interpretable discovery of stochastic dynamical systems, contributing to the automation of science in a noisy and dynamic world.
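
To make the maximum likelihood objective concrete, the following is a minimal sketch of how a candidate drift and diffusion pair could be scored on a scalar time series. It assumes an Euler-Maruyama discretization, under which each transition is Gaussian with mean $x_t + f(x_t)\Delta t$ and variance $g(x_t)^2 \Delta t$; the abstract does not spell out the exact likelihood used, so this form is an assumption. The function names, the double-well drift $x - x^3$, and all parameter values are illustrative and not taken from the paper.

```python
import math
import random


def simulate_em(f, g, x0, dt, n_steps, rng):
    """Simulate dx = f(x) dt + g(x) dW with the Euler-Maruyama scheme."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        xs.append(x + f(x) * dt + g(x) * dw)
    return xs


def em_negative_log_likelihood(f, g, xs, dt):
    """Negative log-likelihood of a trajectory under Gaussian
    Euler-Maruyama transitions (an assumed form of the objective)."""
    nll = 0.0
    for x, x_next in zip(xs[:-1], xs[1:]):
        mean = x + f(x) * dt       # drift sets the transition mean
        var = g(x) ** 2 * dt       # diffusion sets the transition variance
        nll += 0.5 * math.log(2.0 * math.pi * var)
        nll += (x_next - mean) ** 2 / (2.0 * var)
    return nll


# Illustrative ground truth: a double-well drift with additive noise.
rng = random.Random(0)
dt = 0.01
true_f = lambda x: x - x ** 3
true_g = lambda x: 0.5
xs = simulate_em(true_f, true_g, 0.1, dt, 5000, rng)

# Score the true model against a candidate with a wrong diffusion term.
nll_true = em_negative_log_likelihood(true_f, true_g, xs, dt)
nll_wrong_diffusion = em_negative_log_likelihood(true_f, lambda x: 2.0, xs, dt)
```

Because both functions enter the same objective, a candidate with a mis-specified diffusion term is penalized even when its drift is exact, which is what allows the two expressions to be optimized jointly.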

Paper Structure (26 sections, 15 equations, 6 figures, 4 tables, 1 algorithm)

This paper contains 26 sections, 15 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of genetic programming of stochastic differential equations. (a) Stochastic time series are observed, to which stochastic differential equations are fitted with symbolic expressions. The resulting stochastic differential equation offers interpretability and generative modelling. The symbolic equations for the drift $f(x)$ and diffusion $g(x)$ are optimized with genetic programming. (b) In genetic programming, trees are adapted with crossover that swaps subtrees, illustrated with the exchange of the red and blue subtrees. (c) Besides crossover, mutation also changes the structure of individual trees. Various aspects can be mutated; this example shows the change of an operator in green. (d) Examples of systems to which the proposed method can be applied. The curves in each subpanel illustrate the behaviour of the state following stochastic system dynamics.
  • Figure 2: GP-SDE accurately recovers the governing equations of stochastic dynamical systems. The mean squared error (MSE) between the true and learned functions for the drift and diffusion on test data is presented for ten different seeds. The average of the ten runs is indicated with a black cross for every method. Furthermore, for each run, a star indicates that the correct structure of the equations was found for all variables, while a circle indicates that the structure of at least one equation was incorrect. The methods include Kramers-Moyal expansion followed by sparse regression (KM-SR), genetic programming of ordinary differential equations (GP-ODE), which only learns a function for the drift, and genetic programming of stochastic differential equations (GP-SDE). (a, b) Results for the double well problem with an additive, linear and non-linear multiplicative diffusion term. (c, d) Results for the van der Pol oscillator. (e, f) Results for the Rössler attractor. (g, h) Results for the Lorenz96 model with 5, 10 and 20 variables. (i, j) Results for the Lotka-Volterra model with a sampling rate of 0.02, 0.2 and 0.5. Both GP-ODE and GP-SDE are extended with multi-step integration, labelled as GP-ODE-MS and GP-SDE-MS respectively.
  • Figure 3: Simulation of the discovered models for the Rössler attractor. (a) Given a fixed initial condition, three trajectories of the Rössler attractor are simulated with different Wiener processes (black). The best ordinary differential equation evolved by genetic programming (GP-ODE) is integrated from the same initial condition (orange). (b) The same three trajectories of the true system are shown, together with the mean (green) and standard deviation (shaded area) computed over 50 trajectories sampled from the best model found by the method based on the Kramers-Moyal expansion and sparse regression (KM-SR), where the initial condition was fixed but different Wiener processes were used in each trajectory. (c) Same approach as in (b), but with the best stochastic differential equation discovered with genetic programming (GP-SDE). For each method, the corresponding equations are presented in Table \ref{['Table: Rossler_eq']}.
  • Figure 4: GP-SDE can recover stochastic partial differential equations. (a) Evolution of the Fisher-KPP equation with respect to $x$ and time, given the ground truth equation and the system identified with GP-SDE, presented in Table \ref{['Table: SPDE_eq']}. (b) Snapshots of the evolution of the two-dimensional heat transfer with respect to $x$, $y$ and time. Again, the true system and the system learned with GP-SDE are presented.
  • Figure 5: Runtimes of the methods. The computational time of GP-ODE, KM-SR and GP-SDE on the Lorenz96 model with increasing dimensionality. The runtimes are averaged over the ten seeds for every experiment, after running one seed to compile the algorithms. KM-SR is evaluated with four and sixteen bins ($b$).
  • ...and 1 more figures
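
The crossover and mutation operators described in the Figure 1 caption can be illustrated with a minimal sketch on expression trees. The tree encoding (nested tuples of the form `(operator, left, right)` with variable names or constants as leaves), the operator set, and all function names below are assumptions made for illustration; they are not the paper's implementation.

```python
import random


def node_paths(tree, path=()):
    """Yield the index path to every node in a nested-tuple tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, path + (i,))


def get_subtree(tree, path):
    """Return the subtree found by following the index path."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(a, b, rng):
    """Swap one randomly chosen subtree between two parent trees."""
    pa = rng.choice(list(node_paths(a)))
    pb = rng.choice(list(node_paths(b)))
    sub_a, sub_b = get_subtree(a, pa), get_subtree(b, pb)
    return replace_subtree(a, pa, sub_b), replace_subtree(b, pb, sub_a)


def mutate_operator(tree, rng, ops=("+", "-", "*")):
    """Replace the operator of one randomly chosen internal node."""
    op_paths = [p for p in node_paths(tree)
                if isinstance(get_subtree(tree, p), tuple)]
    if not op_paths:
        return tree  # a single leaf has no operator to mutate
    p = rng.choice(op_paths)
    node = get_subtree(tree, p)
    new_op = rng.choice([o for o in ops if o != node[0]])
    return replace_subtree(tree, p, (new_op,) + node[1:])


def evaluate(tree, x):
    """Evaluate a tree over the binary operators +, -, * at the value x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, x), evaluate(right, x)
        return a + b if op == "+" else a - b if op == "-" else a * b
    return x if tree == "x" else tree


rng = random.Random(0)
drift = ("-", "x", ("*", "x", ("*", "x", "x")))   # encodes x - x^3
diffusion = ("*", 0.5, "x")                        # encodes 0.5 * x
child_a, child_b = crossover(drift, diffusion, rng)
mutant = mutate_operator(drift, rng)
```

Trees are immutable tuples here, so both operators return new trees and leave the parents unchanged, which keeps a population safe to reuse across generations.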