A Bayesian Approach for Discovering Time-Delayed Differential Equation from Data

Debangshu Chowdhury, Souvik Chakraborty

TL;DR

BayTiDe discovers time-delayed differential equations from noisy data by casting the task as sparse Bayesian regression with a discontinuous spike-and-slab prior while simultaneously inferring an unknown time delay. It augments the data with delay terms, uses Gibbs sampling to jointly infer the active terms, the delay index, and the hyperparameters, and reports posterior inclusion probabilities (PIPs) to select the governing functions. The approach delivers accurate delay estimation, uncertainty quantification, and robustness to noise, and it outperforms SINDy on several benchmarks, including the Mackey-Glass and JC Sprott systems, even when the true delay is unknown. The work shows strong generalization to unseen poles and large delays, offering a practical, uncertainty-aware tool for data-driven discovery of time-delay dynamics in engineering and natural systems.

Abstract

Time-delayed differential equations (TDDEs) are widely used to model complex dynamic systems in which future states depend on past states with a delay. However, inferring the underlying TDDEs from observed data remains challenging because of the inherent nonlinearity, uncertainty, and noise in real-world systems. Conventional equation discovery methods often exhibit limitations when dealing with large time delays, relying on deterministic techniques or optimization-based approaches that may struggle with scalability and robustness. In this paper, we present BayTiDe (Bayesian Approach for Discovering Time-Delayed Differential Equations from Data), which can identify arbitrarily large time delays to an accuracy that is directly proportional to the resolution of the input data. BayTiDe combines Bayesian inference with a sparsity-promoting discontinuous spike-and-slab prior to accurately identify time-delayed differential equations. The approach also narrows the search space efficiently, achieving significant computational savings. We demonstrate the efficiency and robustness of BayTiDe through a range of numerical examples, validating its ability to recover delayed differential equations from noisy data.
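
The link between delay accuracy and data resolution can be made concrete with a short worked relation. This is a sketch under stated assumptions, not a result quoted from the paper: it assumes uniformly sampled data with time step $\Delta t$ (a symbol introduced here) and a delay represented by an integer index $N$ drawn from the search range $(N_{start}, N_{end})$.

$$
\hat{\tau} = \hat{N}\,\Delta t, \qquad \hat{N} \in \{N_{start}, \ldots, N_{end}\}, \qquad \min_{\hat{N}} \left|\hat{N}\,\Delta t - \tau\right| \le \frac{\Delta t}{2} \quad \text{whenever } N_{start}\,\Delta t \le \tau \le N_{end}\,\Delta t .
$$

Under these assumptions the best attainable delay error scales linearly with $\Delta t$. As a hypothetical example, with $\Delta t = 0.01$ a true delay $\tau = 1$ corresponds to the index $N = 100$, and halving $\Delta t$ halves the grid spacing on which the delay can be resolved.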

Paper Structure

This paper contains 13 sections, 22 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Schematic illustration: BayTiDe combines two key ideas with sparse Bayesian regression. The first idea is to augment the state data with artificial state variables that correspond to the delay terms in the governing equations (a). This effectively doubles the number of variables to be identified but allows functions of two or more delay terms to be included. The delay index has to be sampled from an arbitrarily large search space of indices $(N_{start}, N_{end})$, each with a corresponding unique augmented data matrix (b). The second key idea is to model this sampling with a Multinomial distribution, assigning each index a finite probability of being chosen. The framework samples the delay index and the corresponding augmented data matrix (c). The matrix is then transformed into the library $\mathbf{L}$ using the candidate functions $\{f_1, f_2, \ldots, f_K\}$ (d). This library is used to perform sparse Bayesian linear regression. The candidate functions in the library are parameterized by a weight vector $\bm{\theta}$ whose sparsity (shown by the dots in the figure) is promoted through sparsity-promoting priors (e). Each element of the weight vector is assigned a latent variable $Z_k$, $k=1,2,\ldots,K$, that classifies the weight as a spike or a slab. The framework is run independently for each state derivative $y_i$ to find the entire system of DDEs (f). After the Bayesian regression is completed, only the functions whose PIP, $P(Z_k=1|Y)$, is greater than 0.5 are retained in the final predicted model (g).
  • Figure 2: Hierarchical Bayesian Network: Green boxes represent constant hyper-parameters input to the system. White boxes are the random variables that are sampled in each iteration of sampling. $\mathbf{Y}$ represents the first derivative of the state variable and $\mathbf{L_\tau}$ represents a library of candidate functions constructed using the measured data.
  • Figure 3: PIP of the Exponential System corrupted with 15% Gaussian white noise: The feature library $\mathbf{L_\tau} \in \mathbb{R}^{n\times17}$ consists of the functions listed in \ref{eq:exponential function list}, applied combinatorially to the augmented data matrix. \ref{fig:exponential pip larger lib} shows the PIP of each candidate function when the correlated functions are included. BayTiDe completely fails to identify the ground truth. Note also that the identified functions are expansions of the functions in the real equation, which represents a damped periodic system. \ref{fig:exponential pip smaller lib} shows the PIP of each candidate function when the correlated functions are removed. The feature library now has the shape $\mathbf{L_\tau} \in \mathbb{R}^{n\times13}$. BayTiDe discovers the governing equation with certainty (PIP=1). The two figures highlight the dependence of the result on the selection of candidate functions.
  • Figure 4: Uncertainty Plot and response comparison for the Exponential System: Green: The real equation simulated with $\tau=1$. Dashed Line: The discovered equation simulated with the discovered delay $\tau=0.99$. \ref{fig:exponential result diff initial} The shaded area is the 95% CI of the predicted system and represents the uncertainty associated with the predicted value at each time step. As such, it is observed to increase with time. The red line is the mean of the predicted responses obtained by simulating with every sampled weight. \ref{fig:exponential result training} Performance of the proposed approach against ground truth for a different initial condition that results in completely different system dynamics.
  • Figure 5: Performance of the proposed approach for the JC Sprott System corrupted with 15% Gaussian white noise: The feature library $\mathbf{L_\tau}\in\mathbb{R}^{n\times15}$ consists of the functions listed in \ref{eq:jc sprott functions}. \ref{fig:jc sprott pip} shows the PIP of the candidate functions. The function $\sin(x_\tau)$ is identified correctly with PIP=1. $e^{x_\tau}$ has PIP=0.48; however, it does not appear in the identified equation because it is below the 0.5 threshold. \ref{fig:jc sprott tau 3 training} compares the response obtained using the identified equation with ground truth. The 95% CI (the shaded region) is overlapped by the plots, indicating high confidence in the identified model. \ref{fig:jc sprott tau 3 diff initial} compares the identified equation to the actual system for a different initial condition.
  • ...and 3 more figures
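
The data-handling steps described in the Figure 1 caption (delay augmentation, library construction, and PIP thresholding) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the five-term candidate library, and the hand-written indicator samples are all hypothetical, and the Gibbs sampler that would actually produce the indicator samples is omitted.

```python
import numpy as np

def augment_with_delay(x, delay_index):
    """Pair each sample x[t] with its delayed copy x[t - delay_index]."""
    if delay_index < 1 or delay_index >= len(x):
        raise ValueError("delay_index must be in [1, len(x) - 1]")
    current = x[delay_index:]    # x(t)
    delayed = x[:-delay_index]   # x(t - tau), with tau = delay_index * dt
    return np.column_stack([current, delayed])

def build_library(aug):
    """Hypothetical candidate library: 1, x, x_tau, x*x_tau, sin(x_tau)."""
    x, x_tau = aug[:, 0], aug[:, 1]
    names = ["1", "x", "x_tau", "x*x_tau", "sin(x_tau)"]
    cols = [np.ones_like(x), x, x_tau, x * x_tau, np.sin(x_tau)]
    return names, np.column_stack(cols)

def select_by_pip(z_samples, names, threshold=0.5):
    """Estimate the PIP as the mean of sampled 0/1 indicators; keep PIP > threshold."""
    pip = np.asarray(z_samples, dtype=float).mean(axis=0)
    selected = [n for n, p in zip(names, pip) if p > threshold]
    return selected, pip

# Toy signal sampled with step dt = 0.01 (assumed); a delay index of 100 means tau = 1.
t = 0.01 * np.arange(1000)
x = np.sin(t)
aug = augment_with_delay(x, delay_index=100)
names, L = build_library(aug)

# Stand-in for sampler output: one row per draw, one column per candidate function.
z_samples = np.array([
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
])
selected, pip = select_by_pip(z_samples, names)
print(selected)
```

In this sketch each candidate delay index yields its own augmented matrix and library, which is why the caption describes the delay index as a variable to be sampled alongside the weights and indicators.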