Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology

G. Pillonetto, A. Giaretta, A. Aravkin, M. Bisiacco, T. Elston

TL;DR

The paper addresses data-driven discovery of governing equations for dynamical systems in biology, where purely parametric models struggle with nonlinearities and unknown input channels. It introduces Kernel-based Sindy (KB-Sindy), a hybrid framework that decomposes the dynamics as $f(x,z)= g(x)+h(z)$, where $g$ is a sparse parametric part and $h$ is a kernel-based nonparametric part. By the representer theorem, the estimate of $h$ takes the form $\hat{h}(z)=\sum_i \hat{\xi}_i \mathcal{K}(z,z_i)$, a weighted sum of kernel sections centered at the observed points $z_i$. The authors demonstrate KB-Sindy on diverse systems (Lorenz with inputs, stacked Lorenz systems, autoregulation with Hill-type nonlinearities, calcium signaling, and nonlinear FIRs), showing accurate recovery of both the explicit terms and the nonlinear transformations, even with noisy data and in high-dimensional settings. Combining sparse parametric learning with flexible nonparametric regularization mitigates the curse of dimensionality and keeps the discovered governing equations interpretable in complex biological contexts.
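
The decomposition $f(x,z)=g(x)+h(z)$ can be illustrated with a small numerical sketch. The estimator below is an assumption made for illustration, not the paper's algorithm: for a fixed parametric coefficient vector the kernel part is fitted by kernel ridge regression, which gives $\hat{\xi}=(K+\gamma I)^{-1}(y-\Phi\theta)$, and substituting this back leaves a weighted least-squares problem in $\theta$ that is sparsified by hard thresholding. The names `fit_hybrid` and `gaussian_kernel`, the Gaussian kernel, the synthetic data, and all tuning values are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel matrix with entries exp(-(a_i - b_j)^2 / (2 * width^2))."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * width**2))

def fit_hybrid(Phi, z, y, gamma=1e-2, width=1.0, threshold=0.2, max_iter=10):
    """Fit y ~ Phi @ theta + h(z) with sparse theta and kernel-based h.

    Hypothetical estimator (not taken from the paper): eliminating the kernel
    weights xi leaves weighted least squares in theta with weight matrix
    W = (K + gamma I)^{-1}; small entries of theta are then pruned iteratively.
    """
    n, p = Phi.shape
    K = gaussian_kernel(z, z, width)
    W = np.linalg.inv(K + gamma * np.eye(n))
    active = np.ones(p, dtype=bool)
    theta = np.zeros(p)
    for _ in range(max_iter):
        theta = np.zeros(p)
        if active.any():
            A = Phi[:, active]
            theta[active] = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        keep = np.abs(theta) >= threshold
        if np.array_equal(keep, active):
            break  # support no longer changes
        active = keep
    theta[~active] = 0.0
    xi = W @ (y - Phi @ theta)  # representer-theorem weights of h_hat
    return theta, xi

# Synthetic example (assumed for illustration): g is sparse in a quadratic
# library of x, and h(z) = sin(z) is not part of that library.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=(n, 2))
z = rng.uniform(-3.0, 3.0, size=n)
Phi = np.column_stack([x[:, 0], x[:, 1], x[:, 0]**2, x[:, 0] * x[:, 1], x[:, 1]**2])
y = 2.0 * x[:, 0] - 1.5 * x[:, 0] * x[:, 1] + np.sin(z) + 0.05 * rng.standard_normal(n)

theta_hat, xi_hat = fit_hybrid(Phi, z, y)

# Evaluate h_hat(z) = sum_i xi_i * K(z, z_i) on a grid.
z_grid = np.linspace(-2.5, 2.5, 50)
h_hat = gaussian_kernel(z_grid, z) @ xi_hat
```

In this sketch the kernel only encodes smoothness of $h$, so the sine term is recovered without ever being added to the library, while the thresholding keeps the parametric part sparse.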

Abstract

Data-driven discovery of model equations is a powerful approach for understanding the behavior of dynamical systems in many scientific fields. In particular, the ability to learn mathematical models from data would benefit systems biology, where the complex nature of these systems often makes a bottom-up approach to modeling infeasible. In recent years, sparse estimation techniques have gained prominence in system identification, primarily using parametric paradigms to efficiently capture system dynamics with minimal model complexity. In particular, the Sindy algorithm has successfully used sparsity to estimate nonlinear systems by extracting from a library of functions only the few key terms needed to capture the dynamics of these systems. However, parametric models often fall short in accurately representing certain nonlinearities inherent in complex systems. To address this limitation, we introduce a novel framework that integrates sparse parametric estimation with nonparametric techniques. It captures nonlinearities that Sindy cannot describe, and it does so without requiring a priori information about their functional form, that is, without expanding the library of functions to include the very function to be discovered. We illustrate our approach on several examples related to estimation of complex biological phenomena.
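
The library-based sparse regression that the abstract attributes to Sindy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the logistic test system, the four-term polynomial library, the finite-difference derivative, and the threshold value are all chosen here for the example, and `stlsq` is a hypothetical helper implementing sequentially thresholded least squares.

```python
import numpy as np

# Logistic growth dx/dt = x - x^2, sampled from its closed-form solution.
t = np.linspace(0.0, 6.0, 601)
x0 = 0.05
x = 1.0 / (1.0 + (1.0 / x0 - 1.0) * np.exp(-t))

# Finite-difference estimate of the time derivative from the samples.
dxdt = np.gradient(x, t, edge_order=2)

# Candidate library: columns are 1, x, x^2, x^3 evaluated on the data.
library = np.column_stack([np.ones_like(x), x, x**2, x**3])

def stlsq(Theta, target, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: fit, prune small terms, refit."""
    p = Theta.shape[1]
    active = np.ones(p, dtype=bool)
    coef = np.zeros(p)
    for _ in range(max_iter):
        coef = np.zeros(p)
        if active.any():
            coef[active] = np.linalg.lstsq(Theta[:, active], target, rcond=None)[0]
        keep = np.abs(coef) >= threshold
        if np.array_equal(keep, active):
            break  # support no longer changes
        active = keep
    coef[~active] = 0.0
    return coef

coef = stlsq(library, dxdt)  # coefficients of 1, x, x^2, x^3
```

Only the terms $x$ and $x^2$ should survive the thresholding here. A nonlinearity outside the span of the library, such as a Hill function, has no such sparse representation, which is the gap the nonparametric component is meant to fill.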

Paper Structure

This paper contains 12 sections, 44 equations, 14 figures.

Figures (14)

  • Figure 1: KB-Sindy overview. An experiment provides noisy time series data for a dynamical system. KB-Sindy models the system as the sum of two parts. The first is parametric and described in terms of a prescribed set of basis functions. The complexity of the parametric component is controlled by sparsity information. The function $h$ takes into account terms difficult to approximate using the basis functions and is therefore estimated in a nonparametric way. The complexity of $h$ is controlled by a machine learning approach based on a prescribed kernel, which only encodes information about the regularity of $h$.
  • Figure 2: Lorenz system subject to a time-dependent input (left panels) or output feedback (right panels). The top panels show the system's time evolution in phase space. KB-Sindy jointly reconstructs the system parameters and the nonlinear transformation of the input signal or feedback control. Results are shown for the second Lorenz equation, with $h$ depending either on the external forcing input $u(t)$ or on the system output $o(t)$, given by the sum of the three states $x_i(t)$. The parametric part of the model contains monomials up to order 4, requiring estimation of 34 coefficients. Only three of them are non-zero, equal to $28,-1,-1$. The estimated coefficients are shown in the middle panels (black circles; the red circles correspond to the true values). The coefficients of the monomials $x_1,x_2,x_1x_3$, numbered $1,2,6$, are non-zero and their estimates are close to the true parameter values. KB-Sindy accurately captures the nonlinear transformation $h$ for both the input and the feedback (bottom panels).
  • Figure 3: Schematic diagram for negative autoregulation. mRNA ($x_1$) is translated into protein ($x_2$) at a rate $\gamma$. The protein acts as a transcriptional repressor by binding to the gene's promoter. Both mRNA and protein are degraded at rates $\delta_1$ and $\delta_2$, respectively.
  • Figure 4: Results for the negative autoregulation model. The top panels show simulations of the system using the following parameter values (DelVecchio2014): $s_1 = 0.5 \, [nM/s]$, $s_2 = 0.5 \cdot 10^{-4} \, [s^{-1}]$, $\delta_1 = 5.78 \cdot 10^{-3} \, [s^{-1}]$, $\delta_2 = 1.16 \cdot 10^{-3} \, [s^{-1}]$, under weak ($S = 10^2,q = 2$) and strong ($S = 10^3,q = 4$) nonlinearity. These panels also display the noisy measurements used by KB-Sindy to estimate the parametric model component and the nonlinear Hill function. The estimated coefficients for the parametric component are given in the middle panels (black circles). The red circles correspond to the true values. The estimates of the Hill function $h$ are given in the bottom panels.
  • Figure 5: Results for autoregulation with positive and negative feedback. The top panels show simulations using the same parameter values as the negative autoregulation example (DelVecchio2014), except that $h(x_2)= (\alpha x_2^4 + \beta x_2^2 + \gamma)/(\sigma x_2^4 + \xi x_2^2 + \lambda)$ with $\alpha=7.36 \cdot 10^{-11}$, $\beta=1.7 \cdot 10^{-5}$, $\gamma=0.0011$, $\sigma=8.9 \cdot 10^{-9}$, $\xi=8.6 \cdot 10^{-6}$, $\lambda=0.0022$. Also shown are the noisy measurements used by KB-Sindy. The estimated coefficients for the parametric component of the model are given in the middle panels (black circles). The red circles correspond to the true values. The function $h$ and its nonparametric estimate are shown in the bottom panel.
  • ...and 9 more figures
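
The Figure 2 caption refers to monomial coefficients by number. A minimal sketch of how such a library can be enumerated is given below, assuming three states, no constant term, and a graded ordering (all order-1 terms first, then order 2, and so on). The ordering convention and the helper `monomial_library` are assumptions for illustration and are not stated in the caption.

```python
from itertools import combinations_with_replacement

def monomial_library(n_states=3, max_order=4):
    """Monomials as tuples of state indices, graded by total order, no constant."""
    terms = []
    for order in range(1, max_order + 1):
        terms.extend(combinations_with_replacement(range(1, n_states + 1), order))
    return terms

terms = monomial_library()

# Human-readable names, e.g. (1, 3) -> "x1x3".
names = ["".join(f"x{i}" for i in term) for term in terms]

# 1-based position of selected monomials under the assumed ordering.
position = {name: k + 1 for k, name in enumerate(names)}
```

Under this assumed ordering the library has 34 terms, and `x1`, `x2` and `x1x3` occupy positions 1, 2 and 6.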