Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology
G. Pillonetto, A. Giaretta, A. Aravkin, M. Bisiacco, T. Elston
TL;DR
The paper addresses data-driven discovery of governing equations for dynamical systems in biology, where purely parametric models struggle with nonlinearities and unknown input channels. It introduces Kernel-based Sindy (KB-Sindy), a hybrid framework that decomposes the dynamics as $f(x,z)=g(x)+h(z)$, with a sparse parametric part $g$ and a kernel-based nonparametric part $h$. By the representer theorem, the estimate of $h$ takes the form $\hat{h}(z)=\sum_i \hat{\xi}_i \mathcal{K}(z,z_i)$, a finite sum of kernel sections centered at the data points $z_i$. The authors demonstrate KB-Sindy on diverse systems (Lorenz with inputs, stacked Lorenz systems, autoregulation with Hill-type nonlinearities, calcium signaling, and nonlinear FIRs), showing accurate recovery of both the explicit terms and the nonlinear transformations, even under noise and in high-dimensional settings. By combining sparse parametric learning with flexible nonparametric regularization, the approach mitigates the curse of dimensionality and enables interpretable, data-driven discovery of governing equations in complex biological contexts.
Abstract
Data-driven discovery of model equations is a powerful approach for understanding the behavior of dynamical systems in many scientific fields. The ability to learn mathematical models from data would especially benefit systems biology, where the complexity of the systems often makes a bottom-up approach to modeling infeasible. In recent years, sparse estimation techniques have gained prominence in system identification, primarily using parametric paradigms to efficiently capture system dynamics with minimal model complexity. In particular, the Sindy algorithm has successfully used sparsity to estimate nonlinear systems by extracting from a library of functions only the few key terms needed to capture their dynamics. However, parametric models often fall short in accurately representing certain nonlinearities inherent in complex systems. To address this limitation, we introduce a novel framework that integrates sparse parametric estimation with nonparametric techniques. It captures nonlinearities that Sindy cannot describe, and it does so without requiring a priori information about their functional form, that is, without expanding the library of functions to include the very function one is trying to discover. We illustrate our approach on several examples related to the estimation of complex biological phenomena.
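
To make the idea of pairing a sparse library fit with a kernel term concrete, the following is a minimal sketch of one way such a hybrid estimate could be computed for a scalar output $y \approx g(x)+h(z)$. It is not the authors' implementation. Several choices are assumptions made only for illustration: a Gaussian kernel, a ridge-type penalty $\lambda\,\xi^\top K \xi$ on the kernel part, sequentially thresholded least squares as the sparsifier, and the toy data itself. The names `gaussian_kernel`, `stlsq`, `fit_hybrid`, `lam`, `width`, and `threshold` are hypothetical. For fixed library weights $w$, the penalized kernel coefficients have the closed form $\xi=(K+\lambda I)^{-1}(y-\Theta w)$, and substituting this back leaves a weighted least-squares problem in $w$, which the sketch whitens with a Cholesky factor before thresholding.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel matrix between 1-D sample arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / width) ** 2)

def stlsq(theta, y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: refit, zero small weights, repeat."""
    w = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(w) >= threshold
        w[~keep] = 0.0
        if not keep.any():
            break
        w[keep] = np.linalg.lstsq(theta[:, keep], y, rcond=None)[0]
    return w

def fit_hybrid(theta, z, y, lam=1e-2, width=1.0, threshold=0.1):
    """Hypothetical hybrid fit: sparse weights w on the library theta plus
    kernel coefficients xi, so that y is approximated by theta @ w + K @ xi."""
    n = len(y)
    A = gaussian_kernel(z, z, width) + lam * np.eye(n)
    # Eliminating xi leaves the cost lam * r^T A^{-1} r with r = y - theta @ w.
    # Whitening with the Cholesky factor C (A = C C^T) makes it ordinary least squares.
    C = np.linalg.cholesky(A)
    w = stlsq(np.linalg.solve(C, theta), np.linalg.solve(C, y), threshold)
    # Closed-form kernel coefficients given the sparse part.
    xi = np.linalg.solve(A, y - theta @ w)
    return w, xi

# Toy data (illustrative only): polynomial part in x, Hill-type part in z.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, n)
z = rng.uniform(0.0, 3.0, n)
h_true = z**2 / (1.0 + z**2)  # nonlinearity deliberately absent from the library
y = 1.0 * x - 0.5 * x**3 + h_true + 0.02 * rng.standard_normal(n)

theta = np.column_stack([x, x**2, x**3])  # candidate library for g
w_hat, xi_hat = fit_hybrid(theta, z, y)
h_fit = gaussian_kernel(z, z) @ xi_hat    # h_hat(z_j) = sum_i xi_i K(z_j, z_i)

print("sparse coefficients:", np.round(w_hat, 3))
print("mean abs error on h:", np.mean(np.abs(h_fit - h_true)))
```

To evaluate the estimated nonlinearity at new points `z_new`, the same representer form applies: `gaussian_kernel(z_new, z) @ xi_hat`. The library here deliberately omits a constant column, so any offset is absorbed by the kernel part; with a constant in the library, the split between $g$ and $h$ would only be determined up to that offset.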
