Automating the Discovery of Partial Differential Equations in Dynamical Systems

Weizhen Li; Rui Carvalho

TL;DR

ARGOS-RAL extends the ARGOS framework with a recurrent adaptive lasso to discover governing PDEs directly from spatiotemporal data. It automates numerical differentiation via Savitzky-Golay filtering and Gaussian blur, builds a rich candidate library, and solves a single sparse regression across all time points with iterative reweighting and AIC-based model selection. Across diverse canonical PDEs, ARGOS-RAL is robust to noise and nonuniform sampling and outperforms sequential threshold ridge regression (STRidge) in most cases, although some equations require more data, and the method remains dependent on the choice of library and offers limited uncertainty quantification. By integrating statistical regression, machine learning, and dynamical-systems theory, the approach points toward automated, scalable PDE discovery across physics, biology, and engineering.
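To make the differentiation step concrete, here is a minimal NumPy-only sketch that blurs a gridded field with a separable Gaussian kernel and then takes a least-squares Savitzky-Golay derivative. The window half-width, polynomial order, blur width `sigma`, reflective boundaries, and the blur-then-differentiate ordering are illustrative assumptions; ARGOS-RAL chooses its smoothing parameters automatically, and that selection rule is not reproduced here. The helper names `gaussian_blur`, `savgol_weights`, and `savgol_derivative` are hypothetical.

```python
import math

import numpy as np


def gaussian_blur(u, sigma=1.0, truncate=3.0):
    """Separable Gaussian blur with reflective boundaries (illustrative choice)."""
    radius = max(1, int(truncate * sigma + 0.5))
    k = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()  # normalised, so a constant field is left unchanged
    out = np.asarray(u, dtype=float)
    for ax in range(out.ndim):
        pad = [(radius, radius) if a == ax else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, pad, mode="reflect")
        # 'valid' convolution over the padded axis restores the original length
        out = np.apply_along_axis(np.convolve, ax, padded, kernel, mode="valid")
    return out


def savgol_weights(half_window, polyorder, deriv, dx):
    """Weights returning the deriv-th derivative of a local least-squares polynomial fit."""
    offsets = np.arange(-half_window, half_window + 1, dtype=float)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # column j holds offsets**j
    coeff_row = np.linalg.pinv(vander)[deriv]  # maps window samples to coefficient c_deriv
    # In index units t = x / dx, the deriv-th derivative at t = 0 is deriv! * c_deriv / dx**deriv
    return math.factorial(deriv) * coeff_row / dx**deriv


def savgol_derivative(u, dx, axis=0, half_window=5, polyorder=3, deriv=1):
    """Savitzky-Golay derivative along `axis`; drops half_window samples at each end."""
    w = savgol_weights(half_window, polyorder, deriv, dx)
    # np.correlate(s, w, 'valid')[i] = sum_k s[i + k] * w[k]: the fit centred at sample i + half_window
    return np.apply_along_axis(np.correlate, axis, np.asarray(u, dtype=float), w, mode="valid")


# Usage: first and second derivatives of sin(x) on a uniform grid
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
dx = x[1] - x[0]
u = np.sin(x)
u_x = savgol_derivative(u, dx)  # approximates cos(x[5:-5])
u_xx = savgol_derivative(u, dx, deriv=2)  # approximates -sin(x[5:-5])
```

Expressing the filter as fixed correlation weights keeps each derivative a single linear pass over the data, which is why this style of differentiation scales to large spatiotemporal grids; the trimmed edge samples are one simple way to avoid unreliable boundary fits.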

Abstract

Identifying partial differential equations (PDEs) from data is crucial for understanding the governing mechanisms of natural phenomena, yet it remains a challenging task. We present ARGOS-RAL, an extension of the ARGOS framework that leverages sparse regression with the recurrent adaptive lasso to identify PDEs automatically from data with limited prior knowledge. Our method automates the calculation of partial derivatives, the construction of a candidate library, and the estimation of a sparse model. We rigorously evaluate ARGOS-RAL on canonical PDEs under various noise levels and sample sizes, demonstrating its robustness to noisy and non-uniformly distributed data. We also test the algorithm on datasets consisting solely of random noise to simulate severely compromised data quality. Our results show that ARGOS-RAL identifies the underlying PDEs effectively and reliably, outperforming sequential threshold ridge regression (STRidge) in most cases. We highlight the potential of combining statistical methods, machine learning, and dynamical systems theory to automatically discover governing equations from collected data, streamlining the scientific modeling process.
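The library-plus-sparse-regression step can be illustrated with a self-contained NumPy sketch: a small candidate library, a coordinate-descent lasso, and an adaptive lasso that is re-run with weights taken from the previous pass until the selected terms stop changing, followed by an ordinary least squares refit on the selected terms. Several details are assumptions rather than the published algorithm: the library contents, the weight exponent `gamma`, the λ grid, the AIC form $n\log(\mathrm{RSS}/n) + 2k$ used to choose λ, and our reading of "recurrent" as repeated reweighting. All function names are hypothetical.

```python
import numpy as np


def build_library(u, u_x, u_xx):
    """Hypothetical library: {1, u, u^2} multiplied by {1, u_x, u_xx}."""
    u, u_x, u_xx = (np.ravel(np.asarray(a, dtype=float)) for a in (u, u_x, u_xx))
    polys = {"1": np.ones_like(u), "u": u, "u^2": u**2}
    derivs = {"1": np.ones_like(u), "u_x": u_x, "u_xx": u_xx}
    cols, names = [], []
    for pn, pv in polys.items():
        for dn, dv in derivs.items():
            cols.append(pv * dv)
            names.append(pn if dn == "1" else dn if pn == "1" else f"{pn}*{dn}")
    return np.column_stack(cols), names


def lasso_cd(X, y, lam, b0=None, max_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p) if b0 is None else b0.copy()
    col_sq = (X**2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            if new != b[j]:
                r -= X[:, j] * (new - b[j])  # keep the residual in sync
                max_step = max(max_step, abs(new - b[j]))
                b[j] = new
        if max_step < tol:
            break
    return b


def adaptive_lasso_pass(X, y, active, b_prev, lam_fracs, gamma=1.0):
    """One adaptive-lasso pass over the active columns; lambda chosen by (assumed) AIC."""
    n = len(y)
    idx = np.flatnonzero(active)
    scale = np.abs(b_prev[idx]) ** gamma  # rescaling columns = penalty weights 1/|b|^gamma
    Xs = X[:, idx] * scale
    lam_max = np.max(np.abs(Xs.T @ y)) / n
    best_aic, best_support, best_coef = np.inf, idx[:0], np.zeros(0)
    bs = None
    for frac in lam_fracs:  # from lam_max downwards, warm-starting each fit
        bs = lasso_cd(Xs, y, frac * lam_max, b0=bs)
        support = idx[bs != 0]
        if support.size == 0:
            continue
        coef = np.linalg.lstsq(X[:, support], y, rcond=None)[0]  # OLS refit on the support
        rss = np.sum((y - X[:, support] @ coef) ** 2)
        aic = n * np.log(max(rss / n, 1e-300)) + 2 * support.size
        if aic < best_aic:
            best_aic, best_support, best_coef = aic, support, coef
    return best_support, best_coef


def recurrent_adaptive_lasso(X, y, gamma=1.0, n_lambda=40, max_passes=10):
    """Repeat adaptive-lasso passes until the selected set of terms stops changing."""
    p = X.shape[1]
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalised initial estimate
    active = np.ones(p, dtype=bool)
    lam_fracs = np.logspace(0, -4, n_lambda)
    for _ in range(max_passes):
        support, coef = adaptive_lasso_pass(X, y, active, b, lam_fracs, gamma)
        b = np.zeros(p)
        b[support] = coef
        new_active = b != 0
        if not new_active.any() or np.array_equal(new_active, active):
            break
        active = new_active
    return b


# Usage on synthetic stand-ins (random vectors, not an actual PDE solution)
rng = np.random.default_rng(1)
u, u_x, u_xx = rng.standard_normal((3, 500))
X, names = build_library(u, u_x, u_xx)
u_t = -u * u_x + 0.1 * u_xx + 1e-3 * rng.standard_normal(500)  # Burgers-like target
b = recurrent_adaptive_lasso(X, u_t)
print({names[j]: round(float(b[j]), 3) for j in np.flatnonzero(b)})
```

Rescaling each column by $|b_j|^\gamma$ is the usual way to turn a weighted (adaptive) lasso into a plain lasso, so the same solver serves every pass; the OLS refit removes the shrinkage bias of the lasso estimates.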

Paper Structure (23 sections, 24 equations, 6 figures, 1 table, 2 algorithms)

Introduction
Methods
Overview of the ARGOS-RAL Framework
Automated Numerical Differentiation using the Savitzky-Golay Filter and the Gaussian Blur
Sparse Regression with the Recurrent Adaptive Lasso
Results and Discussion
Evaluating the Performance of ARGOS-RAL under Varying Noise Levels and Sample Sizes
Quantifying Success Rates in Identifying Canonical PDEs
Robustness Analysis using White Gaussian Noise
Conclusions
Supplementary materials
Gaussian Blur Kernels
Algorithms
Additional PDE Test Cases
Burgers' equation
...and 8 more sections

Figures (6)

Figure 1: Process of identifying PDEs from data using ARGOS with the recurrent adaptive lasso. The identification process consists of three main steps: (A) automatic smoothing and calculation of derivatives, (B) construction of the candidate library, and (C) implementation of the recurrent adaptive lasso. We begin by collecting the data $\tilde{\mathbf{U}}$ and applying the automatic Savitzky-Golay filter with Gaussian blur to calculate the smoothed $\mathbf{U}$ and its partial derivatives. Next, we vectorize the smoothed data, all partial derivatives, and other related terms to construct the candidate library. Finally, we employ the recurrent adaptive lasso to identify the active features in the library, and we estimate the unbiased coefficients of the identified model using ordinary least squares regression.
Figure 2: Pareto curve of the adaptive lasso for a sampled dataset from a Navier-Stokes system with an SNR of 36 dB. The Pareto curve balances the trade-off between sparsity and goodness-of-fit. The red point on the curve indicates the optimal value of the regularization parameter $\lambda$ that achieves the best balance between these two competing objectives. Increasing $\lambda$ leads to sparser solutions at the cost of a poorer fit to the data, while decreasing $\lambda$ improves the fit but yields less sparse solutions.
Figure 3: Influence of SNR on the Burgers' equation dataset. (A) Noiseless data points (blue) serve as a reference for evaluating the impact of sample size on PDE identification accuracy. (B-F) Noisy datasets are generated by adding Gaussian noise at SNR levels of 40 dB, 30 dB, 20 dB, 10 dB and 0 dB, respectively, to comprehensively characterize the system's behavior under varying noise conditions.
Figure 4: Success rates of ARGOS-RAL and STRidge in identifying (A) Burgers', (B) cable, (C) Navier-Stokes, (D) reaction-diffusion, and (E) quantum harmonic oscillator equations with varying SNRs and sample sizes. We analyze the noise tolerance by adding noise of different SNRs to the PDE solutions. For the sample size analysis, we randomly sample points from the set $\{\mathbf{u}_t,\mathbf{\Theta}(\mathbf{u})\}$ based on noiseless data. In panel (C), we use the region indicated by the red rectangle to implement both the SNR and sample size tests by sampling points within this area. PDE solution plots display time snapshots at $t=306$ for Navier-Stokes in panel (C) and $t=1$ for reaction-diffusion in panel (D). Lines connecting the points are used for visual guidance only and do not represent a fit to the data. Shaded regions represent model discovery accuracy above 80%.
Figure 5: Number of nonzero terms identified from 100 random noise datasets using different candidate function libraries. For each case, we count the number of nonzero coefficients in the sparse regression. We display the distribution of these counts using dots for each of the 100 trials and summarize the results using box plots. Each box plot shows the median (solid horizontal line), interquartile range (box), and minimum and maximum values (whiskers) for the 100 trials. An ideal algorithm should produce boxes located either at zero, indicating a null model, or above four, indicating a dense model; in the latter case, the box may span a wide range from four to the maximum number of terms in the library.
...and 1 more figure
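The noise and sample-size experiments described in the Figure 3 and Figure 4 captions can be mimicked with the sketch below. It assumes the common definition $\mathrm{SNR_{dB}} = 10\log_{10}(P_\text{signal}/P_\text{noise})$, with $P_\text{signal}$ taken as the mean square of the clean field; the paper's exact convention may differ. `add_noise_snr` and `subsample_rows` are hypothetical helper names, and the sine wave is a placeholder for a PDE solution.

```python
import numpy as np


def add_noise_snr(u, snr_db, rng):
    """Add white Gaussian noise so that mean(u**2) / noise variance = 10**(snr_db / 10)."""
    u = np.asarray(u, dtype=float)
    signal_power = np.mean(u**2)
    noise_var = signal_power / 10.0 ** (snr_db / 10.0)
    return u + rng.normal(0.0, np.sqrt(noise_var), size=u.shape)


def subsample_rows(u_t, theta, n_samples, rng):
    """Draw n_samples rows of {u_t, Theta(u)} uniformly at random, without replacement."""
    idx = rng.choice(len(u_t), size=n_samples, replace=False)
    return u_t[idx], theta[idx]


rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0 * np.pi, 200_000)
clean = np.sin(x)  # placeholder field
for snr_db in (40, 30, 20, 10, 0):  # the SNR levels listed in the Figure 3 caption
    noisy = add_noise_snr(clean, snr_db, rng)
    measured = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))
    print(snr_db, round(float(measured), 2))
```

Sampling rows jointly from $\mathbf{u}_t$ and $\mathbf{\Theta}(\mathbf{u})$ keeps each target value paired with its own library row, which is what a reduced-sample regression needs.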