
Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching

Nan Jiang, Md Nasim, Yexiang Xue

TL;DR

The paper addresses the challenge of discovering symbolic ODEs from trajectory data when fixed, pre-collected training sets lead to overfitting, particularly in chaotic systems. It introduces APPS, an active framework that sketches phase portraits to identify informative regions of phase space and then samples batches of nearby initial conditions from those regions. A Transformer-based decoder generates candidate ODEs, and the data-query loop is trained with REINFORCE. By evaluating candidates on region-specific phase portraits with NMSE-based rewards, APPS consistently outperforms passive baselines on the Strogatz and ODEBase datasets under noiseless, noisy, and irregularly sampled time settings. The approach reduces data requirements while improving the accuracy and ranking of predicted ODEs, offering a scalable path toward active discovery of dynamical laws in complex systems.
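
The TL;DR mentions NMSE-based rewards. As a minimal sketch, assuming the common definition of NMSE (mean squared error divided by the variance of the ground truth) and a hypothetical bounded reward 1/(1 + NMSE), scoring a candidate ODE's predicted trajectory could look like the following; the paper's exact reward shaping may differ.

```python
import numpy as np


def nmse(y_true, y_pred) -> float:
    """Normalized MSE: mean squared error divided by the variance of the ground truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    var = np.var(y_true)
    if var == 0.0:
        raise ValueError("ground truth has zero variance; NMSE is undefined")
    return float(np.mean((y_true - y_pred) ** 2) / var)


def reward(y_true, y_pred) -> float:
    """Hypothetical bounded reward: 1.0 for a perfect fit, decaying toward 0 as NMSE grows."""
    return 1.0 / (1.0 + nmse(y_true, y_pred))


# Usage: score predictions of the trajectory x(t) = exp(-t)
t = np.linspace(0.0, 2.0, 50)
y = np.exp(-t)
print(reward(y, y))               # perfect fit -> 1.0
print(reward(y, y + 0.05) < 1.0)  # any error lowers the reward -> True
```

Predicting the constant mean of the data gives NMSE = 1, so values well below 1 indicate a fit better than that trivial baseline.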

Abstract

The symbolic discovery of Ordinary Differential Equations (ODEs) from trajectory data plays a pivotal role in AI-driven scientific discovery. Existing symbolic methods predominantly rely on fixed, pre-collected training datasets, which often result in suboptimal performance, as demonstrated in our case study in Figure 1. Drawing inspiration from active learning, we investigate strategies to query informative trajectory data that can enhance the evaluation of predicted ODEs. However, the butterfly effect in dynamical systems means that small variations in initial conditions can lead to drastically different trajectories, so conventional active learning would need to store vast quantities of trajectory data. To address this, we introduce Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching (APPS). Instead of directly selecting individual initial conditions, APPS first identifies an informative region within the phase space and then samples a batch of initial conditions from this region. Compared to traditional active learning methods, APPS reduces the need to maintain a large amount of data. Extensive experiments demonstrate that APPS consistently discovers more accurate ODE expressions than baseline methods using passively collected datasets.
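
The two-stage idea above (choose an informative region, then sample a batch of initial conditions inside it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: phase space is split into a uniform grid of cells, a cell's informativeness is approximated by how much the candidate vector fields disagree at random probe points, and the helper names `region_scores` and `sample_batch` are hypothetical. The paper instead compares sketched phase portraits.

```python
import numpy as np


def region_scores(candidates, lows, highs, cells_per_dim=4, probes_per_cell=16, rng=None):
    """Split the box [lows, highs] into a grid of cells and score each cell by
    how strongly the candidate vector fields disagree inside it (assumed heuristic)."""
    rng = np.random.default_rng(rng)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    n = lows.size
    width = (highs - lows) / cells_per_dim
    scores = {}
    for idx in np.ndindex(*([cells_per_dim] * n)):
        cell_low = lows + np.array(idx) * width
        probes = cell_low + rng.random((probes_per_cell, n)) * width
        # values[k, p, :] = f_k(probe p); shape (num_candidates, probes, n)
        values = np.stack([np.apply_along_axis(f, 1, probes) for f in candidates])
        # Disagreement: variance across candidates, averaged over probes and dimensions
        scores[idx] = float(values.var(axis=0).mean())
    return scores, width


def sample_batch(candidates, lows, highs, batch_size=8, cells_per_dim=4, rng=None):
    """Pick the most informative cell, then draw a batch of nearby initial conditions in it."""
    rng = np.random.default_rng(rng)
    scores, width = region_scores(candidates, lows, highs, cells_per_dim, rng=rng)
    best = max(scores, key=scores.get)
    cell_low = np.asarray(lows, dtype=float) + np.array(best) * width
    batch = cell_low + rng.random((batch_size, width.size)) * width
    return best, batch


# Usage: two candidates that agree near x1 = 0 but diverge as |x1| grows
f1 = lambda x: np.array([-x[0], -x[1]])
f2 = lambda x: np.array([-x[0], -x[1] + x[0] ** 2])
best, batch = sample_batch([f1, f2], lows=[-2, -2], highs=[2, 2], rng=0)
print(best, batch.shape)  # best[0] is 0 or 3 (the cells where |x1| is in [1, 2]); shape (8, 2)
```

Drawing the whole batch from one small cell is what lets a single query cover several near-neighbor trajectories instead of scoring initial conditions one at a time.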

Paper Structure (25 sections, 1 theorem, 15 equations, 7 figures, 9 tables)

Key Result

Theorem 1

Consider the initial value problem $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x})$, $\mathbf{x}(0) = \mathbf{x}_0$. Suppose that $\mathbf{f}$ is continuous and that all its partial derivatives ${\partial f_i}/{\partial x_j}$, $i, j = 1, \ldots, n$, are continuous for $\mathbf{x}$ in some open connected …
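
The continuity hypothesis on the partial derivatives matters. A standard textbook counterexample (not taken from the paper) shows uniqueness failing in one dimension when $\partial f/\partial x$ is not continuous at the initial condition:

```latex
\dot{x} = x^{1/3}, \qquad x(0) = 0.
% Two distinct solutions for t \ge 0:
x(t) \equiv 0, \qquad \tilde{x}(t) = \left(\tfrac{2t}{3}\right)^{3/2}.
% Check \tilde{x}:
\dot{\tilde{x}}(t) = \tfrac{3}{2}\left(\tfrac{2t}{3}\right)^{1/2} \cdot \tfrac{2}{3}
                   = \left(\tfrac{2t}{3}\right)^{1/2}
                   = \left(\tilde{x}(t)\right)^{1/3}.
% Here \partial f / \partial x = \tfrac{1}{3} x^{-2/3} is unbounded at x = 0,
% so the hypothesis of the theorem fails at x_0 = 0.
```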

Figures (7)

  • Figure 1: The performance of the ODE predicted by a passively learned baseline is heavily influenced by the collected training data, while that of our APPS method is not. The dots represent noisy ground-truth trajectory data, and the lines show predicted values of the state variables under identical initial conditions. (a, b) APPS and the baseline both predict accurately for the trajectory starting at $\mathbf{x}_0=(0,1)$. (c, d) For the trajectory starting at $\mathbf{x}_0=(4,-1)$, the baseline performs poorly while APPS remains accurate.
  • Figure 2: The pipeline of APPS for symbolic discovery of ODEs consists of three steps: (a) ODEs are sampled from the sequential decoder by iteratively sampling grammar rules; the rule predicted at each step serves as input to the decoder at the next step. (b) The sampled sequence of grammar rules is converted into a valid ODE with $n=2$ variables. Each rule expands the first non-terminal symbol, with the expanded parts highlighted in blue for clarity. (c) The phase portraits of the predicted ODEs (e.g., $\phi_1, \phi_2, \phi_3$) are sketched, and highly informative regions, such as $u_2$, are identified for querying new trajectory data. In region $u_2$, $\phi_1$ exhibits a saddle point, $\phi_2$ moves downward, and $\phi_3$ moves upward. In contrast, in region $u_1$, all trajectories move from right to left. Distinguishing the predicted expressions is therefore easier in region $u_2$ than in region $u_1$.
  • Figure 3: Quartiles of the NMSE and $R^2$ scores of the learning algorithms on the selected data (Strogatz dataset with $n=1$).
  • Figure 4: Implemented fourth-order Runge–Kutta method.
  • Figure 5: The set of best-predicted ODEs for Table (tab:diff-active).
  • ...and 2 more figures
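
Figure 2(a, b) describes turning a sampled sequence of grammar rules into an ODE by always expanding the first (leftmost) non-terminal, and Figure 4 refers to a fourth-order Runge–Kutta integrator. The sketch below strings both steps together under assumptions: the single non-terminal `A`, the toy rule set, and the helper names `expand`, `to_vector_field`, and `rk4` are hypothetical, and the decoder that would actually sample the rules is omitted.

```python
import math

NT = "A"  # the single non-terminal symbol in this toy grammar (hypothetical)


def expand(rule_rhs, n):
    """Apply right-hand sides in order, each replacing the leftmost non-terminal."""
    sentence = ";".join([NT] * n)  # one start symbol per state variable
    for rhs in rule_rhs:
        if NT not in sentence:
            raise ValueError("more rules than non-terminals")
        sentence = sentence.replace(NT, rhs, 1)
    if NT in sentence:
        raise ValueError("derivation is incomplete")
    return sentence.split(";")


def to_vector_field(exprs):
    """Compile expressions in x1..xn into a function f(x) -> list of derivatives."""
    codes = [compile(e, "<ode>", "eval") for e in exprs]
    env = {"__builtins__": {}, "sin": math.sin, "cos": math.cos, "exp": math.exp}

    def field(x):
        names = {f"x{i + 1}": xi for i, xi in enumerate(x)}
        return [eval(code, env, names) for code in codes]

    return field


def rk4(f, x0, dt, steps):
    """Classic fourth-order Runge-Kutta integration of dx/dt = f(x)."""
    x = [float(v) for v in x0]
    traj = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = f([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        traj.append(x)
    return traj


# Usage: rules for the harmonic oscillator dx1/dt = x2, dx2/dt = -x1
exprs = expand(["x2", "(-A)", "x1"], n=2)
print(exprs)  # ['x2', '(-x1)']
traj = rk4(to_vector_field(exprs), x0=[1.0, 0.0], dt=0.01, steps=100)
print(traj[-1], [math.cos(1.0), -math.sin(1.0)])  # numerically close
```

Because every rule rewrites the leftmost non-terminal, a rule sequence determines exactly one expression, which is what allows a sequential decoder to emit ODEs rule by rule. Using `eval` with emptied builtins is a convenience for this sketch, not a security sandbox.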

Theorems & Definitions (1)

  • Theorem: Existence and Uniqueness (coddington1955theory)