Discovering equations from data: symbolic regression in dynamical systems
Beatriz R. Brum, Luiza Lober, Isolde Previdelli, Francisco A. Rodrigues
TL;DR
This work surveys symbolic regression (SR) methods for uncovering governing equations from data and analyzes identifiability challenges in dynamical systems. It benchmarks six SR algorithms (GPLearn, AI-Feynman, PySINDy, PySR, PyKAN, and ODEFormer) across nine dynamical systems, including chaotic, oscillatory, predator–prey, and epidemiological models. Several methods recover the governing forms with high fidelity, though performance varies with noise, parameter choices, and system dimensionality; PySR is generally the most robust and accurate in both structural recovery and prediction. The study underscores the potential of SR for real-world equation discovery while outlining its practical limitations, namely identifiability, the NP-hardness of SR, and sensitivity to data quality, and it calls for expanded benchmarks and robust noise-handling strategies.
Abstract
The process of discovering equations from data lies at the heart of physics and of many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression have emerged as a way to automate this task. This study presents an overview of the current literature on symbolic regression and compares the efficiency of five state-of-the-art methods in recovering the governing equations of nine processes, including chaotic dynamics and epidemic models. The benchmark results show PySR to be the most suitable method for inferring equations, with some estimates being indistinguishable from the original analytical forms. These results highlight the potential of symbolic regression as a robust tool for inferring and modeling real-world phenomena.
