When is a System Discoverable from Data? Discovery Requires Chaos
Zakhar Shumaylov, Peter Zaika, Philipp Scholl, Gitta Kutyniok, Lior Horesh, Carola-Bibiane Schönlieb
TL;DR
This work reframes the data-driven discovery of dynamical systems as an identifiability problem in function spaces and argues that chaos is essential for unique recoverability from trajectory data. It formalizes two routes to discoverability: in $C^0$ for systems that are chaotic on the whole domain, and in $C^\omega$ for systems with a chaotic attractor $\mathcal A$ whose dimension satisfies $\dim_H(\mathcal A)>d-1$. The Lorenz system is shown to be analytically discoverable as an example of the second route. The paper further shows that analytic first integrals block analytic discoverability from trajectory data alone, while known conservation laws, combined with appropriate geometric conditions, can restore discoverability for non-chaotic systems. The results connect dynamical systems theory, analytic geometry, and data-driven discovery. They help explain why purely data-driven methods succeed in chaotic domains (e.g., weather) but face obstacles in stable engineering contexts, and they motivate hybrid approaches that combine physics-informed priors with data to obtain a well-posed discovery problem. The findings have practical implications for choosing modeling strategies across scientific domains and for developing reliable discovery algorithms.
Abstract
The deep learning revolution has spurred rapid advances in the use of AI in the sciences. Within the physical sciences, the main focus has been on the discovery of dynamical systems from observational data. Yet the reliability of learned surrogates and symbolic models is often undermined by the fundamental problem of non-uniqueness: the resulting models may fit the available data perfectly but lack genuine predictive power. This raises the question: under what conditions can a system's governing equations be uniquely identified from a finite set of observations? We show, counter-intuitively, that chaos, typically associated with unpredictability, is crucial for ensuring that a system is discoverable in the space of continuous or analytic functions. The prevalence of chaotic systems in benchmark datasets may have inadvertently obscured this fundamental limitation. More concretely, we show that systems chaotic on their entire domain are discoverable from a single trajectory within the space of continuous functions, and that systems chaotic on a strange attractor are analytically discoverable under a geometric condition on the attractor. As a consequence, we demonstrate for the first time that the classical Lorenz system is analytically discoverable. Moreover, we establish that analytic discoverability is impossible in the presence of first integrals, which are common in real-world systems. These findings help explain the success of data-driven methods in inherently chaotic domains such as weather forecasting, while revealing a significant challenge for engineering applications such as digital twins, where stable, predictable behavior is desired. For these non-chaotic systems, we find that while trajectory data alone is insufficient, certain prior physical knowledge can help ensure discoverability. These findings warrant a critical re-evaluation of the fundamental assumptions underpinning purely data-driven discovery.
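The non-uniqueness caused by a first integral can be illustrated with a toy example. The sketch below is not taken from the paper; it uses the harmonic oscillator $\dot x = y$, $\dot y = -x$, whose first integral is $H(x,y)=x^2+y^2$, and builds a second analytic vector field that agrees with the first everywhere on the observed trajectory but differs off it. The names `f_alt`, the perturbation direction $g(x,y)=(x,y)$, and the level value `c` are illustrative assumptions.

```python
import math

def f(x, y):
    """Harmonic oscillator: x' = y, y' = -x."""
    return (y, -x)

def H(x, y):
    """First integral of f: constant along every trajectory of f."""
    return x * x + y * y

def f_alt(x, y, c=1.0):
    """A different analytic field, f + (H - c) * g with g(x, y) = (x, y).

    The extra term vanishes on the level set H = c, so f_alt equals f there.
    """
    s = H(x, y) - c
    fx, fy = f(x, y)
    return (fx + s * x, fy + s * y)

def trajectory(n=200):
    """Exact solution of f through (1, 0), sampled over one period."""
    times = (2.0 * math.pi * k / n for k in range(n))
    return [(math.cos(t), -math.sin(t)) for t in times]

def gap(x, y):
    """Largest componentwise difference between f and f_alt at (x, y)."""
    return max(abs(a - b) for a, b in zip(f(x, y), f_alt(x, y)))

# On the observed trajectory (the unit circle, H = 1) the fields coincide
# up to floating-point rounding.
gap_on = max(gap(x, y) for x, y in trajectory())

# Away from the trajectory they disagree, so the data cannot tell them apart.
gap_off = gap(2.0, 0.0)

print("max gap on trajectory:", gap_on)
print("gap at (2, 0):", gap_off)
```

Because both fields produce exactly the same curve from the initial condition $(1, 0)$, no amount of sampling along that single trajectory can distinguish them, even within the class of analytic functions.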
