When is a System Discoverable from Data? Discovery Requires Chaos

Zakhar Shumaylov, Peter Zaika, Philipp Scholl, Gitta Kutyniok, Lior Horesh, Carola-Bibiane Schönlieb

TL;DR

This work reframes the data-driven discovery of dynamical systems as an identifiability problem within function spaces, arguing that chaos is essential for unique recoverability from trajectory data. It formalizes two routes to discoverability: in $C^0$ for systems chaotic on the whole domain, and in $C^\omega$ for chaotic attractors with attractor dimension $\dim_H(\mathcal A)>d-1$, which yields the analytic discoverability of the Lorenz system. It further shows that analytic first integrals block analytic discoverability, while conservation laws can restore discoverability for non-chaotic systems when combined with appropriate geometric conditions. The results bridge dynamical systems theory, analytic geometry, and data-driven discovery. They explain why purely data-driven methods excel in chaotic domains (e.g., weather) but face obstacles in stable engineering contexts, and they motivate physics-informed priors as a way to make the discovery problem well-posed. Overall, the paper provides a rigorous foundation for when trajectory-based discovery is possible and highlights the need for hybrid approaches that integrate prior physical knowledge with data-driven methods.

Abstract

The deep learning revolution has spurred rapid advances in the use of AI in the sciences. Within the physical sciences, the main focus has been on the discovery of dynamical systems from observational data. Yet the reliability of learned surrogates and symbolic models is often undermined by the fundamental problem of non-uniqueness. The resulting models may fit the available data perfectly, but lack genuine predictive power. This raises the question: under what conditions can a system's governing equations be uniquely identified from a finite set of observations? We show, counter-intuitively, that chaos, typically associated with unpredictability, is crucial for ensuring a system is discoverable in the space of continuous or analytic functions. The prevalence of chaotic systems in benchmark datasets may have inadvertently obscured this fundamental limitation. More concretely, we show that systems chaotic on their entire domain are discoverable from a single trajectory within the space of continuous functions, and systems chaotic on a strange attractor are analytically discoverable under a geometric condition on the attractor. As a consequence, we demonstrate for the first time that the classical Lorenz system is analytically discoverable. Moreover, we establish that analytic discoverability is impossible in the presence of first integrals, which are common in real-world systems. These findings help explain the success of data-driven methods in inherently chaotic domains like weather forecasting, while revealing a significant challenge for engineering applications like digital twins, where stable, predictable behavior is desired. For these non-chaotic systems, we find that while trajectory data alone is insufficient, certain prior physical knowledge can help ensure discoverability. These findings warrant a critical re-evaluation of the fundamental assumptions underpinning purely data-driven discovery.

Paper Structure

This paper contains 33 sections, 32 theorems, 44 equations, 5 figures.

Key Result

Theorem 1

Figures (5)

  • Figure 1: Model ambiguity in non-chaotic systems versus uniqueness in chaotic systems. (Left) For non-chaotic systems like the integrable Lorenz model, different mathematical models can generate identical dynamics, precluding unique discovery (\ref{thm:first_integral}). (Right) Conversely, the dynamics of a chaotic system like the Lorenz attractor uniquely specify the underlying model, making it discoverable (\ref{cor:lorenz}).
  • Figure 2: An illustration of how system dynamics affect model discovery from data. The regular motion of the simple pendulum explores a limited portion of its state space, which is insufficient to uniquely determine its governing equations. In contrast, the chaotic double pendulum explores its domain more densely, enabling easier discovery of its governing equations from trajectory data. Inspired by https://github.com/profConradi.
  • Figure 3: Examples of sets of uniqueness and non-uniqueness for various function spaces. Columns differentiate between function spaces: Linear (qiu2022identifiability, casolo2025identifiabilitychallengessparselinear), Real Analytic (\ref{thm:open-condition-for-uniqueness-of-analytic-functions}), and Continuous (\ref{prop:cont_uniq}). A set is a set of uniqueness for a function space if every function in that space that vanishes on the set vanishes everywhere.
  • Figure 4: Illustration of finite continuous discoverability being equivalent to a decomposition into cells, on each of which the flow is topologically transitive. Indecomposable systems require a single trajectory for discovery, while decomposable ones require one trajectory per cell.
  • Figure 5: Illustration of the main argument behind \ref{thm:first_integral}. An analytic first integral foliates the underlying space into analytic surfaces, to which each trajectory is restricted. As each analytic surface is a zero set of an analytic first integral, it is not a set of uniqueness, and trajectories can evolve arbitrarily between the leaves of the foliation, while still preserving the first integral, leading to non-discoverability.
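
The argument sketched in the Figure 5 caption can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes the harmonic oscillator $f(x,y)=(y,-x)$ with first integral $H(x,y)=x^2+y^2$ as a toy example, and the names `f`, `g`, `H`, and `rk4` are hypothetical. The field $g = H\cdot f$ is analytic, still conserves $H$, and coincides with $f$ on the leaf $H=1$, so a trajectory started on that leaf cannot tell the two models apart, even though they differ on every other leaf.

```python
import numpy as np

def f(state):
    """Toy system (an assumption for illustration): x' = y, y' = -x."""
    x, y = state
    return np.array([y, -x])

def H(state):
    """First integral of f: H = x^2 + y^2 is constant along trajectories."""
    x, y = state
    return x**2 + y**2

def g(state):
    """A different analytic field: equals f on the leaf H = 1, conserves H."""
    return H(state) * f(state)

def rk4(field, state, dt, steps):
    """Classical fourth-order Runge-Kutta integrator with fixed step dt."""
    traj = [state]
    for _ in range(steps):
        k1 = field(state)
        k2 = field(state + 0.5 * dt * k1)
        k3 = field(state + 0.5 * dt * k2)
        k4 = field(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(state)
    return np.array(traj)

# Observed trajectory: starts on the leaf H = 1 and stays on it.
x0 = np.array([1.0, 0.0])
traj_f = rk4(f, x0, dt=0.01, steps=1000)
traj_g = rk4(g, x0, dt=0.01, steps=1000)

# Both models reproduce the observed data up to integration error ...
max_gap = np.max(np.abs(traj_f - traj_g))

# ... yet they disagree away from the observed leaf, e.g. at (2, 0).
off_leaf = np.array([2.0, 0.0])
field_gap = np.linalg.norm(f(off_leaf) - g(off_leaf))

print("max trajectory gap on the leaf H = 1:", max_gap)
print("vector field gap at (2, 0):", field_gap)
```

Any other multiplier that equals 1 on the observed leaf would serve equally well in place of $H$; the choice here just keeps the example short.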

Theorems & Definitions (80)

  • Theorem : Informal
  • Corollary
  • Theorem : Informal
  • Example 1.1
  • Definition 2.1: Uniqueness/Identifiability
  • Definition 2.2: $n$ Discoverability
  • Definition 2.4: Topological Transitivity
  • Definition 2.5: Trajectory
  • Lemma 2.6: Continuous Birkhoff's Lemma. Proof in \ref{sec:BirkProof}
  • Definition 2.7: Invariant set
  • ...and 70 more