On the Scientific Method: The Role of Hypotheses and Involved Mathematics

Mario Milanese, Carlo Novara, Michele Taragna

TL;DR

The paper frames scientific inference from data as inferring a law $y=f_o(u)$ from noisy measurements, highlighting that data alone cannot guarantee a reliable approximation without hypotheses and mathematical accuracy notions. It develops a rigorous BA-based framework built around the Feasible Function Set and analyzes two broad estimator families: Parametric Probabilistic (PP) and Set Membership (SM), detailing their theoretical accuracy and falsification properties. It shows that SM estimators can yield concrete, norm-based optimality guarantees under Lipschitz/differentiability assumptions, while PP estimators are limited to restrictive linear or simple parametric forms. The work introduces falsification criteria inspired by Popper and culminates with Parametric Set Membership (PSM) estimators that fuse physics-based priors with data-driven SM bounds, offering improved accuracy and a structured path for hypothesis refinement in dynamical and complex systems.

Abstract

The paper investigates the role of data, hypotheses and mathematical methods in the discovery of a law $y=f_o(u)$ relating the variables $u$ and $y$ of a physical phenomenon from experimental measurements of those variables. Since exact knowledge of the function $f_o$ cannot be expected, the paper discusses the problem of deriving approximate functions with a small approximation error, measured by some function norm. The main contributions are as follows. First, it is proven that a reliable approximation, i.e., one having a finite error, cannot be derived from measured data alone; hypotheses on the function $f_o$ and on the disturbances corrupting the measurements must therefore be introduced. Second, necessary and sufficient conditions for deriving a reliable approximation are provided. If these conditions are satisfied, suitable accuracy properties of the approximation, called theoretical properties, can be defined. Third, it is shown that the conditions necessary for deriving a reliable approximation cannot be verified, but that hypotheses on $f_o$ and on the disturbances can be falsified by experimental measurements, by showing that no function and disturbances satisfying the given hypotheses can reproduce the measurements (this is called the falsification property). These properties are then discussed for three classes of hypotheses: Parametric Probabilistic, where $f_o$ is assumed to be a function depending on a vector $p$ and the disturbances are assumed to be stochastic variables; Set Membership, where $f_o$ is assumed to be a bounded smooth function and the disturbances are assumed to be bounded variables; and Parametric Set Membership, which integrates Parametric Probabilistic hypotheses with Set Membership hypotheses.
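The falsification property lends itself to a small numerical illustration. The sketch below assumes one specific Set Membership hypothesis that the abstract does not spell out: $f_o$ is Lipschitz with constant `gamma` in the Euclidean norm, and each disturbance is bounded in absolute value by `eps`. Under that assumption, a function consistent with the data exists exactly when every pair of measurements satisfies $|y_i - y_j| \le 2\,\mathit{eps} + \mathit{gamma}\,\|u_i - u_j\|$, so a single violating pair falsifies the hypotheses. The name `is_falsified` and its parameters are hypothetical and are not the paper's notation.

```python
import numpy as np

def is_falsified(U, y, gamma, eps):
    """Return True if no gamma-Lipschitz function with |noise| <= eps fits the data.

    Assumed hypotheses (illustrative only): Lipschitz constant `gamma` in the
    Euclidean norm, disturbances bounded in absolute value by `eps`.
    """
    y = np.asarray(y, dtype=float)
    U = np.asarray(U, dtype=float).reshape(len(y), -1)  # one row per measurement
    # Pairwise Euclidean distances between the measured inputs.
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    # Pairwise absolute differences between the measured outputs.
    gap = np.abs(y[:, None] - y[None, :])
    # A pair whose outputs differ by more than the noise band plus the
    # allowed Lipschitz variation cannot be explained by the hypotheses.
    return bool(np.any(gap > 2.0 * eps + gamma * dist))

# Two measurements one unit apart whose outputs differ by 1.
U = [0.0, 1.0]
y = [0.0, 1.0]
print(is_falsified(U, y, gamma=0.5, eps=0.1))  # slope too steep for gamma = 0.5
print(is_falsified(U, y, gamma=1.0, eps=0.0))  # gamma = 1 explains the data
```

Sweeping `gamma` for each value of `eps` and recording where the answer flips would trace a boundary between falsified and unfalsified pairs, which is the kind of picture the caption of Figure 1 describes. A `False` result only means the hypotheses have not been contradicted by the data; it does not verify them.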

Paper Structure

This paper contains 9 sections, 5 theorems, 18 equations, 1 figure.

Key Result

Lemma 1

No reliable estimators $\widehat{\phi}\in\Phi\left(D_{\mathit{ex}}\right)$ exist, for $q=1,\ldots,\infty$, however large the number $M$ of data in the set $D_{\mathit{ex}}$ is.
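The intuition behind Lemma 1 can be illustrated numerically; this is not the paper's proof. The sketch below assumes noise-free scalar data and builds two functions that both reproduce the measurements exactly yet can be made arbitrarily far apart between the data points, so the data alone place no finite bound on the approximation error. The helper names `interpolant` and `perturbed` and the polynomial bump are hypothetical choices made for this illustration.

```python
import numpy as np

def interpolant(u_data, y_data):
    """Piecewise-linear function passing through every data point."""
    return lambda u: np.interp(u, u_data, y_data)

def perturbed(u_data, y_data, scale):
    """Same interpolant plus a bump that vanishes at every data point."""
    base = interpolant(u_data, y_data)
    def f(u):
        u = np.asarray(u, dtype=float)
        # Product of squared distances to the data: exactly zero at each u_i.
        bump = np.prod((u[..., None] - u_data) ** 2, axis=-1)
        return base(u) + scale * bump
    return f

u_data = np.array([0.0, 0.5, 1.0])
y_data = np.array([1.0, 2.0, 0.0])
grid = np.linspace(0.0, 1.0, 201)

f_a = interpolant(u_data, y_data)
for scale in (1e3, 1e6):
    f_b = perturbed(u_data, y_data, scale)
    fits = np.allclose(f_b(u_data), y_data)       # f_b still matches the data
    sup_gap = np.max(np.abs(f_a(grid) - f_b(grid)))  # distance between the two fits
    print(scale, fits, sup_gap)
```

Both candidates are indistinguishable on the measurements, and the gap between them grows with `scale`. Adding more data points changes where the bump vanishes but not the conclusion, which is why hypotheses on the function and on the disturbances are needed to obtain a finite error bound.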

Figures (1)

  • Figure 1: Falsification curve, Falsified and Unfalsified Regions

Theorems & Definitions (5)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Theorem 3