Data-driven Progressive Discovery of Physical Laws

Mingkun Xia, Weiwei Zhang

Abstract

Symbolic regression is a powerful tool for knowledge discovery, enabling the extraction of interpretable mathematical expressions directly from data. However, conventional symbolic discovery typically follows an end-to-end, "one-step" process, which often generates lengthy and physically meaningless expressions when applied to real physical systems, leading to poor model generalization. This limitation stems from a departure from the basic path of scientific discovery: physical laws do not exist in a single form but follow a hierarchical, progressive pattern from simplicity to complexity. Motivated by this principle, we propose Chain of Symbolic Regression (CoSR), a novel framework that models the discovery of physical laws as a chain of symbolic knowledge. This knowledge chain is formed by progressively combining multiple knowledge units with clear physical meanings along a specific logic, ultimately enabling the precise discovery of the underlying physical laws from data. CoSR fully recapitulates the progressive discovery path from Kepler's third law to the law of universal gravitation in classical mechanics, and is applied to three types of problems: turbulent Rayleigh-Bénard convection, viscous flows in a circular pipe, and laser-metal interaction, demonstrating its ability to improve classical scaling theories. Finally, CoSR showcases its capability to discover new knowledge in the complex engineering problem of aerodynamic-coefficient scaling for different aircraft.
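
To make the Kepler-to-gravitation path mentioned in the abstract concrete, the textbook argument can be written out as follows. This is only a sketch under simplifying assumptions that are not taken from the paper (circular orbits, $M \gg m$, and a constant $k$ that depends only on the central body); it is not the knowledge chain that CoSR itself produces. For a body of mass $m$ on a circular orbit of radius $R$ and period $T$, the required centripetal force is

$$F = m \omega^2 R = \frac{4\pi^2 m R}{T^2}.$$

Kepler's third law states that $T^2 = k R^3$ for all bodies orbiting the same central mass. Substituting gives an inverse-square dependence on $R$:

$$F = \frac{4\pi^2 m}{k R^2}.$$

Identifying $k = 4\pi^2 / (G M)$ then yields the familiar form

$$F = \frac{G M m}{R^2}.$$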

Paper Structure (20 sections, 15 equations, 7 figures)

Figures (7)

  • Figure 1: Schematic illustration of the Chain of Symbolic Regression (CoSR) framework. This figure depicts the complete workflow of the CoSR framework, which constructs physical knowledge chains through a series of progressive steps. Invariance Learning: Dimensionality reduction of the parameter space via nondimensionalization based on the Buckingham $\pi$ theorem, followed by implicit symbolic regression to uncover intrinsic constraint relationships within the data. Multi-layered Compression: Hierarchical symbolic regression through multi-level nested functions to progressively extract knowledge structures. Scaling Transformation: Refinement and simplification of expressions via transformation techniques (top: 'curves to lines' to reduce complexity; bottom: 'multiple lines to one' to unify scaling laws). These progressive discovery modes operate in concert through a physics-guided dynamic switching mechanism, culminating in a knowledge chain that balances formal parsimony, physical interpretability, and predictive accuracy.
  • Figure 2: Progressive discovery pathway of the law of universal gravitation. (a) Problem description: Celestial systems are categorized into two prototypical models based on the mass ratio between the central body and its orbiting companion: planet-star systems (where $M \gg m$) and binary star systems (where $M \sim m$ in order of magnitude). The relationship to be uncovered by CoSR is $F = f(M, m, T, R)$. (b) Progressive symbolic network chain: When applied to Solar System data, the framework extracts dynamical relationships through hierarchical discovery and identifies Kepler's third law via implicit discovery. In exoplanet and binary star systems, it automatically identifies the reduced mass, yielding the corresponding dynamical relationships in terms of the reduced mass and a generalized formulation of Kepler's third law. These two discovery trajectories are ultimately synthesized to obtain the law of universal gravitation, thereby demonstrating the unification of physical laws.
  • Figure 3: Progressive discovery pathway for turbulent Rayleigh-Bénard convection. (a) Problem description: Schematic illustration of the Rayleigh-Bénard convection system, along with the input and output parameters. (b) Progressive symbolic network chain: Starting from six raw parameters, hierarchical discovery progressively identifies the Prandtl number, the Grashof number and ultimately the Rayleigh number through a structured pathway. (c) Refinement via scaling transformation: The conventional nonlinear Nu-Ra relationship is transformed and simplified into a linear scaling law through correction transformations, revealing deeper physical insights.
  • Figure 4: Progressive discovery pathway for rough-wall pipe flow. (a) Problem description: Schematic illustration of rough-wall pipe flow, along with the input and output parameters. (b) Progressive symbolic network chain: Structured discovery pathway from raw parameters through dimensional analysis to the optimal dimensionless combination $(Re, \varepsilon/d)$. (c, d) Refinement via scaling transformation: Through transformations, the conventional $C_f$--$Re$--$\varepsilon/d$ relationship (typically requiring complex piecewise power-law descriptions) is reconstructed into a parsimonious and unified scaling form. This approach not only reproduces the classic Goldenfeld scaling law but also yields a scaling form with superior data collapse. Based on this formulation, reconstructing $C_f$ using polynomials achieves improved prediction accuracy, particularly in the transitional turbulent regime.
  • Figure 5: Progressive discovery pathway for laser-metal interaction. (a) Problem description: Schematic illustration of laser-metal interaction, along with the input and output parameters. (b) Progressive symbolic network chain: Complete discovery pathway from seven raw parameters to key dimensionless groups. (c) Discovery of key parameters: Systematic presentation of two core dimensionless parameters automatically discovered by the framework: the keyhole number $Ke$ and the material characteristic number $X$. (d) Dominant parameter extraction: Combination of $Ke$ and $X$ to discover the modified keyhole number $Ke^*$. (e) Absolute prediction error distributions for the two keyhole numbers: Comparison demonstrating the superior predictive accuracy of the modified keyhole number $Ke^*$ over the classical $Ke$ across three materials, with particularly notable improvement for Al6061.
  • ...and 2 more figures
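
As a rough illustration of the Kepler's-third-law step described in the Figure 2 caption, the sketch below recovers the exponent in $T \propto R^{p}$ from Solar System data. It is not the CoSR implementation: an ordinary least-squares fit in log-log space stands in for implicit symbolic regression, and the planetary values are rounded textbook figures (semi-major axis in AU, period in years) rather than the dataset used in the paper. The variable names are illustrative assumptions.

```python
import numpy as np

# Rounded textbook values for Mercury, Venus, Earth, Mars, Jupiter, Saturn.
# Assumption: these are NOT the paper's data, only a small illustrative sample.
R = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])    # semi-major axis [AU]
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])  # orbital period [yr]

# Stand-in for symbolic regression: fit log T = p * log R + c by least squares.
p, c = np.polyfit(np.log(R), np.log(T), 1)

# If p is close to 3/2, then T^2 / R^3 should be nearly constant across planets.
# In these units (AU, years) that constant is close to 1 for the Solar System.
invariant = T**2 / R**3

print(f"fitted exponent p = {p:.4f}")
print("T^2 / R^3 per planet:", np.round(invariant, 4))
```

The fitted exponent should come out close to 1.5, and the near-constant ratio $T^2 / R^3$ is the kind of invariant that the caption describes as being found "via implicit discovery". A full symbolic-regression search would explore many candidate expressions rather than assuming a power law up front.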