A Neural Symbolic Model for Space Physics
Jie Ying, Haowei Lin, Chao Yue, Yajie Chen, Chao Xiao, Quanqi Shi, Yitao Liang, Shing-Tung Yau, Yuan Zhou, Jianzhu Ma
TL;DR
PhyE2E is a neural-symbolic framework for automated discovery of physical laws from observational data. It combines LLM-synthesized physics formulas, transformer-based end-to-end formula regression, and a Hessian-guided divide-and-conquer decomposition, followed by refinement with Monte-Carlo Tree Search (MCTS) and Genetic Programming (GP). The method achieves state-of-the-art symbolic accuracy and unit consistency on both synthetic AI-Feynman datasets and diverse real space-physics applications, including sunspot numbers, plasma pressure, solar differential rotation, emission-line contributions, and lunar-tide signals, often with substantially simpler, interpretable formulas. A key advance is the explicit integration of physical priors, especially units, into the model and its outputs, enabling unit-consistent formulas without heavy retuning. The work demonstrates robust generalization to long-term solar cycles and multiple space-physics phenomena, and provides data and code to enable broader application of neural-symbolic regression to scientific discovery.
Abstract
In this study, we introduce a new AI model, termed PhyE2E, to discover physical formulas through symbolic regression. PhyE2E simplifies symbolic regression by decomposing it into sub-problems using the second-order derivatives of an oracle neural network, and employs a transformer model to translate data into symbolic formulas in an end-to-end manner. The resulting formulas are refined through Monte-Carlo Tree Search and Genetic Programming. We leverage a large language model to synthesize extensive symbolic expressions resembling real physics, and train the model to recover these formulas directly from data. A comprehensive evaluation reveals that PhyE2E outperforms existing state-of-the-art approaches, delivering superior symbolic accuracy, precision in data fitting, and consistency in physical units. We applied PhyE2E to five problems in space physics: the prediction of sunspot numbers, solar rotational angular velocity, emission line contribution functions, near-Earth plasma pressure, and lunar-tide plasma signals. The physical formulas generated by the model fit the experimental data from satellites and astronomical telescopes with a high degree of accuracy. We have successfully upgraded the formula proposed by NASA in 1993 regarding solar activity and, for the first time, provided an explanation for the long cycle of solar activity in an explicit form. We also found that the decay of near-Earth plasma pressure is proportional to r^2, where r is the distance to Earth; subsequent mathematical derivations are consistent with satellite data from another independent study. Moreover, we found physical formulas that describe the relationships between emission lines in the extreme ultraviolet spectrum of the Sun, temperatures, electron densities, and magnetic fields. The formulas obtained are consistent with the properties that physicists had previously hypothesized they should possess.
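The decomposition step mentioned above rests on a standard fact: if the mixed second derivative of f with respect to x_i and x_j vanishes everywhere, the two variables do not interact, and f can be split into simpler sub-problems. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a closed-form function as a stand-in for the trained oracle network and uses finite differences instead of automatic differentiation; the names `mixed_partial`, `additively_separable_pairs`, and `oracle` are hypothetical.

```python
import numpy as np

def mixed_partial(f, x, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at point x."""
    x = np.asarray(x, dtype=float)
    ei = np.zeros_like(x)
    ej = np.zeros_like(x)
    ei[i] = h
    ej[j] = h
    return (f(x + ei + ej) - f(x + ei - ej)
            - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)

def additively_separable_pairs(f, points, tol=1e-4):
    """Return variable pairs (i, j) whose mixed partial is ~0 at every sample point."""
    n = points.shape[1]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            worst = max(abs(mixed_partial(f, p, i, j)) for p in points)
            if worst < tol:
                pairs.append((i, j))
    return pairs

# Assumption: this closed-form function stands in for the oracle neural network.
def oracle(x):
    return np.sin(x[0]) + x[1] * x[2]

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(50, 3))
pairs = additively_separable_pairs(oracle, points)
print(pairs)
```

Here x_0 does not interact with x_1 or x_2, so the regression could be split into one sub-problem in x_0 and another in (x_1, x_2). With a real neural oracle, the tolerance `tol` would need to account for approximation error in the network, which this sketch does not model.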
