A Perspective on Symbolic Machine Learning in Physical Sciences
Nour Makke, Sanjay Chawla
TL;DR
The paper addresses the interpretability bottleneck of ML in the physical sciences and argues that symbolic ML should be developed in parallel with numerical approaches. It presents symbolic regression as a key method for inferring analytical models from data, illustrated with expressions such as $f(x_1,x_2)=ax_1-x_2$ represented as a unary-binary expression tree and written in prefix notation. The authors discuss two exploratory paths for symbolic ML and highlight recent real-data applications that recover known physics forms, such as Tsallis distributions and the Lund string model, while noting current limitations. They advocate integrating symbolic ML with deep learning to accelerate theory-driven discovery by producing interpretable, generalizable models that maintain a close theory–experiment dialogue.
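To make the expression-tree idea concrete, the following minimal sketch encodes $f(x_1,x_2)=ax_1-x_2$ as a unary-binary tree, flattens it to prefix notation, and evaluates the resulting token sequence. The `Node` class, the operator tables, and the helper names are illustrative assumptions, not the authors' implementation; the paper only names the representation.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical operator tables: binary nodes have two children, unary nodes one.
BINARY = {"+": lambda u, v: u + v, "-": lambda u, v: u - v, "*": lambda u, v: u * v}
UNARY = {"neg": lambda u: -u, "exp": math.exp}


@dataclass
class Node:
    """A tree node: an operator, a constant name, or a variable name."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_prefix(node):
    """Flatten the tree in pre-order: node first, then left and right subtrees."""
    tokens = [node.label]
    for child in (node.left, node.right):
        if child is not None:
            tokens.extend(to_prefix(child))
    return tokens


def eval_prefix(tokens, env):
    """Evaluate a prefix token list, looking up leaves (constants, variables) in env."""
    def walk(i):
        tok = tokens[i]
        if tok in BINARY:
            lhs, i = walk(i + 1)
            rhs, i = walk(i)
            return BINARY[tok](lhs, rhs), i
        if tok in UNARY:
            arg, i = walk(i + 1)
            return UNARY[tok](arg), i
        return env[tok], i + 1  # leaf: consume one token

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return value


# f(x1, x2) = a*x1 - x2 as a tree: root "-", left subtree "a * x1", right leaf "x2".
tree = Node("-", Node("*", Node("a"), Node("x1")), Node("x2"))
prefix = to_prefix(tree)
result = eval_prefix(prefix, {"a": 2.0, "x1": 3.0, "x2": 1.0})
print(prefix, result)
```

Because the arity of every operator is known, the prefix sequence needs no parentheses to be decoded unambiguously. That property makes it a convenient flat encoding when expressions are treated as token sequences.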
Abstract
Machine learning is rapidly making its way across all of the natural sciences, including the physical sciences. Yet ML is impacting non-scientific disciplines at a far greater rate than the physical sciences, partly because deep neural networks are not interpretable. Symbolic machine learning stands as an equal and complementary partner to numerical machine learning in speeding up scientific discovery in physics. This perspective discusses the main differences between the ML approach and the scientific approach. It stresses the need to develop and apply symbolic machine learning to physics problems on an equal footing with, and in parallel to, numerical machine learning, because of the dual nature of physics research.
