Introduction to Symbolic Regression in the Physical Sciences
Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira, Gabriel Kronberger
TL;DR
This article introduces symbolic regression (SR) for the physical sciences. It outlines the foundations of SR, contrasts it with fixed-structure regression, and describes its main uses in discovery, empirical modeling, and emulation. It surveys methodological considerations, including search-space design, operator sets, and complexity control, and discusses challenges such as scalability, noise robustness, and overfitting. It also highlights emerging directions, such as incorporating symmetry and asymptotic constraints and integrating SR with foundation models, along with key insights from the Royal Society meeting on recent advances, software packages (e.g., PySR), and real-world applications. Overall, SR is presented as a principled, interpretable pathway for discovering physical laws and building fast, analytic surrogates that complement traditional simulations and data-driven methods.
Abstract
Symbolic regression (SR) has emerged as a powerful method for uncovering interpretable mathematical relationships from data, offering a novel route to both scientific discovery and efficient empirical modelling. This article introduces the Special Issue on Symbolic Regression for the Physical Sciences, motivated by the Royal Society discussion meeting held in April 2025. The contributions collected here span applications from automated equation discovery and emergent-phenomena modelling to the construction of compact emulators for computationally expensive simulations. The introductory review outlines the conceptual foundations of SR, contrasts it with conventional regression approaches, and surveys its main use cases in the physical sciences, including the derivation of effective theories, empirical functional forms and surrogate models. We summarise methodological considerations such as search-space design, operator selection, complexity control, feature selection, and integration with modern AI approaches. We also highlight ongoing challenges, including scalability, robustness to noise, overfitting and computational complexity. Finally, we emphasise emerging directions, particularly the incorporation of symmetry constraints, asymptotic behaviour and other theoretical information. Taken together, the papers in this Special Issue illustrate the accelerating progress of SR and its growing relevance across the physical sciences.
