Understanding Charge Radii with Machine Learning: Discovering Physics Expressions
B. Maheshwari, P. Van Isacker
TL;DR
This work tackles the challenge of accurately predicting nuclear charge radii while preserving interpretability by coupling high-accuracy numerical regression (LGBM and Gaussian Process Regression) with symbolic regression to extract analytic physics expressions. By engineering physically meaningful features such as $A^{1/3}$, $Z^{1/3}$, $BEA$, isospin $I$, Casten factor $CF$, and pairing gap $P$, and enforcing four-fold cross-validation, the authors obtain robust, extrapolatable predictions across the nuclear chart. The symbolic regression stage distills the learned models into compact formulas that echo liquid-drop-like terms, uncovering the dominance of $A^{1/3}$ and $Z^{1/3}$ scaling and revealing corrections tied to binding energy, isospin, and pairing effects. This hybrid data-driven approach provides interpretable, physics-informed expressions with quantified uncertainties, offering a blueprint for discovery of governing relations in nuclear structure and beyond.
Abstract
We introduce a robust, interpretable machine learning (ML) framework that combines numerical regression for high-accuracy predictions with symbolic regression to uncover the underlying physics. This hybrid approach effciently derives analytical expressions by leveraging the smoothed predictions of optimized ML models, a significant acceleration over direct symbolic regression on raw experimental data. We apply this framework, as an example, to nuclear charge radii across the nuclear chart, notably including light nuclei that are often excluded from such studies. We employ Light Gradient Boosting Machine (LGBM) and Gaussian Process Regression (GPR) models to map correlations between charge radii and key physical features: mass $A^{1/3}$ and proton number $Z^{1/3}$ dependencies, total binding energy, and for the first time, the pairing gap. Our models are rigorously trained using four-fold cross-validation with automated hyperparameter optimization, ensuring robustness and generalizability, which is critical for the typically small and skewed datasets in nuclear physics. Finally, we distill the knowledge from the initial LGBM and GPR models into simplified, interpretable mathematical expressions via symbolic regression, white-boxing these ML models. The derived formulas provide physical insights comparable to traditional many-body models and demonstrate a powerful pathway for physics expression discovery guided by ML.
