Physics-informed features in supervised machine learning
Margherita Lampani, Sabrina Guastavino, Michele Piana, Federico Benvenuto
TL;DR
This work addresses the limited interpretability of traditional feature-based supervised learning in scientific contexts by introducing physics-informed feature maps that produce dimensionally homogeneous representations within an RKHS framework. By defining a forward operator $A$ through a physics-informed map $\phi$, the authors recast learning as a regularized inverse problem and link the forward model to the RKHS solution via $\hat{f}_{\lambda} = A^{\dagger} \hat{g}_{\lambda}$. In synthetic experiments on fluid dynamics (Bernoulli), pulsar magnetic dissipation, and binary-system classification, the method improves regression and classification performance and recovers or identifies the underlying physical relationships. In a real-data application to solar flare forecasting, SPIFs still provide predictive gains, and $\mathrm{PIF}_2 = \Phi I$ emerges as a key descriptor, suggesting magnetic helicity as a practical energy-distribution proxy. Overall, the physics-informed feature approach enhances explainability, supports mechanism discovery, and offers a pathway to discovering new physical equations within explainable ML.
Abstract
Supervised machine learning involves approximating an unknown functional relationship from a limited dataset of features and corresponding labels. The classical approach to feature-based machine learning typically relies on applying linear regression to standardized features, without considering their physical meaning. This may limit model explainability, particularly in scientific applications. This study proposes a physics-informed approach to feature-based machine learning that constructs non-linear feature maps informed by physical laws and dimensional analysis. These maps enhance model interpretability and, when physical laws are unknown, allow for the identification of relevant mechanisms through feature ranking. The method aims to improve both predictive performance in regression tasks and classification skill scores by integrating domain knowledge into the learning process, while also enabling the potential discovery of new physical equations within the context of explainable machine learning.
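To make the contrast with the classical approach concrete, the following is a minimal sketch on synthetic Bernoulli-type data. It is not the authors' implementation: it uses plain ridge regression rather than the RKHS formulation, and the data ranges, the noise level, and the helper names `physics_informed_map` and `ridge` are assumptions introduced for illustration. Every physics-informed feature carries units of pressure, so the fitted weights are dimensionless and can be compared directly for feature ranking.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
g = 9.81  # gravitational acceleration, m/s^2

# Synthetic raw features with different physical units (assumed ranges).
rho = rng.uniform(900.0, 1100.0, n)   # density, kg/m^3
v = rng.uniform(0.5, 5.0, n)          # flow speed, m/s
h = rng.uniform(0.0, 10.0, n)         # height, m
p = rng.uniform(1.0e5, 2.0e5, n)      # static pressure, Pa

# Label: Bernoulli total pressure plus Gaussian noise (10 Pa, assumed).
y = p + 0.5 * rho * v**2 + rho * g * h + rng.normal(0.0, 10.0, n)


def physics_informed_map(p, rho, v, h):
    """Hypothetical feature map: every column has units of pressure (Pa)."""
    return np.column_stack([p, 0.5 * rho * v**2, rho * g * h])


def ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Physics-informed fit: linear regression on dimensionally homogeneous features.
Phi = physics_informed_map(p, rho, v, h)
w_pif = ridge(Phi, y, lam=1e-3)
rmse_pif = np.sqrt(np.mean((Phi @ w_pif - y) ** 2))

# Classical baseline: linear regression on standardized raw features.
X_raw = np.column_stack([p, rho, v, h])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X_std = np.column_stack([np.ones(n), X_std])  # intercept column
w_raw = ridge(X_std, y, lam=1e-3)
rmse_raw = np.sqrt(np.mean((X_std @ w_raw - y) ** 2))

print("physics-informed weights:", w_pif)
print("RMSE physics-informed:", rmse_pif, "RMSE standardized raw:", rmse_raw)
```

In this toy setting, weights close to one on all three terms correspond to the Bernoulli relation used to generate the labels, while the standardized baseline cannot represent the products $\rho v^2$ and $\rho g h$ with a linear model.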
