Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science
Runlong Yu, Chonghao Qiu, Robert Ladwig, Paul Hanson, Yiqun Xie, Xiaowei Jia
TL;DR
This paper tackles the challenge of applying foundation models to complex scientific problems by integrating physics knowledge into a two-stage physics-guided machine learning (PGML) framework. It introduces PGFM, which pre-trains on a physics-grounded simulated environmental system to learn broadly useful feature interactions and then fine-tunes on real observations with explicit energy and mass conservation penalties. The key contributions are an evolution-based feature selection mechanism, an (n+1)-PGFM pre-training instantiation, and physics-driven losses for energy balance and dissolved oxygen (DO) mass balance, all demonstrated on predicting lake water temperature and DO dynamics with improved accuracy and physical consistency. The findings show that physics-guided pre-training combined with physical regularization improves generalization in data-scarce, multi-task settings, and suggest that the approach can be adapted to other domains that couple physics-based simulators with machine learning.
Abstract
Physics-guided machine learning (PGML) has become a prevalent approach in studying scientific systems due to its ability to integrate scientific theories for enhancing machine learning (ML) models. However, most PGML approaches are tailored to isolated and relatively simple tasks, which limits their applicability to complex systems involving multiple interacting processes and numerous influencing features. In this paper, we propose a Physics-Guided Foundation Model (PGFM) that combines pre-trained ML models and physics-based models and leverages their complementary strengths to improve the modeling of multiple coupled processes. To effectively conduct pre-training, we construct a simulated environmental system that encompasses a wide range of influencing features and various simulated variables generated by physics-based models. The model is pre-trained in this system to adaptively select important feature interactions guided by multi-task objectives. We then fine-tune the model for each specific task using true observations, while maintaining consistency with established physical theories, such as the principles of mass and energy conservation. We demonstrate the effectiveness of this methodology in modeling water temperature and dissolved oxygen dynamics in real-world lakes. The proposed PGFM is also broadly applicable to a range of scientific fields where physics-based models are being used.
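To make the fine-tuning idea concrete, the following is a minimal sketch of a loss that combines a supervised error on sparse observations with a conservation penalty. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `physics_guided_loss`, the weight `lam`, the tolerance `tol`, and the simplified balance rule (the change in the predicted quantity between consecutive time steps should match a known net flux) are all hypothetical.

```python
from typing import Optional, Sequence


def physics_guided_loss(
    pred: Sequence[float],
    obs: Sequence[Optional[float]],
    net_flux: Sequence[float],
    lam: float = 0.1,
    tol: float = 0.0,
) -> float:
    """Supervised MSE plus a conservation penalty (illustrative only).

    pred     -- predicted state at each time step (e.g. stored heat or DO mass)
    obs      -- observed state, or None where no observation exists
    net_flux -- assumed net source/sink between step t and t+1 (length len(pred) - 1)
    lam      -- hypothetical weight on the physics penalty
    tol      -- hypothetical tolerance below which an imbalance is not penalized
    """
    # Supervised term: mean squared error over the observed time steps only.
    sq_errors = [(p - o) ** 2 for p, o in zip(pred, obs) if o is not None]
    mse = sum(sq_errors) / len(sq_errors) if sq_errors else 0.0

    # Physics term: the predicted change should match the net flux.
    residuals = [
        abs((pred[t + 1] - pred[t]) - net_flux[t]) for t in range(len(pred) - 1)
    ]
    # Hinge on the residual so small imbalances within `tol` cost nothing.
    penalty = (
        sum(max(0.0, r - tol) for r in residuals) / len(residuals)
        if residuals
        else 0.0
    )

    return mse + lam * penalty


# A trajectory that matches the observations and the balance rule has zero loss.
consistent = physics_guided_loss([1.0, 2.0, 3.0], [1.0, None, 3.0], [1.0, 1.0])

# A trajectory that breaks the balance in the last step pays both terms.
inconsistent = physics_guided_loss([1.0, 2.0, 4.0], [1.0, None, 3.0], [1.0, 1.0])
```

Because the penalty depends only on predictions and fluxes, it can be evaluated on time steps that have no observation, which is one reason such terms are attractive when labels are scarce.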
