Knowledge-guided Machine Learning: Current Trends and Future Prospects
Anuj Karpatne, Xiaowei Jia, Vipin Kumar
TL;DR
This survey defines knowledge-guided machine learning (KGML) as a framework that combines scientific knowledge with data to overcome the limitations of purely process-based or purely data-driven models. It presents a three-dimensional taxonomy for categorizing KGML work: the type of knowledge (perfect to partial), the form of integration (ML-centric to process-centric), and the method of incorporation (learning, architecture, pre-training). The paper then details three major KGML methods and surveys four categories of environmental use cases: forward modeling (surrogate models and improved forward models), inverse modeling, generative modeling (data-driven generation), and downscaling (resolution enhancement). It also discusses the rise of foundation models in environmental science and identifies future directions such as causal reasoning, uncertainty quantification, and scalable, interpretable, foundation-model-driven KGML systems that generalize better across domains and data regimes. Overall, KGML is positioned as a promising pathway to more robust, scientifically consistent, and explainable ML for environmental modeling and discovery.
Abstract
This paper presents an overview of scientific modeling and discusses the complementary strengths and weaknesses of ML methods for scientific modeling in comparison to process-based models. It also provides an introduction to the current state of research in the emerging field of scientific knowledge-guided machine learning (KGML) that aims to use both scientific knowledge and data in ML frameworks to achieve better generalizability, scientific consistency, and explainability of results. We discuss different facets of KGML research in terms of the type of scientific knowledge used, the form of knowledge-ML integration explored, and the method for incorporating scientific knowledge in ML. We also discuss some of the common categories of use cases in environmental sciences where KGML methods are being developed, using illustrative examples in each category.
