Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems

Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, Vipin Kumar

TL;DR

The paper surveys how integrating physical knowledge with data-driven approaches can address limitations of purely mechanistic or purely data-driven methods in environmental and engineering systems. It categorizes objectives and methods, presents a taxonomy of physics-ML techniques, and discusses cross-disciplinary opportunities and gaps. It argues that physics-guided loss, initialization, architecture, and hybrid modeling can improve accuracy, data efficiency, and interpretability, with applications from PDE solving to equation discovery and uncertainty quantification. The work highlights cross-fertilization opportunities and provides guidance for future research and broader adoption.

Abstract

There is a growing consensus that solutions to complex science and engineering problems require novel methodologies that are able to integrate traditional physics-based modeling approaches with state-of-the-art machine learning (ML) techniques. This paper provides a structured overview of such techniques. Application-centric objective areas for which these approaches have been applied are summarized, and then classes of methodologies used to construct physics-guided ML models and hybrid physics-ML frameworks are described. We then provide a taxonomy of these existing techniques, which uncovers knowledge gaps and potential crossovers of methods between disciplines that can serve as ideas for future research.

Paper Structure

This paper contains 37 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: A generic scientific problem in engineering, where $\vec{x}_t$ are the dynamic inputs in time, $\vec{s}$ is the set of static characteristics or parameters of the system, and $F(\cdot)$ is the model producing the target variable $\vec{y}_t$. $\vec{x}_t$ and $\vec{y}_t$ can also have spatial dimensions.
  • Figure 2: The Physics-Guided Recurrent Neural Network (PGRNN) model demonstrated in Jia et al. \cite{jia2020physics} is an example of a physics-guided loss function that allows physical knowledge to be incorporated into the ML model. The recurrent process includes both the standard RNN flow (black arrows) and the energy flow (blue arrows). Here $U_T$ represents the thermal energy of the lake at time $T$, and both the energy output and the temperature output $y_T$ are used in calculating the loss function value. This enables the PGRNN to predict lake temperature without violating energy constraints. A detailed description of the loss function (Equation \ref{eq1}) can be found in Section \ref{method:loss_func}.
  • Figure 3: An illustration of residual modeling, in which an ML model $f_{ML}$ is trained to model the error made by the physics-based model $f_{PHY}$. Final predictions are the sum of the predictions made by $f_{PHY}$ and the residual modeled by $f_{ML}$. The training process is shown in red and the testing process in blue. Figure adapted from \cite{forssell1997combining}.
  • Figure 4: Diagram of a hybrid physics-ML model that accepts the output of a physical model as input to an ML model (figure adapted from Karpatne et al. \cite{karpatne2017physics}). In the diagram, the physics-based model converts the input drivers $D$ to simulated outputs $Y_{PHY}$. The hybrid physics-ML model $f_{HPD}$ then jointly uses the input drivers $D$ and the simulated outputs $Y_{PHY}$ to make the final prediction $Y_{pred}$.
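
The ideas in the captions for Figures 2 to 4 can be made concrete with a toy example. The sketch below is not taken from the paper. It assumes a synthetic one-dimensional system, a deliberately incomplete physics model, and plain least squares standing in for the ML models $f_{ML}$ and $f_{HPD}$. All function names (`f_phy`, `observe`, `fit_slope`, `fit_two`, `physics_guided_loss`) are hypothetical. The penalty form in `physics_guided_loss` (supervised error plus a weighted mean of positive constraint violations) is an assumed generic shape and should not be read as the paper's Equation 1.

```python
from typing import Sequence, Tuple


def f_phy(d: float) -> float:
    """Hypothetical imperfect physics model: it omits a linear term."""
    return 0.5 * d * d


def observe(d: float) -> float:
    """Synthetic 'observations', assumed purely for illustration."""
    return 3.0 * d + 0.5 * d * d


def fit_slope(x: Sequence[float], t: Sequence[float]) -> float:
    """Least-squares slope through the origin: argmin_w sum((w*x - t)^2)."""
    return sum(a * b for a, b in zip(x, t)) / sum(a * a for a in x)


def fit_two(x1: Sequence[float], x2: Sequence[float],
            t: Sequence[float]) -> Tuple[float, float]:
    """Least squares for t ~ w1*x1 + w2*x2 via the 2x2 normal equations."""
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * y for a, y in zip(x1, t))
    b2 = sum(b * y for b, y in zip(x2, t))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def physics_guided_loss(y_pred: Sequence[float], y_obs: Sequence[float],
                        violation: Sequence[float], lam: float) -> float:
    """Supervised MSE plus lam times the mean positive constraint violation.

    `violation` holds one signed value per constraint check; only positive
    values (actual violations) are penalized. This form is an assumption.
    """
    mse = sum((p - o) ** 2 for p, o in zip(y_pred, y_obs)) / len(y_obs)
    penalty = sum(max(0.0, v) for v in violation) / len(violation)
    return mse + lam * penalty


drivers = [0.0, 1.0, 2.0, 3.0, 4.0]          # input drivers D
y_obs = [observe(d) for d in drivers]        # training targets
y_phy = [f_phy(d) for d in drivers]          # simulated outputs Y_PHY

# Figure 3 (residual modeling): f_ML is fit to the error of f_PHY,
# and the final prediction is f_PHY(d) + f_ML(d).
residuals = [o - p for o, p in zip(y_obs, y_phy)]
w_res = fit_slope(drivers, residuals)


def predict_residual(d: float) -> float:
    return f_phy(d) + w_res * d


# Figure 4 (hybrid model): f_HPD takes both D and Y_PHY as inputs.
w_d, w_phy = fit_two(drivers, y_phy, y_obs)


def predict_hybrid(d: float) -> float:
    return w_d * d + w_phy * f_phy(d)


# Figure 2 (physics-guided loss): made-up violation values, lam = 0.5.
loss = physics_guided_loss(y_phy, y_obs, violation=[0.2, -0.1, 0.0], lam=0.5)

print(predict_residual(5.0), predict_hybrid(5.0), observe(5.0))
```

In this toy setting both the residual model and the hybrid model recover the term the physics model omits, so their predictions at an unseen driver value agree with the synthetic observation. With real data, noise, and a nonlinear learner, the two approaches generally behave differently.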