Interpreting and generalizing deep learning in physics-based problems with functional linear models

Amirhossein Arzani, Lingxiao Yuan, Pania Newell, Bei Wang

TL;DR

The work tackles interpretability and poor OOD generalization in physics-based deep learning by introducing generalized functional linear models as interpretable surrogates. These surrogates are linear sums of integral equations with kernels drawn from a library and learned via sparse regression, enabling post-hoc interpretation of a trained network or direct data-driven construction. Across solid mechanics, fluid dynamics, and transport problems, the interpretable surrogate achieves comparable training accuracy and often improved OOD generalization, while providing transparent kernel-based mappings between input functions and outputs. The framework supports a hybrid strategy where a neural network handles in-distribution predictions and the interpretable surrogate addresses extrapolation, with potential extensions to time-dependent problems and inverse problems, thereby enhancing trust and utility in scientific ML.
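
Read schematically, a "linear sum of integral equations with kernels drawn from a library" could be written as below. The symbols $K_i$, $c_i$, $M$, and $\Omega$ are assumptions introduced here for illustration and are not necessarily the paper's notation. Any link function or other nonlinear transformation in the generalized model is left out of this sketch.

$$
u(\mathbf{x}) \;\approx\; \sum_{i=1}^{M} c_i \int_{\Omega} K_i(\mathbf{x}, \boldsymbol{\xi})\, f(\boldsymbol{\xi})\, \mathrm{d}\boldsymbol{\xi}
$$

Here $f$ is the input function, $u$ is the output, $K_i$ are the candidate kernels in the library, and sparse regression is expected to drive most coefficients $c_i$ to zero so that only a few terms remain. For a scalar output, the kernels would depend on $\boldsymbol{\xi}$ only.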

Abstract

Although deep learning has achieved remarkable success in various scientific machine learning applications, its opaque nature poses concerns regarding interpretability and generalization capabilities beyond the training data. Interpretability is crucial and often desired in modeling physical systems. Moreover, acquiring extensive datasets that encompass the entire range of input features is challenging in many physics-based learning tasks, leading to increased errors when encountering out-of-distribution (OOD) data. In this work, motivated by the field of functional data analysis (FDA), we propose generalized functional linear models as an interpretable surrogate for a trained deep learning model. We demonstrate that our model could be trained either based on a trained neural network (post-hoc interpretation) or directly from training data (interpretable operator learning). A library of generalized functional linear models with different kernel functions is considered and sparse regression is used to discover an interpretable surrogate model that could be analytically presented. We present test cases in solid mechanics, fluid mechanics, and transport. Our results demonstrate that our model can achieve comparable accuracy to deep learning and can improve OOD generalization while providing more transparency and interpretability. Our study underscores the significance of interpretable representation in scientific machine learning and showcases the potential of functional linear models as a tool for interpreting and generalizing deep learning.
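
As a minimal sketch of the idea of selecting kernels from a library by sparse regression, the example below fits a scalar-output functional linear model on synthetic 1D data. Everything in it is an assumption made for illustration: the cosine kernel library, the synthetic input functions, the trapezoid quadrature, and the ISTA solver for the L1-penalized least-squares problem. None of it is taken from the paper's implementation.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=2000):
    """Minimize 0.5*||A c - y||^2 + lam*||c||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - y))  # gradient step on the quadratic term
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return c

rng = np.random.default_rng(0)

# Quadrature grid on [0, 1] with trapezoid weights.
n_grid = 101
xi = np.linspace(0.0, 1.0, n_grid)
w = np.full(n_grid, xi[1] - xi[0])
w[[0, -1]] *= 0.5

def cosine_modes(n_modes):
    """Rows are 1, sqrt(2)cos(pi xi), sqrt(2)cos(2 pi xi), ... on the grid."""
    rows = [np.ones(n_grid)]
    rows += [np.sqrt(2.0) * np.cos(k * np.pi * xi) for k in range(1, n_modes)]
    return np.stack(rows, axis=0)

# Hypothetical kernel library (an assumption, not the paper's library).
names = ["1"] + [f"sqrt(2)*cos({k}*pi*xi)" for k in range(1, 5)]
K = cosine_modes(5).T  # shape (n_grid, 5), one kernel per column

# Synthetic input functions: random combinations of 8 cosine modes.
n_samples = 200
F = rng.standard_normal((n_samples, 8)) @ cosine_modes(8)  # (n_samples, n_grid)

# Feature matrix: Theta[n, i] approximates the integral of K_i(xi) * f_n(xi).
Theta = (F * w) @ K

# Synthetic ground truth: the output depends on a single kernel only.
c_true = np.array([0.0, 0.0, 2.0, 0.0, 0.0])
u = Theta @ c_true

# Sparse regression over the library.
c_hat = lasso_ista(Theta, u, lam=1.0)
for name, c in zip(names, c_hat):
    print(f"{name:>22s}: {c:+.4f}")
```

In this toy setting the nonzero entries of `c_hat` identify which kernels are kept, so the fitted surrogate can be written out analytically. In the post-hoc setting described above, `u` would instead hold the outputs of a trained network probed on the inputs `F`.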

Paper Structure (25 sections, 17 equations, 9 figures, 7 tables)

This paper contains 25 sections, 17 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: An overview of the proposed framework. Given a trained neural network that maps an input function $\mathbf{f}(\mathbf{x})$ to an output function $\mathbf{u}(\mathbf{x})$, the network is probed within a desired range of input data to produce pairs of inputs/outputs. Subsequently, these pairs of data are used to learn an interpretable operator in the form of a linear sum of integral equations (NN-driven interpretable model). Alternatively, the interpretable model is built directly from the given training data, without a neural network (data-driven interpretable model). The interpretable model is discovered by formulating a sparse regression problem using a library of pre-specified general functional linear models with different kernels.
  • Figure 2: A 1D example to motivate the challenge associated with generalization to out-of-distribution input is presented. True data (blue dashed line) and training data (red spheres) are shown. The training data does not cover the entire function. a) Neural network (NN) prediction. b) Gaussian process regression (GPR) prediction. The more interpretable GPR model improves prediction for mild extrapolation.
  • Figure 3: Test case 1 results (predicting total strain energy from heterogeneous materials in the Mechanical MNIST and EMNIST datasets). a) An overview of the machine learning task, in which a single scalar value (strain energy) is predicted from a 2D image (stiffness). b) Boxplots of the absolute error (AE) distribution. The performance of the neural network (NN), the interpretable model trained on neural network predictions (Interp NN-driven), and the interpretable model trained on training data (Interp data-driven) is shown for the training data and the out-of-distribution test data. Each AE boxplot shows the median (green line), the lower/upper quartiles (blue box), whiskers marking the non-outlier minimum/maximum of the data, and outliers (red marks). Outliers are defined as values more than 1.5 times the interquartile range beyond the quartiles.
  • Figure 4: Test case 2 results (predicting maximum velocity from permeability fields in porous media flow). a) An overview of the machine learning task, in which a single scalar value (maximum velocity) is predicted from a 2D image (permeability). b) Boxplots of the absolute error (AE) distribution. The performance of the neural network (NN), the interpretable model trained on neural network predictions (Interp NN-driven), and the interpretable model trained on training data (Interp data-driven) is shown for the training data and the out-of-distribution test data. Refer to Fig. 3 for boxplot details.
  • Figure 5: Test case 3 results (predicting the velocity field from permeability fields in porous media flow). a) An overview of the machine learning task, in which a 2D velocity magnitude field is predicted from a 2D image (permeability). Results from the neural network, the interpretable model trained on the neural network (NN-driven), and the interpretable model trained on training data (data-driven) are compared to the ground truth for a sample input in the training regime. b) Boxplots of two error distributions: the point-wise absolute error (PAE), which aggregates point-wise errors across all samples, and the image-based absolute error, which is the spatially averaged error of each output field. The performance of the neural network (NN), the interpretable model trained on neural network predictions (Interp NN-driven), and the interpretable model trained on training data (Interp data-driven) is shown for the training data and the out-of-distribution test data. Refer to Fig. 3 for boxplot details.
  • ...and 4 more figures
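
The error summaries described in the captions above can be sketched as follows. This is an illustrative example on made-up data, not the paper's evaluation code. The helper `boxplot_stats`, the array shapes, and the noise level are hypothetical, and the outlier rule assumes the conventional boxplot fences placed 1.5 interquartile ranges beyond the quartiles.

```python
import numpy as np

def boxplot_stats(values, whis=1.5):
    """Median, quartiles, whiskers, and outliers using 1.5*IQR fences."""
    values = np.asarray(values, dtype=float).ravel()
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    is_outlier = (values < lo_fence) | (values > hi_fence)
    inside = values[~is_outlier]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside.min(),    # non-outlier minimum
        "whisker_high": inside.max(),   # non-outlier maximum
        "outliers": values[is_outlier],
    }

rng = np.random.default_rng(0)

# Hypothetical ground-truth and predicted fields: 20 samples of 8x8 images.
truth = rng.random((20, 8, 8))
pred = truth + 0.01 * rng.standard_normal(truth.shape)
pred[0, 0, 0] += 1.0  # inject one gross error so an outlier is present

# Point-wise absolute error (PAE): every pixel of every sample is one data point.
pae = np.abs(pred - truth)

# Image-based absolute error: spatial average of the error, one value per sample.
image_ae = pae.mean(axis=(1, 2))

pae_stats = boxplot_stats(pae)
image_stats = boxplot_stats(image_ae)
print("PAE median:", pae_stats["median"], "outliers:", pae_stats["outliers"].size)
print("Image AE median:", image_stats["median"])
```

The two summaries answer different questions: the point-wise version exposes localized large errors, while the image-based version shows how each sample performs on average.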