Interpretability and Generalization Bounds for Learning Spatial Physics

Alejandro Francisco Queiruga, Theo Gutman-Solo, Shuai Jiang

TL;DR

This work addresses the challenge of understanding when ML methods can reliably learn and generalize spatial physics, focusing on linear DEs and Green's-function representations. It develops a theory linking discretization, data function spaces, and learning dynamics, proving bounds on parameter learning and showing that linear operator learning converges to a projection of the true operator onto the training subspace via $W^* = A U U^T + W^0(I - U U^T)$. The paper validates these ideas through extensive experiments across finite-difference, PINN, DeepONet, Neural Operators, and physics-informed variants, demonstrating a robust subspace-generalization structure and introducing a cross-set validation protocol as a practical benchmark. It further shows that Green's functions can be extracted from well-generalizing black-box models, providing a mechanistic interpretable lens, while also revealing that different model classes can exhibit opposing generalization behaviors, thereby guiding data collection and evaluation in scientific ML.
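The projection formula above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a squared-error loss, plain gradient descent, a random stand-in for the true operator $A$, and an orthonormal basis $U$ for the training subspace. The function name `train_linear_operator` and all sizes are hypothetical choices for illustration.

```python
import numpy as np

def train_linear_operator(A, F, W0, steps=2000):
    """Gradient descent on 0.5 * ||W F - A F||_F^2, starting from W0."""
    W = W0.copy()
    # Step size 1 / (largest eigenvalue of F F^T) keeps the iteration stable.
    lr = 1.0 / np.linalg.norm(F, 2) ** 2
    target = A @ F
    for _ in range(steps):
        W -= lr * (W @ F - target) @ F.T
    return W

rng = np.random.default_rng(0)
n, r, m = 8, 3, 50  # grid size, subspace dimension, number of samples (assumed)

A = rng.standard_normal((n, n))                    # stand-in for the true operator
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of the training subspace
F = U @ rng.standard_normal((r, m))                # training inputs drawn from span(U)
W0 = rng.standard_normal((n, n))                   # random initialization

W = train_linear_operator(A, F, W0)

P = U @ U.T                                        # projector onto the training subspace
W_pred = A @ P + W0 @ (np.eye(n) - P)              # W* = A U U^T + W^0 (I - U U^T)

gap_to_formula = np.linalg.norm(W - W_pred)
error_in_subspace = np.linalg.norm((W - A) @ P)
error_off_subspace = np.linalg.norm((W - A) @ (np.eye(n) - P))
```

Each gradient update has rows in the span of $U$, so the component of $W$ acting on the orthogonal complement never moves from its initial value. In this sketch the learned matrix agrees with $A$ on the training subspace while remaining at $W^0$ elsewhere, which is why a near-zero training loss does not imply recovery of the true operator.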

Abstract

While there are many applications of ML to scientific problems that look promising, visuals can be deceiving. Using numerical analysis techniques, we rigorously quantify the accuracy, convergence rates, and generalization bounds of certain ML models applied to linear differential equations for parameter discovery or solution finding. Beyond the quantity and discretization of data, we identify that the function space of the data is critical to the generalization of the model. A similar lack of generalization is empirically demonstrated for commonly used models, including physics-specific techniques. Counterintuitively, we find that different classes of models can exhibit opposing generalization behaviors. Based on our theoretical analysis, we also introduce a new mechanistic interpretability lens on scientific models whereby Green's function representations can be extracted from the weights of black-box models. Our results inform a new cross-validation technique for measuring generalization in physical systems, which can serve as a benchmark.

Paper Structure

This paper contains 27 sections, 2 theorems, 43 equations, 13 figures.

Key Result

Theorem 3.1

Learning the parameter $k$ using a finite difference stencil of order $q$ given polynomial training data of degree $p$ on a grid of spacing $\Delta x$ results in $w=k$ when $p < q$, and an error when $p \ge q$, for constants $\mu_m$ depending on the truncation error coefficients of the finite difference stencil.
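The qualitative behavior in the theorem can be illustrated with a small experiment. This is a sketch under stated assumptions rather than the paper's setup: it assumes the model problem $k\,u'' = f$, a second-order central-difference stencil, monomial training data $u = x^p$ on $[0, 1]$, and a closed-form least-squares fit of the single weight $w$. The function name `learn_stencil_coefficient` and the value of `k` are hypothetical. The exact thresholds in terms of $p$ and $q$ depend on the theorem's conventions, which this summary does not spell out.

```python
import numpy as np

def learn_stencil_coefficient(p, n_grid, k=2.5):
    """Least-squares fit of w in w * D2(u) = f, with u = x**p and f = k * u''."""
    x = np.linspace(0.0, 1.0, n_grid)
    dx = x[1] - x[0]
    u = x ** p
    # Exact right-hand side k * u'' evaluated at the interior nodes.
    f = k * p * (p - 1) * x[1:-1] ** (p - 2)
    # Second-order central difference approximation of u''.
    d2u = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / dx**2
    # Minimizer of sum_i (w * d2u_i - f_i)^2 over the scalar w.
    return float(d2u @ f / (d2u @ d2u))

k = 2.5
# Error |w - k| for increasing polynomial degree on a 22-point grid.
errors = {p: abs(learn_stencil_coefficient(p, 22, k) - k) for p in (2, 3, 4, 5)}
# Same degree-4 data on a grid with half the spacing.
error_fine = abs(learn_stencil_coefficient(4, 43, k) - k)
```

In this sketch the stencil differentiates low-degree monomials exactly, so the fit returns $w = k$ up to round-off. Once the degree is high enough for the truncation error to be nonzero, the learned $w$ is biased, and the bias shrinks roughly in proportion to $\Delta x^2$ as the grid is refined.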

Figures (13)

  • Figure 1: Expectations when learning a black box linear model, ${\bm{u}}={\bm{W}}{\bm{f}}$. Despite some visual similarity to the Green's function operator (top), naive approaches are not guaranteed to converge to the true operator. The issue lies in the sampling procedure of the training data. The naive approach of using polynomial analytical solutions (illustrated as cubic polynomials) can achieve machine precision MSE loss, but even with infinitely many examples, will not converge to the general solution (center). Altering the construction of the training data, i.e. by using random piecewise linear functions (bottom) can recover the true operator. This allows for the extraction of a familiar discrete differential equation operator (right).
  • Figure 2: Comparison of numerical experiments to analytically derived ML parameters from Theorems 3.1 and 3.2 trained on polynomial spaces with $N_{grid}=22$. (Left) The observed error for the linear matrix matches predictions for lower order $p$. (Middle and right) For the finite difference model, the analytical assumptions overestimate the error, but the trend matches with observations.
  • Figure 3: Learning a parameter using a PINN inverse problem. Line is the mean of five runs, and the shaded region covers the min-max spread.
  • Figure 4: Cross-evaluation error heatmaps for four model types. Rows: training set; columns: test set. Dashed lines separate function class families; cells to the left of each cell are subspaces. Color shows log-scale MSE (sharp transitions $\approx \times10^{10}$). The blue lower triangular blocks in the linear and deep linear panels indicate generalization with $\mathcal{L}(w,\mathcal{F}_{\text{test}})\lesssim \mathcal{L}(w,\mathcal{F}_{\text{train}})$ when $\mathcal{F}_{\text{test}}\subseteq \mathcal{F}_{\text{train}}$.
  • Figure 5: Generalization to out-of-distribution test datasets for the DeepONet and Fourier Neural Operator. The generalization patterns are similar to the Deep Linear models in Fig. 4.
  • ...and 8 more figures

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2
  • proof
  • proof