Learning from Integral Losses in Physics Informed Neural Networks

Ehsan Saleh, Saba Ghaffari, Timothy Bretl, Luke Olson, Matthew West

TL;DR

This work addresses training physics-informed neural networks when residuals involve expensive integrals, showing that naive unbiased integral estimators induce bias in the optimization objective. It analyzes three strategies—deterministic sampling, the double-sampling trick, and a delayed-target bootstrapping method—and demonstrates that delayed targeting achieves accuracy comparable to large-sample estimators while using minimal sampling. The authors provide convergence guarantees and error bounds for the delayed target approach under linear function approximation, and validate the method on Poisson problems with singular charges, Maxwell equations, and Smoluchowski coagulation, with open-source code available. The results suggest that delayed targeting is a practically effective and scalable approach to learning from integral losses in scientific PINNs, though it requires careful hyperparameter tuning and may be complemented by adaptive sampling or quadrature techniques in future work.

Abstract

This work proposes a solution for the problem of training physics-informed networks under partial integro-differential equations. These equations require an infinite or a large number of neural evaluations to construct a single residual for training. As a result, accurate evaluation may be impractical, and we show that naively replacing these integrals with unbiased estimates leads to biased loss functions and solutions. To overcome this bias, we investigate three types of potential solutions: the deterministic sampling approaches, the double-sampling trick, and the delayed target method. We consider three classes of PDEs for benchmarking: one defining Poisson problems with singular charges and weak solutions of up to 10 dimensions, another involving weak solutions on electromagnetic fields and a Maxwell equation, and a third defining a Smoluchowski coagulation problem. Our numerical results confirm the existence of the aforementioned bias in practice and also show that our proposed delayed target approach can lead to accurate solutions with comparable quality to ones estimated with a large sample size integral. Our implementation is open-source and available at https://github.com/ehsansaleh/btspinn.
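
The bias described in the abstract can be illustrated numerically. For a squared residual of the form $(f - \mathbb{E}[g(x')])^2$, replacing the expectation with an unbiased $N$-sample mean gives an expected loss of $(f - \mathbb{E}[g(x')])^2 + \mathbb{V}[g(x')]/N$, so the extra variance term is largest at $N=1$. The sketch below is a toy check of that identity, not the paper's code: the scalar value `f_value`, the standard-normal integrand, and the function name `sampled_loss` are all assumptions made for illustration.

```python
import numpy as np

def sampled_loss(n_samples, n_trials=50_000, seed=0):
    """Monte Carlo estimate of E[(f - mean_N g(x'))^2] for a toy integrand."""
    rng = np.random.default_rng(seed)
    f_value = 0.5  # hypothetical left-hand side of the residual
    # Hypothetical integrand samples g(x') with E[g] = 0 and Var[g] = 1.
    g = rng.standard_normal((n_trials, n_samples))
    integral_estimate = g.mean(axis=1)  # unbiased N-sample estimate of E[g]
    return float(np.mean((f_value - integral_estimate) ** 2))

true_loss = (0.5 - 0.0) ** 2  # exact squared residual, no sampling
for n in (1, 100):
    # Columns: N, sampled loss, prediction true_loss + Var[g] / N
    print(n, sampled_loss(n), true_loss + 1.0 / n)
```

In this toy setting the excess term is a constant, but when the integrand depends on the trained parameters, minimizing the sampled loss also penalizes the integrand's variance, which is consistent with the overly smooth solutions reported for $N=1$.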

Paper Structure

This paper contains 46 sections, 3 theorems, 71 equations, 21 figures, 5 tables, 2 algorithms.

Key Result

Theorem 4.1

Following the assumptions and notation defined in Section \ref{sec:pinndefs} of the supplementary material, notably (1) a linear function approximation $f_{\theta}(x)=\phi(x)^\mathrm{T} \theta$, (2) appropriate $\eta_t$ learning rates such that $\sum_{t=0}^{\infty} \eta_t = \infty$ and $\sum_{t=0}^{\infty}$ [...] This is in contrast to the standard training method, which solves for the fixed point of the [...]
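
To make the linear setting concrete, here is a minimal sketch of delayed targeting on a toy fixed-point problem. It is not the authors' implementation. The toy equation $f(x) = c(x) + \gamma\,\mathbb{E}_{x'|x}[f(x')]$, the features $\phi(x) = (1, x)$, the Gaussian $P(x'|x)$, the constants, and every function name are assumptions chosen so that the exact answer `THETA_TRUE` is known. The naive fit minimizes the $N=1$ sampled squared residual with gradients flowing through both terms. The delayed-target fit computes the integral term from a lagging copy of the parameters, treats it as a constant, and updates that copy by Polyak averaging.

```python
import numpy as np

GAMMA, SIGMA = 0.5, 1.0             # hypothetical toy constants
THETA_TRUE = np.array([1.0, 2.0])   # hypothetical exact solution f(x) = 1 + 2x

def features(x):
    # Linear function approximation f_theta(x) = phi(x)^T theta, phi(x) = (1, x).
    return np.stack([np.ones_like(x), x], axis=1)

def sample_batch(rng, n):
    x = rng.uniform(-1.0, 1.0, size=n)
    x_next = x + SIGMA * rng.standard_normal(n)    # one x' per x, i.e. N = 1
    c = (1.0 - GAMMA) * features(x) @ THETA_TRUE   # source term making THETA_TRUE exact
    return features(x), features(x_next), c

def naive_fit(rng, n=200_000):
    # Minimizes the N=1 sampled loss (phi(x)^T th - c - GAMMA * phi(x')^T th)^2.
    phi, phi_next, c = sample_batch(rng, n)
    theta, *_ = np.linalg.lstsq(phi - GAMMA * phi_next, c, rcond=None)
    return theta

def delayed_target_fit(rng, steps=4000, batch=512, lr=0.1, tau=0.9):
    theta = np.zeros(2)
    theta_tgt = np.zeros(2)          # lagging target parameters
    for _ in range(steps):
        phi, phi_next, c = sample_batch(rng, batch)
        target = c + GAMMA * phi_next @ theta_tgt   # no gradient through the target
        grad = phi.T @ (phi @ theta - target) / batch
        theta -= lr * grad
        theta_tgt = tau * theta_tgt + (1.0 - tau) * theta   # Polyak averaging
    return theta

rng = np.random.default_rng(0)
print("naive  :", naive_fit(rng))
print("delayed:", delayed_target_fit(rng))
```

In this toy problem the naive minimizer shrinks the slope toward zero (the expected value works out to a quarter of the true slope), while the delayed-target iterate settles near `THETA_TRUE`. The step size, batch size, and smoothing factor `tau` are arbitrary choices here and would need tuning in any real problem.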

Figures (21)

  • Figure 1: Training with the MSE loss under different sample sizes per surface ($N$). The heatmaps show the analytical solution (left), the low-variance training with $N=100$ (middle), and the high-variance training with $N=1$ (right). The smaller the $N$, the more biased the training objective becomes towards finding smoother solutions. The right panel shows the training curves; the training loss and the integration variance represent $\hat{\mathcal{L}}_{\theta}(x)$ and $\mathbb{V}_{P(x'|x)}[g_\theta(x')]$ in Equation \ref{eq:excessvar}, respectively. For $N=1$, the training loss seems to be floored at the same value as the integration variance (i.e., approximately $0.3$). However, with $N=100$, the model produces better solutions, lower training losses, and higher integration variances.
  • Figure 2: The results of the deterministic and double sampling techniques on the Poisson problem. The left plots demonstrate the solutions with $N=1$, while the right plots show the solutions with $N=100$. The training curves represent the mean squared error to the analytical solution vs. the training epochs. With $N=1$, the double sampling trick exhibits divergence in training, and the deterministic sampling process yields overly smooth functions similar to the standard solution in Figure \ref{fig:msegt}. However, with $N=100$, both the deterministic and double-sampling approaches exhibit improvements. According to the training curves, the delayed target method with $N=1$ yields the best solutions to this problem.
  • Figure 3: Training the same problem as in Figure \ref{fig:msegt} with delayed targets and $N=1$. The top left panel shows a diverged training with $M=100$ in Equation \ref{eq:divthmbsloss}. The lower left panel corresponds to $M=10$, which has a converging training curve even though it produces an overly smooth solution. In the lower right panel, we set $\lambda=1$ which allowed setting $M=1000$ while maintaining a stable training loss. In each panel, the left and right heatmaps show the main and the target model predictions, respectively, and the right plots show the training curves. The green curves show the training loss for the delayed target method, and the standard training curves with $N=1$ and $100$ are also shown using dotted red and blue lines for comparison, respectively. The top right panel shows an example of deterministic vs. i.i.d. sampling of the surface points in the Poisson problem. For each sampled sphere, the surface points and their normal vectors are shown with $N=100$ samples. With deterministic sampling, the points are evenly spaced to cover the sampling domain.
  • Figure 4: The solution and performance curves in higher-dimensional Poisson problems. The left panel shows the solution curves for the delayed target $(N=1)$, the standard $(N=100)$, and the double-sampling $(N=100)$ methods. The top and the bottom rows show 2- and 10-dimensional problems, respectively. In these problems, a single charge is located at the origin, so that the analytical solution is a function of the evaluation point radii $\|x\|$. The horizontal axis shows the evaluation point radii and covers 98% of points within the training volumes. The right chart is a performance curve against the problem dimension (lower is better). The normalized MSE values were shown to be comparable. These results suggest that (1) higher dimensions make the problem challenging, and (2) delayed targeting with $N=1$ is comparable to standard trainings with $N=100$. GQ and LQ refer to Gaussian and Leja quadrature, respectively, under a Smolyak sparse grid. Sections \ref{sec:quadqmcsamplng}, \ref{sec:dtsampsizeabls}, and \ref{sec:hdpevalproto} of the supplementary material describe the effect of sampling dimension on numerical quadrature and QMC, the effective way of scaling up $N$ for delayed targeting, and the performance evaluation profile, respectively.
  • Figure 5: The solution heatmaps and the training curves for different methods to the Maxwell problem. In the left panel, we show a single component of the magnetic potentials ($A_z$) in a 2D slice of the training space with $z=0$ for visual comparison. In the right plot, we show the training curves. The results suggest that (1) the standard and deterministic trainings with $N=1$ produce overly smooth solutions, and (2) delayed targeting with $N=1$ is comparable to standard trainings with $N=100$. Section \ref{sec:trgtaulammaxwell} of the supplementary material studies the target smoothing and regularization weights of the delayed target method in this problem.
  • ...and 16 more figures

Theorems & Definitions (7)

  • Example 2.1
  • Example 2.2
  • Example 2.3
  • Theorem 4.1
  • Theorem \ref{thm:delayedtargetlinsmry}
  • Corollary \ref{thm:delayedtargetlinsmry}
  • proof