Residual-based attention and connection to information bottleneck theory in PINNs

Sokratis J. Anagnostopoulos; Juan Diego Toscano; Nikolaos Stergiopulos; George Em Karniadakis

TL;DR

This work addresses convergence and accuracy challenges in physics-informed neural networks (PINNs) by introducing a gradient-free residual-based attention (RBA) scheme that adaptively weights collocation points according to their evolving residuals. Combined with exact imposition of boundary conditions (ADF and Fourier feature embeddings) and a modified MLP (mMLP), the method reaches relative $L^2$ errors in the $10^{-5}$ range on dynamic and static PDE benchmarks, including the 1D Allen-Cahn and 2D Helmholtz equations. A key contribution is the observed two-phase learning behavior, fitting followed by diffusion, which aligns with information bottleneck (IB) theory and is supported by gradient-based signal-to-noise ratio (SNR) analyses. The results offer practical insights for reliable PINN training and suggest a path toward understanding neural operators through the IB lens, with potential applicability to complex, multi-physics problems.
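To make the idea of weighting collocation points by their evolving residuals concrete, here is a minimal sketch in plain Python. The exact update rule is not given in this summary, so the form below (exponential decay plus a residual term normalized by the largest absolute residual) is an assumption, as are the names `rba_update`, `weighted_residual_loss`, `gamma`, and `eta` and their default values.

```python
def rba_update(weights, residuals, gamma=0.999, eta=0.01):
    """One residual-based attention step (assumed form, not quoted from the paper).

    Each weight decays by `gamma` and grows by `eta` times the point's
    absolute residual, normalized by the largest absolute residual.
    No gradients of the loss are needed for this update.
    """
    max_abs = max(abs(r) for r in residuals)
    if max_abs == 0.0:
        # All residuals vanish: only the decay term acts.
        return [gamma * w for w in weights]
    return [gamma * w + eta * abs(r) / max_abs for w, r in zip(weights, residuals)]


def weighted_residual_loss(weights, residuals):
    """Mean of squared weighted residuals, the quantity handed to the optimizer."""
    return sum((w * r) ** 2 for w, r in zip(weights, residuals)) / len(residuals)


# Toy example: three collocation points with fixed residuals;
# the last point is the hardest to fit and should collect the largest weight.
weights = [0.0, 0.0, 0.0]
residuals = [0.1, 0.2, 1.0]
for _ in range(100):
    weights = rba_update(weights, residuals)
```

In a real PINN the residuals would be recomputed from the network at every iteration, so the weights would follow the regions where the PDE is currently violated the most. With the assumed defaults, no weight can exceed `eta / (1 - gamma)`.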

Abstract

Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains a challenge. In this work, we propose an efficient, gradient-less weighting scheme for PINNs that accelerates the convergence of dynamic or static systems. This simple yet effective attention mechanism is a function of the evolving cumulative residuals and aims to make the optimizer aware of problematic regions at no extra computational cost and without adversarial learning. We illustrate that this general method consistently achieves a relative $L^{2}$ error of the order of $10^{-5}$ using standard optimizers on typical benchmark cases from the literature. Furthermore, by investigating the evolution of weights during training, we identify two distinct learning phases reminiscent of the fitting and diffusion phases proposed by the information bottleneck (IB) theory. Subsequent gradient analysis supports this hypothesis by aligning the transition from high to low signal-to-noise ratio (SNR) with the transition from fitting to diffusion regimes of the adopted weights. This novel correlation between PINNs and IB theory could open future possibilities for understanding the underlying mechanisms behind the training and stability of PINNs and, more broadly, of neural operators.
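The gradient SNR mentioned above can be illustrated with a small sketch. The abstract does not define the quantity, so this assumes a definition common in the IB literature: the norm of the batch-wise mean gradient divided by the norm of the batch-wise standard deviation. The function name `gradient_snr` and the toy gradient values are hypothetical.

```python
import math


def gradient_snr(batch_grads):
    """Signal-to-noise ratio of a set of per-batch gradient vectors.

    Signal: L2 norm of the element-wise mean across batches.
    Noise:  L2 norm of the element-wise (population) standard deviation.
    This definition is an assumption, not taken from the summary above.
    """
    n = len(batch_grads)
    dim = len(batch_grads[0])
    mean = [sum(g[j] for g in batch_grads) / n for j in range(dim)]
    var = [sum((g[j] - mean[j]) ** 2 for g in batch_grads) / n for j in range(dim)]
    signal = math.sqrt(sum(m * m for m in mean))
    noise = math.sqrt(sum(var))
    return signal / noise if noise > 0 else math.inf


# Hypothetical gradients: batches that agree on a direction (fitting-like)
# versus batches that mostly cancel each other (diffusion-like).
aligned = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
scattered = [[1.0, -2.0], [-1.1, 1.9], [0.2, 0.1]]
snr_aligned = gradient_snr(aligned)
snr_scattered = gradient_snr(scattered)
```

Under this definition, a high SNR means the batches agree on a descent direction, while a low SNR means the batch-to-batch fluctuations dominate the mean, which is the regime the abstract associates with the diffusion phase.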

Paper Structure (24 sections, 27 equations, 9 figures, 9 tables, 1 algorithm)

Introduction
Methods
Physics-Informed Neural networks
Residual-based attention (RBA) scheme
Additional PINN enhancements
Modified multi-layer perceptrons
Exact imposition of boundary conditions
Dirichlet Boundary Conditions
Periodic Boundary Conditions
Results
Dynamic case: 1D Allen-Cahn equation
Ablation Study for Allen-Cahn
Static case: 2D Helmholtz equation
Ablation Study for Helmholtz
RBA weight evolution
...and 9 more sections

Figures (9)

Figure 1: Exact solution of the 1D Allen-Cahn with the corresponding network prediction and the absolute error difference.
Figure 2: Ablation study convergence for the 1D Allen-Cahn: Progression of convergence for each experiment. The results demonstrate that integrating the RBA approach with the Fourier feature embedding is crucial for attaining a low relative $L^2$ error. When combined with a modified MLP architecture, the best relative $L^2$ error of $4.5\cdot 10^{-5}$ is reached. The implemented mMLP speeds up convergence during the initial 60000 iterations. Additionally, the RBA weights address the instability issues in the standard PINN, achieving a relative $L^2$ error of $3.16\cdot 10^{-3}$ without supplementary components. The noise in the curves of the top-performing methods suggests that the optimization process avoids becoming trapped in sub-optimal solutions by effectively "leaping" past local minima in the loss landscape.
Figure 3: Analytical solution for the 2D Helmholtz equation and the corresponding network prediction with the absolute error difference.
Figure 4: Ablation study convergence for the 2D Helmholtz: Convergence trajectory for each experiment. For this case, the Fourier feature embedding is critical in achieving a low relative $L^2$ error. Coupled with the RBA weights and a modified MLP architecture, the model reaches its best relative $L^2$ error of $8.91\cdot 10^{-6}$ after $2\cdot10^4$ Adam and $3\cdot10^4$ L-BFGS training steps.
Figure 5: Evolution of RBA weights. For each case, the peak value is limited as per Eq. \ref{eq:bound}, with this study's upper bound equaling $10$. Notice that for the Helmholtz (Fourier) case, the RBA weights were only updated during Adam, indicating that the model did not fully converge. For the ADF case, however, the weights are also updated during L-BFGS, which pushes the maximum value to the upper bound after 20000 iterations. In general, the maxima fluctuate until they approach the upper limit during the final learning stages. The mean values, on the other hand, fluctuate less noticeably before eventually stabilizing at about $20\%$ of the upper bound. This trend indicates that, on average, the total magnitude of the weights remains constant while their distribution adjusts dynamically as the optimizer shifts its focus across different sections of the domain.
...and 4 more figures
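
The upper bound on the RBA weights mentioned in the Figure 5 caption can be motivated with a short derivation. The bound equation itself is not reproduced in this summary, so the update rule below is an assumed form, and the symbols $\gamma$ (decay), $\eta$ (learning rate of the weights), $\lambda_i^{k}$ (weight of point $i$ at iteration $k$), and $r_i$ (its residual) are introduced here for illustration. Because the normalized residual never exceeds 1, the weights are bounded by a geometric series:

```latex
\lambda_i^{k+1} = \gamma\,\lambda_i^{k} + \eta\,\frac{|r_i|}{\max_j |r_j|},
\qquad 0 < \gamma < 1,\ \eta > 0,\ \lambda_i^{0} = 0
\;\Longrightarrow\;
\lambda_i^{k} \le \eta \sum_{m=0}^{k-1} \gamma^{m}
= \eta\,\frac{1-\gamma^{k}}{1-\gamma}
< \frac{\eta}{1-\gamma}.
```

Under this assumed rule, a bound of $10$ would follow from, for example, $\eta = 0.01$ and $\gamma = 0.999$. These values are one consistent choice, not a setting confirmed by the text above.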
