Stochastic Quadrature Rules for Solving PDEs using Neural Networks
Jamie M. Taylor, David Pardo
TL;DR
The paper addresses numerical integration challenges in neural PDE solvers, focusing on the Deep Ritz Method (DRM) for Poisson-type problems. It shows that deterministic and biased stochastic quadrature can drive training towards incorrect solutions, and demonstrates that unbiased, high-order stochastic quadrature on low-dimensional integration meshes yields substantially faster convergence at comparable cost. It introduces new unbiased quadrature rules on triangular and tetrahedral meshes, enabling flexible meshing in complex geometries, and shows that convergence is ultimately limited by the variance of the stochastic gradient rather than the loss itself. The work highlights that accurate integration of the gradient is crucial for gradient-based optimisation, and it provides practical guidance and tools for variance reduction in low-dimensional neural PDE solvers.
Abstract
We examine the challenges associated with numerical integration when applying Neural Networks to solve Partial Differential Equations (PDEs). We specifically investigate the Deep Ritz Method (DRM), chosen for its practical applicability and known sensitivity to integration inaccuracies. We demonstrate that both standard deterministic integration techniques and biased stochastic quadrature methods can lead to incorrect solutions. In contrast, high-order, unbiased stochastic quadrature rules defined on integration meshes in low dimensions significantly improve convergence rates at a computational cost comparable to that of low-order methods such as Monte Carlo. Additionally, we introduce novel stochastic quadrature rules for triangular and tetrahedral mesh elements, offering increased flexibility when handling complex geometric domains. We highlight that the variance of the stochastic gradient acts as a bottleneck for convergence. Furthermore, we observe that for gradient-based optimisation, the crucial factor is the accurate integration of the gradient, rather than merely minimising the quadrature error of the loss function itself.
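To make the idea of unbiased stochastic quadrature on an integration mesh concrete, the following is a minimal one-dimensional sketch. It is not the authors' implementation, and the rules shown are generic textbook constructions rather than the specific rules of the paper. The function names (mc_rule, stratified_rule, antithetic_rule) and the uniform mesh on [a, b] are assumptions made for illustration. All three estimators are unbiased. The mirrored-point rule is additionally exact for integrands that are linear on each element, which is what gives it a higher order.

```python
import numpy as np


def mc_rule(f, a, b, n, rng):
    """Plain Monte Carlo: n i.i.d. uniform points on [a, b] (low order)."""
    x = rng.uniform(a, b, size=n)
    return (b - a) * np.mean(f(x))


def stratified_rule(f, a, b, n_elems, rng):
    """One uniform random point in each element of a uniform mesh."""
    edges = np.linspace(a, b, n_elems + 1)
    h = np.diff(edges)                      # element sizes
    x = edges[:-1] + h * rng.uniform(size=n_elems)
    return np.sum(h * f(x))


def antithetic_rule(f, a, b, n_elems, rng):
    """Two points per element, mirrored about the element midpoint.

    Each point is uniformly distributed in its element, so the rule is
    unbiased; the mirroring cancels the linear part of the integrand.
    """
    edges = np.linspace(a, b, n_elems + 1)
    h = np.diff(edges)
    t = rng.uniform(size=n_elems)
    x1 = edges[:-1] + h * t
    x2 = edges[:-1] + h * (1.0 - t)         # mirror image of x1
    return np.sum(0.5 * h * (f(x1) + f(x2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    exact = np.e - 1.0                      # integral of exp over [0, 1]
    rules = {
        "monte carlo (16 pts)": lambda: mc_rule(np.exp, 0.0, 1.0, 16, rng),
        "stratified (16 pts)": lambda: stratified_rule(np.exp, 0.0, 1.0, 16, rng),
        "antithetic (16 pts)": lambda: antithetic_rule(np.exp, 0.0, 1.0, 8, rng),
    }
    for name, rule in rules.items():
        samples = np.array([rule() for _ in range(2000)])
        print(f"{name}: bias ~ {samples.mean() - exact:+.2e}, "
              f"std ~ {samples.std():.2e}")
```

Each rule uses 16 evaluations of the integrand per estimate, so the comparison is at equal cost. In a DRM setting the same sampling would be applied to the integrand of the loss, and the quantity of interest would be the variance of the resulting gradient estimate rather than of the integral estimate shown here.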
