Operator Learning Using Weak Supervision from Walk-on-Spheres

Hrishikesh Viswanath; Hong Chul Nam; Xi Deng; Julius Berner; Anima Anandkumar; Aniket Bera

TL;DR

This work amortizes the cost of Monte Carlo walks across the distribution of PDE instances, using stochastic representations from the Walk-on-Spheres (WoS) algorithm to generate cheap, noisy estimates of the PDE solution during training.

Abstract

Training neural PDE solvers is often bottlenecked by expensive data generation or by unstable physics-informed neural network (PINN) training, whose optimization landscapes are challenging because of higher-order derivatives. To tackle this issue, we propose an alternative approach that uses Monte Carlo methods to estimate the PDE solution as a stochastic process, providing weak supervision during training. Leveraging the Walk-on-Spheres (WoS) method, we introduce a learning scheme called Walk-on-Spheres Neural Operator (WoS-NO), which uses weak supervision from WoS to train any given neural operator. We amortize the cost of Monte Carlo walks across the distribution of PDE instances, using stochastic representations from the WoS algorithm to generate cheap, noisy estimates of the PDE solution during training. This is formulated as a data-free physics-informed objective in which a neural operator is trained to regress against these weak supervision signals, allowing the operator to learn a generalized solution map for an entire family of PDEs. This strategy does not require expensive pre-computed datasets, avoids the memory-intensive and unstable computation of higher-order derivatives in the loss function, and demonstrates zero-shot generalization to novel PDE parameters and domains. Experiments show that, for the same number of training steps, our method exhibits up to 8.75$\times$ improvement in $L_2$-error compared to standard physics-informed training schemes, up to 6.31$\times$ improvement in training speed, and reductions of up to 2.97$\times$ in GPU memory consumption. We present the code at https://github.com/neuraloperator/WoS-NO
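To make the idea of regressing against noisy WoS estimates concrete, the following is a minimal sketch and not the authors' implementation. It assumes Laplace's equation on the unit disk, a single walk per query point, and a hypothetical quadratic feature model fitted by least squares as a stand-in for a neural operator; the names `wos_laplace_unit_disk`, `features`, `sample_disk`, and `run_demo` are illustrative only.

```python
import numpy as np

def wos_laplace_unit_disk(points, g, rng, eps=1e-3, max_steps=200):
    """One WoS walk per query point for Laplace's equation on the unit disk.

    Returns one noisy estimate of u per point (assumed setup, not the paper's code).
    """
    x = np.array(points, dtype=float)  # copy of the query points, shape (N, 2)
    active = np.ones(len(x), dtype=bool)
    for _ in range(max_steps):
        dist = 1.0 - np.linalg.norm(x, axis=1)  # distance to the unit circle
        active &= dist > eps                    # stop inside the eps-shell
        if not active.any():
            break
        theta = rng.uniform(0.0, 2.0 * np.pi, size=len(x))
        step = dist[:, None] * np.stack([np.cos(theta), np.sin(theta)], axis=1)
        x[active] += step[active]  # uniform point on the largest circle around x
    # Project onto the boundary (walks cut off by max_steps are projected too).
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return g(x)

def features(p):
    """Quadratic features; a hypothetical stand-in for a neural operator."""
    x, y = p[:, 0], p[:, 1]
    return np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=1)

def sample_disk(n, rng, radius=0.9):
    """Uniform samples in a disk of the given radius."""
    rho = radius * np.sqrt(rng.uniform(size=n))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([rho * np.cos(phi), rho * np.sin(phi)], axis=1)

def run_demo(seed=0, n_train=5000, n_test=1000):
    rng = np.random.default_rng(seed)
    g = lambda p: p[:, 0] ** 2 - p[:, 1] ** 2  # harmonic, so u = g inside too
    train = sample_disk(n_train, rng)
    targets = wos_laplace_unit_disk(train, g, rng)  # single-walk weak labels
    # Least-squares regression against the noisy labels.
    coef, *_ = np.linalg.lstsq(features(train), targets, rcond=None)
    test = sample_disk(n_test, rng)
    fit_err = np.mean(np.abs(features(test) @ coef - g(test)))
    raw_err = np.mean(np.abs(wos_laplace_unit_disk(test, g, rng) - g(test)))
    return fit_err, raw_err
```

Calling `run_demo()` returns the mean absolute error of the fitted model and of raw single-walk estimates. Because the per-point noise is approximately zero-mean, the regression averages it out across query points, which is the sense in which cheap, noisy estimates can still supervise a model.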

Paper Structure (41 sections, 32 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Problem scope
Background
Monte Carlo Methods for PDEs
Neural Operators
Problem Setting
Walk-on-Spheres from the Operator Perspective
Walk-on-Spheres as Weak Supervision
Scene Setup
Poisson Equations on parameterized domains
Second Order PDEs with spatially varying coefficients
Experiments
Comparisons with Baselines
Amortization
Architecture-Agnostic Loss
...and 26 more sections

Figures (7)

Figure 1: Given arbitrary, unseen input geometry and boundary conditions, we compare our method (WoS-NO) with WoS at equal execution time, and with DeepRitz and PINO at equal training time. We visualize the relative absolute error against the analytic solution of the PDE (the ground truth). WoS-NO achieves the lowest relative error among all compared methods. During equal-time training, WoS-NO achieves a 2.1$\times$ overall improvement over PINO and 1.59$\times$ over DeepRitz. During inference, WoS-NO achieves 3.73$\times$ better performance than WoS under the same time constraint.
Figure 2: Weak Supervision Loss with WoS: Our algorithm learns the given family of parametrized Poisson equations $\Delta u = f$ on $\Omega_T\subset \mathbb{R}^d$ with $u|_{\partial \Omega_T} = g$, and is agnostic to the underlying neural operator architecture. The WoS method defines the recursive process of the random walk, stopping once the boundary or the maximum number of steps is reached. The source contribution $f(\xi_i)$ is computed for each intermediate point $\xi_i$ before jumping to the next point $\xi_{i+1}$. We achieve variance reduction by controlling the number of walks $L$ to improve the fidelity of the weak solution. $\hat{\mathcal{G}}_{1,\text{WoS}}[a](\xi)$ denotes the single-trajectory WoS estimate. $\xi_K$ denotes the termination point, where the boundary value $g(\xi_K)$ is added if the point is within the tolerance region. The operator estimate is denoted by $\mathcal{G}_\theta[a](\xi)$. We illustrate the overall learning process, with the WoS integral serving as the weak supervision for the neural operator.
Figure 3: Left: When training on Poisson equations with spatially varying coefficients for an equal time of 200 minutes, WoS-NO attains the lowest $L_2$ error and converges the fastest. In contrast, PINO requires a much longer time to converge. Middle: After WoS-NO training is finished, we compare the amount of time needed to achieve the same $L_2$-error as a well-trained WoS-NO for 4096 pointwise estimations. Right: GINO, Transolver, and GNOT are trained with the DeepRitz, PINO, and WoS-NO losses; across all three operator architectures, WoS-NO yields the lowest $L_2$-error.
Figure 4: Left: This graph shows the inference time of a single instance as a function of domain size. For FEM, the domain size denotes the resolution of the mesh, while for the WoS-based methods it is the number of query points. Middle: This figure highlights the amortization achieved with our approach. Meshing time includes the time needed to create meshes in the domain. Data preprocessing time includes the simulation time to create the regression target (FEM solution and WoS simulation). Training time represents the time required for the neural operator to learn the data-driven objective from the points sampled from the FEM solution. Our approach does not require meshing, and the total time for WoS-NO is lower than that of PINO and approximately equal to that of DeepRitz, making it an efficient operator for training. Right: This figure highlights how the data preprocessing time for FEM scales with the resolution of the inputs. Increasing the resolution results in a direct increase in computation time (hours). WoS-NO avoids this, since WoS is highly parallelizable on GPU and does not depend on the input resolution.
Figure 5: Quantitative comparison for biharmonic inpainting. We evaluate our pre-trained operator against a traditional WoS solver and the scikit-image baseline. The plot shows the final error versus total runtime (log scale, in seconds) required to inpaint 20 masks. The average MSE against the ground truth is $2.8\times 10^{-3}\pm 7.2\times 10^{-3}$ for our method, $5.7\times 10^{-3}\pm 5.2\times 10^{-3}$ for WoS, and $5.4\times 10^{-3}\pm 5.0\times 10^{-3}$ for scikit-image. The total wall-clock time is 557.2s for the WoS solver, 6.8s for scikit-image, and 2.8s for WoS-NO.
...and 2 more figures
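
The random-walk process described in the Figure 2 caption can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's algorithm: it fixes the domain to the 2D unit disk, uses the sign convention $\Delta u = f$ from the caption, and estimates the source contribution at each step by sampling one point uniformly in the current ball and weighting it with the standard 2D ball Green's function, $|B_r|\,G_r = \tfrac{r^2}{2}\ln(r/\rho)$. The function name `wos_poisson_unit_disk` and its parameters are hypothetical.

```python
import numpy as np

def wos_poisson_unit_disk(x0, f, g, n_walks, rng, eps=1e-3, max_steps=200):
    """Per-walk WoS estimates of u(x0) for Delta u = f on the unit disk, u = g on the circle.

    Assumed 2D setup for illustration; averaging the returned values over the
    walks reduces the variance of the estimate.
    """
    xi = np.tile(np.asarray(x0, dtype=float), (n_walks, 1))  # current points
    source = np.zeros(n_walks)                               # accumulated source terms
    active = np.ones(n_walks, dtype=bool)
    for _ in range(max_steps):
        r = 1.0 - np.linalg.norm(xi, axis=1)  # radius of the largest inscribed ball
        active &= r > eps                     # stop inside the tolerance region
        if not active.any():
            break
        # Source contribution: one uniform sample y in the ball B(xi, r).
        u = 1.0 - rng.uniform(size=n_walks)   # in (0, 1], keeps log finite
        phi = rng.uniform(0.0, 2.0 * np.pi, size=n_walks)
        rho = r * np.sqrt(u)
        y = xi + rho[:, None] * np.stack([np.cos(phi), np.sin(phi)], axis=1)
        weight = -0.25 * r**2 * np.log(u)     # equals (r^2 / 2) * ln(r / rho)
        source[active] -= (weight * f(y))[active]
        # Jump to a uniform point on the sphere of radius r.
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n_walks)
        jump = r[:, None] * np.stack([np.cos(theta), np.sin(theta)], axis=1)
        xi[active] += jump[active]
    norm = np.linalg.norm(xi, axis=1, keepdims=True)
    reached = (1.0 - norm[:, 0]) <= eps       # within the tolerance region
    # Boundary value is added only for walks that terminated near the boundary.
    boundary = np.where(reached, g(xi / norm), 0.0)
    return boundary + source
```

As a usage example under the same assumptions, with $f \equiv 1$ and $g \equiv 0$ the exact solution is $u(x) = (|x|^2 - 1)/4$, so `wos_poisson_unit_disk((0.3, 0.2), lambda p: np.ones(len(p)), lambda p: np.zeros(len(p)), 20000, np.random.default_rng(0)).mean()` should land close to $-0.2175$. Increasing `n_walks` plays the role of the number of walks $L$ in the caption.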
