Variational operator learning: A unified paradigm marrying training neural operators and solving partial differential equations

Tengfei Xu; Dachuan Liu; Peng Hao; Bo Wang

TL;DR

Variational operator learning (VOL) is a unified framework that combines neural operator training with solving PDEs through the variational (weak) form. Using Ritz and Galerkin approaches with finite element discretization, VOL approximates the system functional and the residual in a matrix-free fashion, so training needs only an unlabeled training set plus a small labeled shift set (5 labels in the reported experiments). Two optimization strategies are offered: direct minimization and iterative update. Experiments on variable heat source, Darcy flow, and variable stiffness elasticity benchmarks indicate data efficiency, robustness to resolution, and generalization, with test errors decreasing as a power law in the amount of unlabeled data. The approach uses classical variational principles and iterative linear solvers to reduce the labeled-data requirements of neural operators in scientific computing.

Abstract

Neural operators, as novel neural architectures for rapidly approximating solution operators of partial differential equations (PDEs), have shown considerable promise for future scientific computing. However, the mainstream approach to training neural operators is still data-driven, which requires an expensive ground-truth dataset from various sources (e.g., solving PDE samples with conventional solvers, real-world experiments) in addition to the costs of the training stage. From a computational perspective, marrying operator learning with domain-specific knowledge for solving PDEs is an essential step toward reducing dataset costs and achieving label-free learning. We propose a novel paradigm that provides a unified framework for training neural operators and solving PDEs with the variational form, which we refer to as variational operator learning (VOL). Ritz and Galerkin approaches with finite element discretization are developed for VOL to achieve matrix-free approximation of the system functional and residual; direct minimization and iterative update are then proposed as two optimization strategies for VOL. Various types of experiments based on reasonable benchmarks on variable heat source, Darcy flow, and variable stiffness elasticity are conducted to demonstrate the effectiveness of VOL. With a label-free training set and a 5-label-only shift set, VOL learns solution operators whose test errors decrease as a power law with respect to the amount of unlabeled data. To the best of the authors' knowledge, this is the first study that integrates the perspectives of the weak form and efficient iterative methods for solving sparse linear systems into the end-to-end operator learning task.
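
To make the terms "matrix-free", "system functional", "residual", and "iterative update" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a 1D Poisson problem with linear finite elements on a uniform mesh and a lumped nodal load, it operates on a nodal vector directly instead of a neural operator's output, and it swaps in a Jacobi update for brevity (the figure captions on this page mention CG). All function names are hypothetical.

```python
# Illustrative sketch only: 1D Poisson problem -u'' = f on (0, 1), u(0) = u(1) = 0,
# linear finite elements on a uniform mesh, lumped nodal load h * f_i (assumption).

def residual(u, f, h):
    """Matrix-free residual r = b - K u; the stiffness matrix K is never assembled."""
    r = [0.0] * len(u)  # Dirichlet boundary entries stay zero
    for i in range(1, len(u) - 1):
        ku_i = (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / h  # stiffness stencil at node i
        r[i] = h * f[i] - ku_i
    return r


def energy(u, f, h):
    """Discrete Ritz functional J(u) = 1/2 u^T K u - b^T u, summed element by element."""
    strain = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1)) / (2.0 * h)
    work = sum(h * f[i] * u[i] for i in range(1, len(u) - 1))
    return strain - work


def jacobi_update(u, f, h, steps):
    """Iterative update u <- u + D^{-1} r with D = diag(K) = 2 / h (Jacobi, not CG)."""
    for _ in range(steps):
        r = residual(u, f, h)
        u = [ui + 0.5 * h * ri for ui, ri in zip(u, r)]
    return u


n = 8                        # number of elements (hypothetical choice)
h = 1.0 / n
x = [i * h for i in range(n + 1)]
f = [2.0] * (n + 1)          # source term whose exact solution is u = x (1 - x)
u_exact = [xi * (1.0 - xi) for xi in x]

r_exact = residual(u_exact, f, h)                           # vanishes at the solution
u_iter = jacobi_update([0.0] * (n + 1), f, h, steps=2000)   # converges to u_exact
```

In this toy setting, either `energy` (Ritz view) or a norm of `residual` (Galerkin view) could serve as a label-free training signal, and a few solver steps such as `jacobi_update` could refine a prediction. How VOL combines these with a neural operator is described in the paper itself.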

Paper Structure (26 sections, 26 equations, 11 figures, 1 table, 3 algorithms)


Introduction
Our contributions.
Related work
Surrogate modeling.
Domain knowledge embedding and deep model embedding.
Operator learning and neural operators.
Preliminaries
Forms of partial differential equations
Ritz method and Galerkin method
Finite element methods
Results
Problem settings
Variable stiffness elasticity
Steady heat transfer with variable heat source and Darcy flow
Scaling experiments
...and 11 more sections

Figures (11)

Figure 1: A schematic representation of the VOL benchmarks in this work.
Figure 1: A linear elastic body with body forces and boundary conditions.
Figure 2: Results of scaling experiments. a, The test errors for all cases in the scaling experiments show a polynomial convergence rate. The error bars show one standard deviation over 5 runs with different training/test data initialization. $x$ is the number of training data. b, The worst test errors also decrease in all cases as the size of the training data increases.
Figure 2: The relationship between the computational mesh, the residual tensor and the linear system. Suppose that every node of the computational mesh has three degrees of freedom. The element $r_{3m+n-3}$ of the residual tensor is just the residual of the $\left( 3m+n-3\right)$th equation of the corresponding linear system.
Figure 3: Comparison of VOL and classical iterative methods. a, b, c, Average relative $L^2$ errors on the training set are recorded for every epoch in the first experiment; a, b, c show the training errors at resolutions 33$\times$33, 129$\times$129, and 257$\times$257, respectively. The shaded regions denote one standard deviation. d, We plot $y=0.02$ as a reference line for VOL+CG($i$). As the number of update steps increases, the test error of VOL+CG($i$) decreases slowly, from coinciding with $y=0.02$ to gradually falling below it.
...and 6 more figures
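
The caption relating the computational mesh, the residual tensor, and the linear system gives the index $3m+n-3$ for an entry of the residual tensor. A small sketch of that mapping follows. It assumes that $m$ is a 1-based node number, $n$ is a 1-based local degree of freedom, and the tensor is flattened node by node; none of this is stated explicitly in the caption. The function names are hypothetical.

```python
def equation_index(m, n, dofs_per_node=3):
    """Map 1-based node m and 1-based local DOF n to a 1-based equation index."""
    if m < 1 or not 1 <= n <= dofs_per_node:
        raise ValueError("m must be >= 1 and n must lie in 1..dofs_per_node")
    return dofs_per_node * (m - 1) + n  # equals 3m + n - 3 when dofs_per_node == 3


def flatten(residual_tensor):
    """Flatten a (num_nodes, dofs_per_node) nested list node by node."""
    return [value for node in residual_tensor for value in node]


# Toy residual tensor for 4 nodes with 3 DOFs each (made-up values, not real data).
residual_tensor = [[10 * m + n for n in (1, 2, 3)] for m in (1, 2, 3, 4)]
flat = flatten(residual_tensor)
```

Under these assumptions, `flat[equation_index(m, n) - 1]` is the residual of equation number `3m + n - 3` in the corresponding linear system.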
