Mathematical artificial data for operator learning
Heng Wu, Benzhuo Lu
TL;DR
The paper addresses the data and efficiency challenges of learning solution operators for PDEs by introducing Mathematical Artificial Data (MAD), a physics-embedded, data-driven framework that generates analytic training data directly from the governing equations. MAD decomposes the solution operator as $\varphi(f,g)=\varphi_1(g)+\varphi_2(f)$ and proposes three data-generation strategies (MAD0, MAD1, MAD2) that leverage neural surrogates, fundamental solutions, and harmonic bases to cover large parameter spaces. Across Poisson, Laplace, and Helmholtz equations in 2D and 3D, MAD achieves higher accuracy and training efficiency than PINNs, while remaining compatible with neural operators such as DeepONet and FNO and adapting well to different geometries. The framework offers a scalable, noise-free data-generation pathway that can augment or replace conventional data-driven training, with the potential to become a universal paradigm for physics-informed machine intelligence in scientific computing.
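The decomposition $\varphi(f,g)=\varphi_1(g)+\varphi_2(f)$ can be read as ordinary superposition. The sketch below assumes a linear differential operator $\mathcal{L}$ on a domain $\Omega$ with Dirichlet boundary data; the symbols $\mathcal{L}$, $\Omega$, $u_1$, and $u_2$ are introduced here for illustration, and the paper's exact definitions may differ.

$$
\begin{aligned}
&\mathcal{L}u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega, \qquad u = \varphi(f,g),\\
&\mathcal{L}u_1 = 0 \ \text{in } \Omega, \qquad u_1 = g \ \text{on } \partial\Omega, \qquad u_1 = \varphi_1(g),\\
&\mathcal{L}u_2 = f \ \text{in } \Omega, \qquad u_2 = 0 \ \text{on } \partial\Omega, \qquad u_2 = \varphi_2(f),\\
&\mathcal{L}(u_1+u_2) = f \ \text{in } \Omega, \qquad u_1+u_2 = g \ \text{on } \partial\Omega
\;\Longrightarrow\; \varphi(f,g) = \varphi_1(g) + \varphi_2(f).
\end{aligned}
$$

The last line holds whenever the boundary value problem has a unique solution, so under these assumptions the two sub-operators can be learned separately and summed.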
Abstract
Machine learning has emerged as a transformative tool for solving differential equations (DEs), yet prevailing methodologies remain constrained by two limitations: data-driven methods demand costly labeled datasets, while model-driven techniques face efficiency-accuracy trade-offs. We present the Mathematical Artificial Data (MAD) framework, a new paradigm that integrates physical laws with data-driven learning to facilitate large-scale operator discovery. By exploiting the intrinsic mathematical structure of DEs to generate physics-embedded analytical solutions and the associated synthetic data, MAD eliminates the dependence on experimental or simulated training data. This enables computationally efficient operator learning across multi-parameter systems while maintaining mathematical rigor. Through numerical demonstrations spanning 2D parametric problems where both the boundary values and the source term are functions, we showcase MAD's generalizability and superior efficiency and accuracy across various DE scenarios. Because this physics-embedded, data-driven framework can handle complex parameter spaces, it has the potential to become a universal paradigm for physics-informed machine intelligence in scientific computing.
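The idea of generating training data from the equation itself, rather than from a numerical solver, can be illustrated with a minimal sketch for the 2D Laplace equation. It is not the authors' MAD0, MAD1, or MAD2 implementation. It assumes the unit square as the domain and random harmonic polynomials (real and imaginary parts of $z^n$, $z = x + iy$) as the harmonic basis; all function names and parameters are hypothetical.

```python
import numpy as np

def sample_harmonic(rng, max_degree=4):
    """Return a random harmonic polynomial u(x, y).

    u = sum_n a_n * Re(z^n) + b_n * Im(z^n) with z = x + i*y.
    Each term is harmonic, so the Laplacian of u is exactly zero.
    """
    a = rng.normal(size=max_degree + 1)
    b = rng.normal(size=max_degree + 1)

    def u(x, y):
        z = np.asarray(x) + 1j * np.asarray(y)
        powers = np.stack([z ** n for n in range(max_degree + 1)])
        return (np.tensordot(a, powers.real, axes=1)
                + np.tensordot(b, powers.imag, axes=1))

    return u

def boundary_points(n_per_side):
    """Points on the boundary of the unit square, counter-clockwise."""
    t = np.linspace(0.0, 1.0, n_per_side, endpoint=False)
    zeros, ones = np.zeros_like(t), np.ones_like(t)
    x = np.concatenate([t, ones, 1.0 - t, zeros])
    y = np.concatenate([zeros, t, ones, 1.0 - t])
    return x, y

def make_laplace_sample(rng, n_per_side=16, n_interior=64):
    """One (input, output) pair: boundary values g and exact interior values u."""
    u = sample_harmonic(rng)
    xb, yb = boundary_points(n_per_side)
    xi, yi = rng.uniform(size=(2, n_interior))
    return {
        "g": u(xb, yb),                     # operator input: Dirichlet data
        "xy": np.stack([xi, yi], axis=1),   # query locations in the interior
        "u": u(xi, yi),                     # exact labels, no solver involved
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dataset = [make_laplace_sample(rng) for _ in range(1000)]
    print(len(dataset), dataset[0]["g"].shape, dataset[0]["u"].shape)
```

Because the solution is chosen first and the boundary data are read off from it, every label is exact up to floating-point error, which is the sense in which such data are noise-free. The same pattern would extend to a source term by picking a smooth $u$ and differentiating it analytically to obtain $f$.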
