Neural Preconditioning Operator for Efficient PDE Solves
Zhihao Li, Di Xiao, Zhilu Lai, Wei Wang
TL;DR
The paper addresses the slow convergence of Krylov solvers on large linear systems arising from PDEs by introducing a Neural Preconditioning Operator (NPO) that learns data-driven preconditioners via condition and residual losses. It combines a Neural Algebraic Multigrid (NAMG) module with transformer-based attention to enable robust, multiscale error correction. As a result, the spectrum of the preconditioned matrix MA (with M the learned preconditioner and A the system matrix) clusters near 1, and the convergence rate is largely independent of problem size. Empirical results on Poisson, Diffusion, and Linear Elasticity problems show that NPO substantially reduces iteration counts and runtimes compared with classical and existing neural preconditioners, and that it generalizes across meshes and resolutions up to 4096. The work also provides theoretical convergence guarantees inspired by classical multigrid and discusses practical implications and future directions, such as adaptive parameterization and scalable parallel implementations.
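To illustrate why clustering the spectrum of MA near 1 matters, the following minimal Python sketch runs conjugate gradient on a small 1D Poisson system with and without a preconditioner. The preconditioner here is not the NPO: it is a hypothetical stand-in, M = A^{-1} + eps*I, chosen only because it gives MA = I + eps*A, so the eigenvalues of MA are known to lie in (1, 1.05]. The helper names `poisson_1d` and `pcg`, the problem size, and the tolerance are likewise assumptions made for this example.

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D Poisson matrix (tridiagonal -1, 2, -1); symmetric positive definite."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def pcg(A, b, apply_M=None, rtol=1e-8, maxiter=1000):
    """Preconditioned conjugate gradient. Returns (x, number of iterations)."""
    if apply_M is None:
        apply_M = lambda r: r          # identity preconditioner: plain CG
    x = np.zeros_like(b)
    r = b - A @ x                      # initial residual
    z = apply_M(r)                     # preconditioned residual
    p = z.copy()                       # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * b_norm:
            return x, k
        z = apply_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

n = 50
A = poisson_1d(n)
b = np.ones(n)

# Hypothetical stand-in for a learned preconditioner (NOT the NPO):
# M = A^{-1} + eps*I is symmetric positive definite and gives MA = I + eps*A.
eig_A = np.linalg.eigvalsh(A)          # ascending eigenvalues of A
eps = 0.05 / eig_A[-1]
M = np.linalg.inv(A) + eps * np.eye(n)
eig_MA = 1.0 + eps * eig_A             # eigenvalues of MA, from MA = I + eps*A

x_plain, it_plain = pcg(A, b)
x_pre, it_pre = pcg(A, b, apply_M=lambda r: M @ r)

print("eigenvalue range of MA:", eig_MA.min(), eig_MA.max())
print("CG iterations without preconditioner:", it_plain)
print("CG iterations with stand-in preconditioner:", it_pre)
```

In this toy setting the preconditioned solve needs fewer iterations than the unpreconditioned one. With a learned operator, the `M @ r` product would be replaced by a forward pass of the trained network.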
Abstract
We introduce the Neural Preconditioning Operator (NPO), a novel approach designed to accelerate Krylov solvers for large, sparse linear systems derived from partial differential equations (PDEs). Unlike classical preconditioners, which often require extensive tuning and struggle to generalize across different meshes or parameters, NPO employs neural operators trained via condition and residual losses. The framework is compatible with existing neural network models, which can then serve as preconditioners that improve the performance of Krylov subspace methods. Further, by combining algebraic multigrid principles with a transformer-based architecture, NPO significantly reduces iteration counts and runtime for Poisson, Diffusion, and Linear Elasticity problems on both uniform and irregular meshes. Our extensive numerical experiments demonstrate that NPO outperforms traditional methods and contemporary neural approaches across various resolutions, and that it converges robustly even on grids as large as 4096, well beyond the resolutions seen during training. These findings underscore the potential of data-driven preconditioning to improve the computational efficiency of high-dimensional PDE applications.
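The abstract names condition and residual losses without defining them. As a rough illustration only, the NumPy sketch below assumes one plausible form for each: a condition loss measuring how far AM is from the identity in the Frobenius norm, and a residual loss measuring how well M maps sample residual vectors r back through A. These definitions, the function names `condition_loss` and `residual_loss`, and the three candidate preconditioners are assumptions for this example and may differ from the formulation used in the paper.

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D Poisson matrix (tridiagonal -1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def condition_loss(A, M):
    """Assumed form: squared Frobenius norm of (I - A M), divided by n."""
    n = A.shape[0]
    return np.linalg.norm(np.eye(n) - A @ M, "fro") ** 2 / n

def residual_loss(A, M, R):
    """Assumed form: mean over columns r of R of ||r - A M r||^2."""
    return np.mean(np.sum((R - A @ (M @ R)) ** 2, axis=0))

n = 32
A = poisson_1d(n)
rng = np.random.default_rng(0)
R = rng.standard_normal((n, 16))       # sample residual vectors (columns)

# Three fixed candidates standing in for the output of a trained operator.
candidates = {
    "identity": np.eye(n),                    # no preconditioning
    "jacobi": np.diag(1.0 / np.diag(A)),      # inverse of the diagonal of A
    "exact_inverse": np.linalg.inv(A),        # ideal but impractical at scale
}

losses = {
    name: (condition_loss(A, M), residual_loss(A, M, R))
    for name, M in candidates.items()
}
for name, (c, r) in losses.items():
    print(f"{name:>14}: condition loss = {c:.3e}, residual loss = {r:.3e}")
```

Under these assumed definitions, both losses shrink as M approaches the inverse of A, which is the behavior a training objective for a preconditioner would reward. In a training loop the fixed matrices would be replaced by the action of a parameterized operator, and the losses would be minimized with respect to its parameters.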
