Neural operators meet conjugate gradients: The FCG-NO method for efficient PDE solving

Alexander Rudikov, Vladimir Fanaskov, Ekaterina Muravleva, Yuri M. Laevsky, Ivan Oseledets

TL;DR

This work introduces FCG-NO, a hybrid approach that embeds a discretization-invariant neural operator as a nonlinear preconditioner for the flexible conjugate gradient method to solve elliptic PDEs. The method leverages a Krylov-subspace–based training scheme and an energy-norm Notay loss to train a Fourier-based spectral neural operator, enabling cross-resolution applicability from low- to high-resolution discretizations. Empirical results show that NO-based preconditioning outperforms classical preconditioners across grids and that training on Krylov residuals is essential for robust convergence, with the Notay loss yielding faster convergence than a conventional $L_2$ loss. The approach provides a principled way to combine the consistency of traditional solvers with the efficiency of neural surrogates, achieving discretization-invariant performance and cross-resolution generalization while retaining convergence guarantees from FCG theory.

Abstract

Deep learning solvers for partial differential equations typically have limited accuracy. We propose to overcome this problem by using them as preconditioners. More specifically, we apply discretization-invariant neural operators to learn preconditioners for the flexible conjugate gradient method (FCG). The architecture, paired with a novel loss function and training scheme, allows for learning efficient preconditioners that can be used across different resolutions. On the theoretical side, FCG theory allows us to safely use nonlinear preconditioners that can be applied in $O(N)$ operations without constraining the form of the preconditioner matrix. To justify the components of the learning scheme (the loss function and the way training data is collected), we perform several ablation studies. Numerical results indicate that our approach compares favorably with classical preconditioners and allows preconditioners learned at a lower resolution to be reused on higher-resolution data.
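
To make the role of the preconditioner concrete, here is a minimal NumPy sketch of a flexible conjugate gradient loop in which the preconditioner is an arbitrary, possibly nonlinear, callable. It is an illustration under stated assumptions, not the authors' implementation: the function name `fcg`, its arguments, and the choice to A-orthogonalize each new direction against only the previous one (truncation depth 1) are all hypothetical choices made here. In the hybrid method described above, `precond` would be the trained neural operator.

```python
import numpy as np


def fcg(A, f, precond, u0=None, tol=1e-8, max_iter=500):
    """Flexible CG for SPD A with a callable preconditioner r -> precond(r).

    Assumption: each new direction is A-orthogonalized against the previous
    direction only (truncation depth 1); other depths are possible.
    """
    u = np.zeros_like(f, dtype=float) if u0 is None else u0.astype(float).copy()
    r = f - A @ u
    f_norm = np.linalg.norm(f)
    p_prev = Ap_prev = pAp_prev = None
    history = [np.linalg.norm(r)]
    for _ in range(max_iter):
        if history[-1] <= tol * f_norm:
            break
        w = precond(r)  # may be nonlinear, e.g. a neural operator
        p = w.copy()
        if p_prev is not None:
            # Remove the A-component along the previous direction
            # (w @ Ap_prev equals w^T A p_prev because A is symmetric).
            p -= (w @ Ap_prev) / pAp_prev * p_prev
        Ap = A @ p
        pAp = p @ Ap
        alpha = (p @ r) / pAp  # exact line search in the energy norm
        u += alpha * p
        r -= alpha * Ap
        p_prev, Ap_prev, pAp_prev = p, Ap, pAp
        history.append(np.linalg.norm(r))
    return u, history
```

Passing `precond=lambda r: r` corresponds to the unpreconditioned case $\mathcal{B} = I$, and `precond=lambda r: r / np.diag(A)` gives a Jacobi baseline for comparison.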

Paper Structure (21 sections, 1 theorem, 25 equations, 8 figures, 6 tables, 1 algorithm)

Key Result

Theorem 2.1

Let $A,\,B \in \mathbb{R}^{n \times n}$ be symmetric positive definite matrices and $\mathcal{B}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$. Let $f,~u_0$ be the vectors of $\mathbb{R}^n$, and let $\{r_i\}_{i=0,1,\ldots},\,\{p_i\}_{i=0,1,\ldots},\,\{u_i\}_{i=1,2,\ldots}$ be the sequences of vectors then where $\gamma_i = \dfrac{1 + \varepsilon_i}{1 - \varepsilon_i} \cdot \dfrac{\left(1 + \vareps

Figures (8)

  • Figure 1: Comparison of accuracy for three approaches: U-Net -- classical deep learning architecture, NO -- neural operator, NO+FCG -- hybrid approach advocated in the present article. Due to the prominent difference between accuracies, two distinct scales are used in the $y$ axis. One can observe that owing to the finite receptive field the performance of U-Net deteriorates with the increase of resolution. The neural operator provides the same accuracy with resolution increase -- this is a highly-praised "discretization invariance." For NO+FCG, the error decreases with the increase of resolution in the same way as for classical numerical methods.
  • Figure 2: The full scheme of the proposed approach, starting from the input training dataset $\mathcal{D}_{\text{train}} = (A, f)$. (a) Pass $\mathcal{D}_{\text{train}}$ to CG (FCG with $\mathcal{B} = I$). (b) Train the NO on the FCG output: $A, u_{\text{iter}}, r_{\text{iter}}$. (c) Apply FCG with $\mathcal{B} = \text{NO}$ to the test dataset $\mathcal{D}_{\text{test}}$. (d) Output $u_{\text{test}}$.
  • Figure 3: The behavior of $L_{\text{Notay}}$ and the decline of residuals by iteration for Poisson equation with $\text{grid}=32$ in cases of CG and NO+FCG.
  • Figure 4: The behavior of $L_{\text{Notay}}$ by iteration for Diffusion equation with $\text{grid}=32$ in cases of NO trained with two different losses ($L_{\text{Notay}}$ and $L_2$).
  • Figure 5: The decline of residuals by iteration for Diffusion equation with $\text{grid}=128$ in $L_2$, $r \sim p_{\mathcal{K}_{m}}(r)$ case.
  • ...and 3 more figures
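
Step (a) of the scheme in Figure 2 and the loss tracked in Figures 3 and 4 can be sketched as follows. This is a hedged illustration, not the paper's code: the helper names `collect_cg_pairs` and `relative_energy_loss` are hypothetical, the reference solution is obtained with a dense solve (practical only for small problems), and the loss is assumed to be a relative energy-norm error, $\|\mathcal{B}(r) - e\|_A^2 / \|e\|_A^2$ with $A e = r$, since this excerpt does not state the exact form of $L_{\text{Notay}}$.

```python
import numpy as np


def collect_cg_pairs(A, f, n_iter=10):
    """Run plain CG from u0 = 0 and record (residual, error) pairs.

    Each recorded error e = u_exact - u satisfies A @ e = r, so e is the
    output an ideal preconditioner would produce for the residual r.
    """
    u_exact = np.linalg.solve(A, f)  # dense reference solve, small problems only
    u = np.zeros_like(f, dtype=float)
    r = f.astype(float).copy()
    p = r.copy()
    residuals, errors = [], []
    for _ in range(n_iter):
        rr = r @ r
        if rr == 0.0:
            break
        residuals.append(r.copy())
        errors.append(u_exact - u)
        Ap = A @ p
        alpha = rr / (p @ Ap)
        u += alpha * p
        r -= alpha * Ap
        beta = (r @ r) / rr
        p = r + beta * p
    return np.array(residuals), np.array(errors)


def relative_energy_loss(A, precond, residuals, errors):
    """Mean of ||precond(r) - e||_A^2 / ||e||_A^2 (assumed form of the loss)."""
    vals = []
    for r, e in zip(residuals, errors):
        d = precond(r) - e
        vals.append((d @ A @ d) / (e @ A @ e))
    return float(np.mean(vals))
```

Under this assumed form, a preconditioner that returns zero scores exactly 1 and an exact solve scores 0, so values below 1 indicate that the preconditioner reduces the error in the energy norm.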

Theorems & Definitions (1)

  • Theorem 2.1