Multigrid-Augmented Deep Learning Preconditioners for the Helmholtz Equation using Compact Implicit Layers

Bar Lerer, Ido Ben-Yair, Eran Treister

TL;DR

This work solves the discrete heterogeneous Helmholtz equation at high wavenumbers by combining geometric multigrid with a compact encoder–solver CNN preconditioner. An implicit layer on the coarsest grid, inspired by the Lippmann–Schwinger framework, lets the network capture long-range interactions and alleviates the field-of-view limitation of CNNs, while a multiscale training regime improves generalization to unseen problem sizes. Compared with the previous CNN preconditioner, the method reduces parameter count and computation time, accelerates convergence when used as a preconditioner for FGMRES, and scales well on 2D heterogeneous problems, including out-of-distribution slowness models. The approach targets fast solves that can be reused across many right-hand sides in applications such as seismic imaging and inverse problems, with three-dimensional extensions and custom GPU implementations as possible next steps.

Abstract

We present a deep learning-based iterative approach to solve the discrete heterogeneous Helmholtz equation for high wavenumbers. Combining classical iterative multigrid solvers and convolutional neural networks (CNNs) via preconditioning, we obtain a learned neural solver that is faster and scales better than a standard multigrid solver. Our approach offers three main contributions over previous neural methods of this kind. First, we construct a multilevel U-Net-like encoder-solver CNN with an implicit layer on the coarsest grid of the U-Net, where convolution kernels are inverted. This alleviates the field of view problem in CNNs and allows better scalability. Second, we improve upon the previous CNN preconditioner in terms of the number of parameters, computation time, and convergence rates. Third, we propose a multiscale training approach that enables the network to scale to problems of previously unseen dimensions while still maintaining a reasonable training procedure. Our encoder-solver architecture can be used to generalize over different slowness models of various difficulties and is efficient at solving for many right-hand sides per slowness model. We demonstrate the benefits of our novel architecture with numerical experiments on a variety of heterogeneous two-dimensional problems at high wavenumbers.
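To illustrate why inverting a convolution kernel helps with the field-of-view problem, here is a minimal NumPy sketch. It is not the paper's layer: it assumes a single channel, periodic boundaries (so the inverse can be applied with an FFT), and a hypothetical damped 5-point Helmholtz-like stencil chosen so that the kernel's Fourier symbol never vanishes. The function names are illustrative only.

```python
import numpy as np

def kernel_symbol(kernel, shape):
    """Fourier symbol of a small kernel applied as a circular convolution."""
    kh, kw = kernel.shape
    padded = np.zeros(shape, dtype=complex)
    padded[:kh, :kw] = kernel
    # Move the kernel center to index (0, 0) so the convolution is centered.
    padded = np.roll(padded, shift=(-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def explicit_conv(kernel, x):
    """Ordinary (explicit) layer: y = K x, with periodic boundaries."""
    return np.fft.ifft2(kernel_symbol(kernel, x.shape) * np.fft.fft2(x))

def implicit_conv(kernel, x):
    """Implicit layer: solve K y = x, i.e. apply the inverse of the convolution."""
    return np.fft.ifft2(np.fft.fft2(x) / kernel_symbol(kernel, x.shape))

# Hypothetical stencil (assumption): 5-point Laplacian minus a damped kappa^2.
# The imaginary damping term keeps the symbol away from zero.
kappa2 = 0.5 * (1.0 - 0.1j)
kernel = np.array([[0, -1, 0],
                   [-1, 4 - kappa2, -1],
                   [0, -1, 0]], dtype=complex)

# Response to a point source in the middle of a 32 x 32 grid.
delta = np.zeros((32, 32))
delta[16, 16] = 1.0

local = explicit_conv(kernel, delta)   # touches only the stencil footprint
glob = implicit_conv(kernel, delta)    # spreads over the grid

explicit_support = np.count_nonzero(np.abs(local) > 1e-8)
implicit_support = np.count_nonzero(np.abs(glob) > 1e-8)
```

The explicit convolution of a point source is confined to the stencil footprint, whereas the inverse of the same kernel responds far from the source. A single implicit step can therefore propagate information across the coarse grid, which a stack of small explicit kernels can only do with many layers.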

Paper Structure

This paper contains 21 sections, 20 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Our implicit encoder-solver CNN architecture. The solver network (bottom) maps a residual vector ${\bf r}$ to an error ${\bf e}$. The encoder network (top) computes feature maps which are added to the solver architecture as indicated by the arrows in the diagram. Each convolutional block is followed by a batch normalization layer and the softplus activation function. At the coarsest level of the solver, we use the implicit step defined by \ref{eq:Implicit} and \ref{eq:Green}. Feature maps are then upscaled progressively by learnable convolutional blocks and finally by a non-learnable bilinear upsampling filter (denoted BLU in the figure) back to the original size.
  • Figure 1: Out-of-distribution test. Top: velocity models used for each test \cite{aminzadeh1997models}. Middle: the solution to a single-source Helmholtz problem computed by a single application of FGMRES with the implicit network, followed by a V-cycle, as preconditioner. Bottom: convergence plots of the implicit and explicit network preconditioners on each respective problem, as well as a V-cycle-only preconditioner.
  • Figure 2: Example slowness models $\kappa^2$ used for training and testing: (a) models from the CIFAR-10 dataset; (b) models from the OpenFWI Style-A dataset; (c) models from the STL-10 dataset. We generate separate datasets by sampling up to 16,000 images from each of the datasets.
  • Figure 2: MSE loss during training of explicit and implicit U-Net networks with multiscale training on the OpenFWI dataset. The MSE loss is defined in \ref{eq:lossnet}. Note the slight increase in MSE when the training data is switched. Our results show that networks trained with multiscale training generalize better to larger unseen sizes.
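The out-of-distribution caption above uses the network, followed by a V-cycle, as a preconditioner inside FGMRES. The sketch below shows how a flexible GMRES iteration accepts an arbitrary preconditioner callable. It is a generic textbook-style FGMRES without restarts, not the authors' implementation. The 1D damped Helmholtz matrix and the Jacobi preconditioner are hypothetical stand-ins for the 2D operator and the learned preconditioner.

```python
import numpy as np

def fgmres(apply_A, b, apply_M, max_iter=50, tol=1e-10):
    """Flexible GMRES without restarts, starting from a zero initial guess.

    apply_M may be a different (even nonlinear) operator at every iteration,
    which is why the preconditioned vectors Z are stored explicitly.
    """
    n = b.shape[0]
    beta = np.linalg.norm(b)
    V = np.zeros((n, max_iter + 1), dtype=complex)   # orthonormal basis
    Z = np.zeros((n, max_iter), dtype=complex)       # preconditioned directions
    H = np.zeros((max_iter + 1, max_iter), dtype=complex)
    V[:, 0] = b / beta
    x = np.zeros(n, dtype=complex)
    for j in range(max_iter):
        Z[:, j] = apply_M(V[:, j])
        w = apply_A(Z[:, j])
        for i in range(j + 1):                       # modified Gram-Schmidt
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        h_next = np.linalg.norm(w)
        H[j + 1, j] = h_next
        # Small least-squares problem: min || beta * e1 - H y ||.
        rhs = np.zeros(j + 2, dtype=complex)
        rhs[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], rhs, rcond=None)[0]
        x = Z[:, :j + 1] @ y
        # True residual, recomputed for clarity (costs one extra product).
        rel_res = np.linalg.norm(b - apply_A(x)) / beta
        if rel_res < tol or h_next < 1e-14:
            return x, j + 1
        V[:, j + 1] = w / h_next
    return x, max_iter

# Hypothetical test problem (assumption): 1D damped Helmholtz-like matrix.
n = 40
kappa2 = 0.3 * (1.0 - 0.1j)
A = (np.diag(np.full(n, 2.0 - kappa2))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
b = np.zeros(n, dtype=complex)
b[n // 2] = 1.0                                      # single point source

# Stand-in preconditioner (assumption): one Jacobi step instead of a network.
jacobi = lambda v: v / np.diag(A)

x, iters = fgmres(lambda v: A @ v, b, jacobi, max_iter=n)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

Because `apply_M` is only ever called as a function, a network forward pass followed by a V-cycle could be substituted for `jacobi` without changing the solver.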