Multigrid-Augmented Deep Learning Preconditioners for the Helmholtz Equation using Compact Implicit Layers
Bar Lerer, Ido Ben-Yair, Eran Treister
TL;DR
This work addresses the discrete heterogeneous Helmholtz equation at high wavenumbers by combining geometric multigrid with a compact encoder-solver CNN preconditioner. A novel implicit layer on the coarsest grid, inspired by the Lippmann–Schwinger framework, lets the network capture long-range interactions and overcome the field-of-view limitation of CNNs, while a multiscale training regime improves generalization to unseen problem sizes. Compared with the previous CNN preconditioner, the method reduces parameter count and computation time, and it accelerates convergence when used as a preconditioner for FGMRES. Experiments on 2D heterogeneous problems, including out-of-distribution slowness models, show that it scales well with problem size. Because a trained network can be reused across many right-hand sides for a given slowness model, the approach is relevant to applications such as seismic imaging and inverse problems, with possible extensions to three dimensions and custom GPU implementations.
Abstract
We present a deep learning-based iterative approach to solve the discrete heterogeneous Helmholtz equation for high wavenumbers. Combining classical iterative multigrid solvers and convolutional neural networks (CNNs) via preconditioning, we obtain a learned neural solver that is faster and scales better than a standard multigrid solver. Our approach offers three main contributions over previous neural methods of this kind. First, we construct a multilevel U-Net-like encoder-solver CNN with an implicit layer on the coarsest grid of the U-Net, where convolution kernels are inverted. This alleviates the field-of-view problem in CNNs and allows better scalability. Second, we improve upon the previous CNN preconditioner in terms of the number of parameters, computation time, and convergence rates. Third, we propose a multiscale training approach that enables the network to scale to problems of previously unseen dimensions while still maintaining a reasonable training procedure. Our encoder-solver architecture generalizes over different slowness models of varying difficulty and is efficient at solving for many right-hand sides per slowness model. We demonstrate the benefits of our novel architecture with numerical experiments on a variety of heterogeneous two-dimensional problems at high wavenumbers.
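To illustrate why inverting a convolution kernel alleviates the field-of-view problem, the following is a minimal sketch and not the architecture described here. It assumes a single real 3x3 kernel, periodic boundary conditions, and an FFT-based solve. The function names (`kernel_symbol`, `apply_conv`, `implicit_layer`) and the stencil values are hypothetical choices made for the illustration. Applying the kernel to a point source touches only its 3x3 neighborhood, whereas applying the inverse of the same kernel spreads the source over the whole grid in one step.

```python
import numpy as np


def kernel_symbol(kernel, shape):
    """FFT of a small kernel embedded in a periodic grid, centered at index (0, 0)."""
    kh, kw = kernel.shape
    padded = np.zeros(shape)
    padded[:kh, :kw] = kernel
    # Shift so the kernel's center tap sits at the origin of the periodic grid.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def apply_conv(kernel, x):
    """Explicit layer: circular convolution K * x (local field of view)."""
    return np.real(np.fft.ifft2(kernel_symbol(kernel, x.shape) * np.fft.fft2(x)))


def implicit_layer(kernel, b):
    """Implicit layer: solve K * x = b for x by dividing in Fourier space."""
    return np.real(np.fft.ifft2(np.fft.fft2(b) / kernel_symbol(kernel, b.shape)))


# Hypothetical stencil: a 5-point Laplacian plus a positive shift, so its
# Fourier symbol 4.5 - 2cos(t1) - 2cos(t2) never vanishes and K is invertible.
kernel = np.array([[0.0, -1.0, 0.0],
                   [-1.0, 4.5, -1.0],
                   [0.0, -1.0, 0.0]])

# Point source in the middle of a 16x16 grid.
b = np.zeros((16, 16))
b[8, 8] = 1.0

y = apply_conv(kernel, b)      # nonzero only on the 5 stencil points
x = implicit_layer(kernel, b)  # nonzero across the whole grid
```

In this toy setting the explicit convolution would need many stacked layers to propagate information across the grid, while the implicit solve does so at once. On the coarsest grid of a U-Net such a solve is comparatively cheap because the grid is small.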
