Domain decomposition of the modified Born series approach for large-scale wave propagation simulations
Swapnil Mache, Ivo M. Vellekoop
TL;DR
This work addresses the memory bottleneck of the modified Born series (MBS) for large-scale wave propagation by introducing a non-overlapping domain decomposition that distributes computations across multiple GPUs. The method preserves the advantages of the MBS (low memory usage, high accuracy, and monotonic convergence) while enabling larger problems through local subdomain convolutions and minimal inter-subdomain communication. The authors demonstrate this scalability by solving a 3D Helmholtz problem of $3.28\cdot10^{7}$ cubic wavelengths ($320 \times 320 \times 320$ wavelengths) on two GPUs in 45 minutes, and report favorable performance on up to four GPUs with manageable communication overhead. The approach is implemented in open-source Python and could be extended to Maxwell's equations and birefringent media, broadening its applicability to optical and seismic wave simulations.
Abstract
The modified Born series (MBS) is a fast and accurate method for simulating wave propagation in complex structures. In the current implementation of the MBS, the simulation size is limited by the working memory of a single computer or graphics processing unit (GPU). Here, we present a domain decomposition method that enhances the scalability of the MBS by distributing the computations over multiple GPUs, while maintaining its accuracy, memory efficiency, and guaranteed monotonic convergence. With this new method, the computations can be performed in parallel, and larger simulations become possible because the simulation size is no longer limited by the memory of a single computer or GPU. We show how to decompose large problems over subdomains and demonstrate our approach by solving the Helmholtz problem for a complex structure of $3.28\cdot 10^7$ cubic wavelengths ($320 \times 320 \times 320$ wavelengths) in just $45$ minutes with a dual-GPU simulation.
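To make the quoted problem size concrete, the following minimal sketch reproduces the volume arithmetic and shows one way a grid could be cut into two non-overlapping slabs, one per GPU. It is an illustration under stated assumptions, not the authors' implementation: the sampling density `points_per_wavelength = 4`, the `complex64` storage type, the split along a single axis, and the helper `split_extent` are all hypothetical choices that do not come from the text above.

```python
import numpy as np


def split_extent(n_points: int, n_subdomains: int) -> list[tuple[int, int]]:
    """Return (start, stop) index ranges of near-equal, non-overlapping slabs."""
    edges = np.linspace(0, n_points, n_subdomains + 1).round().astype(int)
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]


# Problem size quoted in the abstract: 320 x 320 x 320 wavelengths.
side_wavelengths = 320
volume_cubic_wavelengths = side_wavelengths ** 3

# ASSUMPTION (not from the source): grid sampling and storage type.
points_per_wavelength = 4
n = side_wavelengths * points_per_wavelength  # grid points per axis

# ASSUMPTION: split along one axis into two slabs, one per GPU.
slabs = split_extent(n, n_subdomains=2)

# Memory for one complex field over the full grid and over one slab.
itemsize = np.dtype(np.complex64).itemsize
full_field_bytes = n ** 3 * itemsize
slab_field_bytes = [(stop - start) * n * n * itemsize for start, stop in slabs]

print(f"volume: {volume_cubic_wavelengths:.3e} cubic wavelengths")
print(f"slabs along the split axis: {slabs}")
print(f"one field, full grid: {full_field_bytes / 2**30:.2f} GiB")
print(f"one field, per slab:  {[b / 2**30 for b in slab_field_bytes]} GiB")
```

Under these assumptions each slab holds half of every field array, which is the sense in which the simulation size stops being bounded by the memory of a single device. The actual memory footprint depends on how many field arrays the solver keeps and on the sampling used, neither of which is specified here.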
