Multilevel Interior Penalty Methods on GPUs
Cu Cui, Guido Kanschat
TL;DR
This work addresses the efficient solution of high-order discontinuous Galerkin discretizations of the Poisson problem using matrix-free geometric multigrid on GPUs. It introduces a vertex-patch smoother with fast diagonalization for the local inverses, along with a patch-wise integration strategy that preserves the tensor-product structure and reduces arithmetic and memory access. The GPU implementation emphasizes data layout, memory coalescing, bank-conflict avoidance, and mixed-precision computation, and uses MPI to parallelize across multiple GPUs. Results show up to 39% of the peak flop rate of the Nvidia A100 and speedups of up to 90% from mixed precision, with strong and weak scalability demonstrated across multi-GPU configurations. The approach offers practical guidelines for robust, scalable high-order DG solvers on modern GPU architectures.
Abstract
We present a matrix-free multigrid method for high-order discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. Leveraging conflict-free access patterns in shared memory, an arithmetic throughput of up to 39% of the peak performance of Nvidia A100 GPUs is achieved. Experimental results affirm the effectiveness of mixed-precision approaches and MPI parallelization in accelerating the algorithms. Furthermore, solver efficiency and robustness are assessed in both two and three dimensions, with applications to Poisson problems.
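As a brief illustration of the fast diagonalization idea mentioned above, consider a minimal two-dimensional sketch. It assumes a separable local operator on a Cartesian patch, and the symbols $K$, $M$, $S$, $\Lambda$, and $n$ are notation introduced here for illustration, not taken from the text: $K$ and $M$ denote the one-dimensional stiffness-type and mass matrices of size $n \times n$ on the patch. Under these assumptions, the local operator has the form

$$
A = M \otimes K + K \otimes M .
$$

With the generalized eigendecomposition $K S = M S \Lambda$, normalized such that $S^T M S = I$ (and hence $S^T K S = \Lambda$), the local inverse factorizes as

$$
A^{-1} = (S \otimes S)\,\bigl(I \otimes \Lambda + \Lambda \otimes I\bigr)^{-1}\,(S^T \otimes S^T).
$$

The middle factor is diagonal, and the outer factors can be applied one dimension at a time. In $d$ dimensions, applying the inverse therefore costs $O(n^{d+1})$ operations, instead of the $O(n^{2d})$ required by a dense local inverse, and only one-dimensional matrices need to be stored.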
