High-performance matrix-free unfitted finite element operator evaluation
Maximilian Bergbauer, Peter Munch, Wolfgang A. Wall, Martin Kronbichler
TL;DR
The paper develops a matrix-free framework for evaluating high-order unfitted finite element operators on tensor-product hexahedral meshes whose geometry is described by a level set. Volume ghost penalties address the small cut cell problem, and dimension-reduction techniques address the unstructured quadrature in cut cells. By splitting the integrals into terms with structured and unstructured quadrature and applying sum factorization, the authors report a speedup of more than one order of magnitude over a sparse matrix-vector product for a $p=3$ DG discretization, together with scalable large-scale simulations. Performance models and CPU benchmarks relate the measured throughput to roofline limits and support the practicality of high-order unfitted methods in 3D. The work also identifies load balancing and preconditioning as critical for further improvements and scalability in real-world simulations.
Abstract
Unfitted finite element methods, like CutFEM, have traditionally been implemented in a matrix-based fashion, where a sparse matrix is assembled and later applied to vectors while solving the resulting linear system. With the goal of increasing performance and enabling algorithms with polynomial spaces of higher degrees, this contribution chooses a more abstract approach by matrix-free evaluation of the operator action on vectors instead. The proposed method loops over cells and locally evaluates the cell, face, and interface integrals, including the contributions from cut cells and the different means of stabilization. The main challenge is the efficient numerical evaluation of terms in the weak form with unstructured quadrature points arising from the unfitted discretization in cells cut by the interface. We present design choices and performance optimizations for tensor-product elements and demonstrate the performance by means of benchmarks and application examples. We demonstrate a speedup of more than one order of magnitude for the operator evaluation of a discontinuous Galerkin discretization with polynomial degree three compared to a sparse matrix-vector product and develop performance models to quantify the performance properties over a wide range of polynomial degrees.
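To illustrate the difference between the matrix-based and matrix-free approaches described above, the following minimal sketch applies a 1D Laplace operator with linear elements on a uniform, fitted mesh in both ways. It is a toy example under strong simplifying assumptions (no cut cells, no stabilization, no quadrature, a dense instead of a sparse matrix) and is not the authors' implementation; all function names are hypothetical.

```python
def local_laplace(u_left, u_right, h):
    """Apply the local stiffness matrix (1/h) [[1, -1], [-1, 1]] to two cell values."""
    flux = (u_left - u_right) / h
    return flux, -flux


def apply_matrix_free(u, h):
    """Compute y = A u by looping over cells, without ever storing A."""
    y = [0.0] * len(u)
    for cell in range(len(u) - 1):
        # Gather the two local values, apply the cell operator, scatter-add the result.
        r_left, r_right = local_laplace(u[cell], u[cell + 1], h)
        y[cell] += r_left
        y[cell + 1] += r_right
    return y


def assemble_matrix(n_dofs, h):
    """Assemble the global stiffness matrix (dense here, for brevity only)."""
    A = [[0.0] * n_dofs for _ in range(n_dofs)]
    local = ((1.0, -1.0), (-1.0, 1.0))
    for cell in range(n_dofs - 1):
        for i in range(2):
            for j in range(2):
                A[cell + i][cell + j] += local[i][j] / h
    return A


def apply_matrix(A, u):
    """Matrix-based operator evaluation: a plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


if __name__ == "__main__":
    h = 0.5
    u = [0.0, 1.0, 4.0, 9.0]
    print(apply_matrix_free(u, h))
    print(apply_matrix(assemble_matrix(len(u), h), u))
```

Both routines compute the same vector, but the matrix-free variant only needs the local cell operation. In the setting of the paper, that local operation additionally covers face and interface integrals as well as the contributions from cut cells and stabilization terms.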
