A shared compilation stack for distributed-memory parallelism in stencil DSLs
George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser
TL;DR
The paper presents a shared compilation stack for distributed-memory stencil computations, built by adapting MLIR and xDSL to HPC. It introduces SSA-based dialects for stencils, domain decomposition (DMP), and MPI, and demonstrates that three stencil DSLs (Devito, PSyclone, and the Open Earth Compiler) can share them. Domain-specific stencil representations are lowered through a unified IR stack to distributed and GPU-enabled code, and the approach reports competitive performance on CPUs, GPUs, and FPGAs while letting DSL communities reuse HPC abstractions. The key contributions are the SSA dialect suite, prototype implementations for multiple stencil DSLs, and benchmarks showing scalable performance on large HPC systems. The aim is a cohesive, interoperable HPC IR ecosystem that reduces maintenance costs and accelerates adoption of stencil DSLs at scale.
Abstract
Domain-Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. However, the convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers, and the resulting inability to benefit from shared infrastructure, causes uncertainty around the longevity of DSLs and their adoption at scale. By tailoring the broadly adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across its DSLs (e.g., TensorFlow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers (Devito, PSyclone, and the Open Earth Compiler), showing that our framework generates high-performance executables based on a shared compiler ecosystem.
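For readers unfamiliar with distributed stencil computations, the following is a minimal sketch of the pattern that message-passing abstractions of this kind have to express: each rank owns a slice of the grid and needs one "halo" value from each neighbour before it can update its own points. It is not the paper's implementation and does not use MLIR, xDSL, or MPI. The ranks are simulated in a single Python process, the 3-point averaging stencil is a stand-in for a finite-difference operator, and all function names (`serial_step`, `decompose`, `exchange_halos`, `distributed_step`, `gather`) are hypothetical.

```python
# Illustrative sketch only: a 1D 3-point stencil on a domain split across
# simulated ranks. Copying neighbour values into halo cells stands in for
# the send/receive pairs a real MPI program would perform.

def serial_step(u, boundary=0.0):
    """Reference update on the whole grid with fixed values outside the domain."""
    padded = [boundary] + list(u) + [boundary]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

def decompose(u, nranks):
    """Split the grid into equally sized contiguous chunks, one per rank."""
    assert len(u) % nranks == 0, "assumption: grid size divisible by rank count"
    n = len(u) // nranks
    return [list(u[r * n:(r + 1) * n]) for r in range(nranks)]

def exchange_halos(chunks, boundary=0.0):
    """Give each rank one halo cell per side, taken from its neighbours."""
    padded = []
    for r, chunk in enumerate(chunks):
        left = chunks[r - 1][-1] if r > 0 else boundary
        right = chunks[r + 1][0] if r < len(chunks) - 1 else boundary
        padded.append([left] + chunk + [right])
    return padded

def distributed_step(chunks, boundary=0.0):
    """Halo exchange followed by a purely local stencil update on each rank."""
    return [[(loc[i - 1] + loc[i] + loc[i + 1]) / 3.0
             for i in range(1, len(loc) - 1)]
            for loc in exchange_halos(chunks, boundary)]

def gather(chunks):
    """Concatenate per-rank chunks back into one global grid."""
    return [x for chunk in chunks for x in chunk]

# Usage: the decomposed computation matches the serial one step for step.
u = [float(i * i) for i in range(8)]
chunks = decompose(u, 4)
for _ in range(3):
    u = serial_step(u)
    chunks = distributed_step(chunks)
assert gather(chunks) == u
```

The point of the sketch is the separation of concerns: the stencil update is local, the decomposition decides ownership, and the halo exchange is the only step that involves communication. That separation is what makes it plausible to share the decomposition and communication layers across different stencil DSLs.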
