A shared compilation stack for distributed-memory parallelism in stencil DSLs

George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser

TL;DR

The paper presents a shared compilation stack for distributed-memory stencil computations, built by adapting MLIR/xDSL to HPC. It introduces SSA-based dialects for stencils, domain decomposition (DMP), and MPI, and demonstrates that these components can be shared across Devito, PSyclone, and the Open Earth Compiler. By lowering domain-specific stencil representations through a unified IR stack to distributed and GPU-enabled code, the approach achieves competitive performance on CPUs, GPUs, and FPGAs while enabling reuse of HPC abstractions across DSL communities. Key contributions include the SSA dialect suite, prototype implementations for multiple stencil DSLs, and benchmarks showing scalable performance on large HPC systems. The work aims to establish a cohesive, interoperable HPC IR ecosystem that reduces maintenance costs and accelerates adoption of stencil DSLs at scale.

Abstract

Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. The convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers and the resulting inability to benefit from shared infrastructure cause uncertainties around longevity and the adoption of DSLs at scale. By tailoring the broadly-adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across their DSLs (e.g. TensorFlow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers: Devito, PSyclone, and the Open Earth Compiler, showing that our framework generates high-performance executables based upon a shared compiler ecosystem.

Paper Structure (18 sections, 16 figures, 1 table)

Figures (16)

  • Figure 1: Our work enables reuse of HPC and target-specific abstractions across DSL and compiler frameworks and consequently offers synergies across DSL communities, while maintaining the community-tailored interfaces of each DSL compiler.
  • Figure 2: A 1D-3pt Jacobi stencil. Point updates depend on neighbouring values of the previous timestep.
  • Figure 3: Example MLIR for 1-dimensional 3-point Jacobi stencil.
  • Figure 4: A high-level declarative expression of a data subsection exchange from some buffer.
  • Figure 5: The exchange declaration defines a rectangular region of size 100 by 4, starting at (4, 0) inside a memref that needs to be updated with data from the neighbor at the relative position (0, -1). In exchange for receiving this data, a rectangular region of the same size, but offset by (0, 4) will be sent to the neighbor. This allows us to effectively model halo exchanges in a declarative way.
  • ...and 11 more figures
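
The captions of Figures 2 and 5 describe two ideas that may be easier to follow with a concrete sketch: a 1D 3-point Jacobi update, and a declarative halo exchange expressed as a rectangular region plus an offset. The Python below is an illustration only and is not the paper's IR or API. The names `jacobi_step`, `Exchange`, `pack`, and `unpack` are hypothetical, the equal-weight averaging in the stencil is an assumption (the caption does not give coefficients), and the 108 by 108 buffer and the incoming payload of ones are made up for the demonstration. The exchange values (region of size 100 by 4 at (4, 0), send region offset by (0, 4), neighbour at (0, -1)) are taken from the Figure 5 caption.

```python
from dataclasses import dataclass


def jacobi_step(u):
    """One sweep of a 1D 3-point Jacobi stencil (equal weights are an assumption)."""
    new = list(u)  # boundary points are copied unchanged
    for i in range(1, len(u) - 1):
        # Each interior point reads its neighbours from the previous timestep.
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


@dataclass(frozen=True)
class Exchange:
    """Hypothetical stand-in for one declarative exchange (illustrative names)."""
    offset: tuple         # start of the region that receives data
    size: tuple           # extent of the region in each dimension
    source_offset: tuple  # shift from the receive region to the send region
    neighbor: tuple       # relative position of the neighbouring rank

    def recv_region(self):
        return tuple(slice(o, o + s) for o, s in zip(self.offset, self.size))

    def send_region(self):
        return tuple(
            slice(o + d, o + d + s)
            for o, d, s in zip(self.offset, self.source_offset, self.size)
        )


def pack(buf, region):
    """Copy a rectangular region of a 2D list-of-lists into a new buffer."""
    rows, cols = region
    return [row[cols] for row in buf[rows]]


def unpack(buf, region, data):
    """Write received data into a rectangular region of a 2D list-of-lists."""
    rows, cols = region
    for row, values in zip(buf[rows], data):
        row[cols] = values


# Values from the Figure 5 caption.
halo = Exchange(offset=(4, 0), size=(100, 4), source_offset=(0, 4), neighbor=(0, -1))

local = [[0] * 108 for _ in range(108)]      # made-up local buffer
outgoing = pack(local, halo.send_region())   # what would be sent to the neighbour
incoming = [[1] * 4 for _ in range(100)]     # made-up data from the neighbour
unpack(local, halo.recv_region(), incoming)  # fill the halo region
```

In a real distributed run, `outgoing` and `incoming` would travel over message passing between ranks; here both sides are simulated in one process so that the region arithmetic can be checked in isolation.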