An MLIR pipeline for offloading Fortran to FPGAs via OpenMP

Gabriel Rodriguez-Canal, David Katz, Nick Brown

TL;DR

Motivated by the HPC push for energy-efficient acceleration, this paper presents what the authors believe to be the first MLIR-based pipeline that offloads Fortran OpenMP regions to FPGAs via the OpenMP target directive. The pipeline combines the MLIR OpenMP dialect with an HLS dialect, with Flang as the front end generating LLVM-IR and AMD's HLS backend producing the FPGA bitstream; because it is built on MLIR, it is portable across MLIR front ends. The approach relies on MLIR's composability to reduce bespoke compiler effort while preserving OpenMP data semantics and allowing manual kernel optimizations. It is evaluated on two kernels: SAXPY, which computes $y = a \times x + y$, and SGESL, which solves $A x = b$. The results indicate runtime parity with hand-written HLS and comparable resource usage, while FPGA power remains about half that of a CPU core, suggesting productivity gains without sacrificing performance. Overall, the work demonstrates MLIR's viability as a practical, directive-based pathway to FPGA acceleration in HPC.

Abstract

With the slowing of Moore's Law, heterogeneous computing platforms such as Field Programmable Gate Arrays (FPGAs) have gained increasing interest for accelerating HPC workloads. In this work we present, to the best of our knowledge, the first implementation of selective code offloading to FPGAs via the OpenMP target directive within MLIR. Our approach combines the MLIR OpenMP dialect with a High-Level Synthesis (HLS) dialect to provide a portable compilation flow targeting FPGAs. Unlike prior OpenMP FPGA efforts that rely on custom compilers, we integrate with MLIR and so support any MLIR-compatible front end, demonstrated here with Flang. Building upon a range of existing MLIR building blocks significantly reduces the effort required and demonstrates the composability benefits of the MLIR ecosystem. Our approach supports manual optimisation of offloaded kernels through standard OpenMP directives. This work establishes a flexible and extensible path for directive-based FPGA acceleration integrated within the MLIR ecosystem.
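
To make the directive-based model concrete, here is a minimal sketch of the SAXPY kernel wrapped in an OpenMP target region. The paper works with Fortran through Flang; this sketch uses C instead, purely to illustrate the `target` and `map` semantics, and it is not the authors' code. The function name `saxpy` and its signature are assumptions made for the example.

```c
#include <assert.h>

/* Illustrative SAXPY: y = a * x + y over n single-precision elements.
 * Hypothetical helper for this sketch, not taken from the paper. */
static void saxpy(int n, float a, const float *x, float *y) {
    assert(n >= 0);

    /* map(to:) copies x to the device before the region runs;
     * map(tofrom:) copies y to the device and back to the host afterwards.
     * If the compiler has no OpenMP offload support, the pragma is ignored
     * or falls back to the host, and the loop runs as ordinary serial code. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(3, 2.0f, x, y)` with `x = {1, 2, 3}` and `y = {1, 1, 1}` leaves `y` holding `{3, 5, 7}` whether the region runs on a device or on the host, which is the sense in which the directive preserves the program's data semantics.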

Paper Structure

This paper contains 7 sections, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Illustration of the flow developed in brown2024fully to lower Flang to core dialects and then generate LLVM-IR.
  • Figure 2: Illustration of our compilation flow from Fortran with OpenMP to the host code and the FPGA bitstream.