A High-level Synthesis Toolchain for the Julia Language
Benedict Short, Ian McInerney, John Wickerson
TL;DR
The paper addresses the persistent two-language problem in hardware acceleration by presenting JuliaHLS, an open-source MLIR-based toolchain that compiles Julia kernels directly into SystemVerilog via CIRCT. It preserves high-level program structure through MLIR dialects, supports both dynamic and static scheduling, and integrates with the AXI4-Stream protocol to interface with subsystems such as on- and off-chip memory, producing vendor-agnostic RTL that runs at 100 MHz on real FPGA devices. The authors demonstrate the approach with CORDIC and conv2d_im2col benchmarks, achieving up to 82.6% of the throughput of designs generated by state-of-the-art toolchains that compile from low-level languages such as C or C++, while enabling more expressive Julia-based development. They also show significant memory and control-flow optimisations during compilation, highlighting a practical path toward interactive accelerator co-development in Julia and broader FPGA accessibility for domain scientists. Future work targets deeper Julia integration, more MLIR/CIRCT dialects, and enhanced co-design verification and runtime support.
Abstract
With the push towards exascale computing and data-driven methods, problem sizes have increased dramatically, increasing the computational requirements of the underlying algorithms. This has led to a push to offload computations to general-purpose hardware accelerators such as GPUs and TPUs, and a renewed interest in designing problem-specific accelerators using FPGAs. However, the development process of these problem-specific accelerators currently suffers from the "two-language problem": algorithms are developed in one (usually higher-level) language, but the kernels are implemented in another language that sits at a completely different level of abstraction and requires fundamentally different expertise. To address this problem, we propose a new MLIR-based compiler toolchain that unifies the development process by automatically compiling kernels written in the Julia programming language into SystemVerilog without the need for any additional directives or language customisations. Our toolchain supports both dynamic and static scheduling, directly integrates with the AXI4-Stream protocol to interface with subsystems like on- and off-chip memory, and generates vendor-agnostic RTL. This prototype toolchain is able to synthesize a set of signal-processing and mathematical benchmarks that can operate at 100 MHz on real FPGA devices, achieving between 59.71% and 82.6% of the throughput of designs generated by state-of-the-art toolchains that only compile from low-level languages like C or C++. Overall, this toolchain allows domain experts to write compute kernels in Julia as they normally would, and then retarget them to an FPGA without additional pragmas or modifications.
