hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware

Jan-Frederik Schulte; Benjamin Ramhorst; Chang Sun; Jovan Mitrevski; Nicolò Ghielmetti; Enrico Lupi; Dimitrios Danopoulos; Vladimir Loncar; Javier Duarte; David Burnette; Lauri Laatu; Stylianos Tzelepis; Konstantinos Axiotis; Quentin Berthet; Haoyan Wang; Paul White; Suleyman Demirsoy; Marco Colombo; Thea Aarrestad; Sioni Summers; Maurizio Pierini; Giuseppe Di Guglielmo; Jennifer Ngadiuba; Javier Campos; Ben Hawks; Abhijith Gandrakota; Farah Fahim; Nhan Tran; George Constantinides; Zhiqiang Que; Wayne Luk; Alexander Tapper; Duc Hoang; Noah Paladino; Philip Harris; Bo-Cheng Lai; Manuel Valentin; Ryan Forelli; Seda Ogrenci; Lino Gerlach; Rian Flynn; Mia Liu; Daniel Diaz; Elham Khoda; Melissa Quinnan; Russell Solares; Santosh Parajuli; Mark Neubauer; Christian Herwig; Ho Fung Tsoi; Dylan Rankin; Shih-Chieh Hsu; Scott Hauck

TL;DR

hls4ml bridges the gap between modern deep learning (DL) frameworks and FPGA/ASIC deployment by translating trained models into high-level synthesis (HLS) code. Its compiler-inspired workflow combines modular front ends (Keras, PyTorch, ONNX), a unifying intermediate representation (IR), optimizer passes, and multiple back ends (Vitis HLS, Intel oneAPI, Catapult HLS) to produce low-latency, resource-aware hardware designs. The framework supports quantization-aware training (QKeras, HGQ), distributed arithmetic, and hardware-aware pruning, enabling rapid co-design of models and hardware for both FPGA and ASIC targets. Demonstrations on jet tagging, SVHN, MNIST, and other benchmarks, together with a rich ecosystem of co-design tools and SoC integration, position hls4ml as a versatile open-source platform for efficient neural-network acceleration on reconfigurable hardware.
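To make the idea of an optimizer pass over a unifying IR concrete, here is a minimal sketch on a hypothetical toy IR. The Layer class and the functions run and fuse_dense_batchnorm are illustrative names only, not hls4ml's actual API. The pass shown, folding a batch-normalization layer into the dense layer that precedes it, is a standard graph rewrite of this kind: it removes a node from the design without changing the network's output (up to floating-point rounding), so the same computation needs less hardware.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One node of a toy, hypothetical IR: a name, an op kind, and its parameters."""
    name: str
    kind: str  # "dense" or "batchnorm" in this sketch
    params: dict = field(default_factory=dict)


def run(layers, x):
    """Reference interpreter for the toy IR (float arithmetic, batch size 1)."""
    for layer in layers:
        p = layer.params
        if layer.kind == "dense":
            # y_j = sum_i W[j][i] * x[i] + b[j]
            x = [sum(w * xi for w, xi in zip(row, x)) + bj
                 for row, bj in zip(p["W"], p["b"])]
        elif layer.kind == "batchnorm":
            # y_j = gamma_j * (x_j - mean_j) / sqrt(var_j + eps) + beta_j
            x = [g * (xi - m) / math.sqrt(v + p["eps"]) + be
                 for xi, g, be, m, v in zip(x, p["gamma"], p["beta"], p["mean"], p["var"])]
        else:
            raise ValueError(f"unsupported layer kind: {layer.kind}")
    return x


def fuse_dense_batchnorm(layers):
    """Optimizer pass: fold each batchnorm that directly follows a dense layer
    into that dense layer's weights and bias, removing one node from the graph."""
    out, i = [], 0
    while i < len(layers):
        cur = layers[i]
        nxt = layers[i + 1] if i + 1 < len(layers) else None
        if cur.kind == "dense" and nxt is not None and nxt.kind == "batchnorm":
            p = nxt.params
            # Per-output scale s_j = gamma_j / sqrt(var_j + eps)
            s = [g / math.sqrt(v + p["eps"]) for g, v in zip(p["gamma"], p["var"])]
            W = [[w * s[j] for w in row] for j, row in enumerate(cur.params["W"])]
            b = [(bj - p["mean"][j]) * s[j] + p["beta"][j]
                 for j, bj in enumerate(cur.params["b"])]
            out.append(Layer(cur.name, "dense", {"W": W, "b": b}))
            i += 2  # the batchnorm node is consumed by the fusion
        else:
            out.append(cur)
            i += 1
    return out


if __name__ == "__main__":
    model = [
        Layer("fc1", "dense", {"W": [[1.0, 2.0], [3.0, -1.0]], "b": [0.5, -0.5]}),
        Layer("bn1", "batchnorm", {"gamma": [2.0, 0.5], "beta": [0.1, -0.2],
                                   "mean": [1.0, 0.0], "var": [3.0, 0.25], "eps": 1e-3}),
    ]
    fused = fuse_dense_batchnorm(model)
    x = [0.3, -1.2]
    print(len(model), "->", len(fused), "layers")
    print(run(model, x))
    print(run(fused, x))  # matches the unfused output up to float rounding
```

A real compilation flow would apply many such passes in sequence before a back end emits HLS code; the toy version only illustrates the general pattern of matching a subgraph and replacing it with an equivalent, cheaper one.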

Abstract

We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into full designs for field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). With its flexible and modular design, hls4ml supports a large number of deep learning frameworks and can target HLS compilers from several vendors, including Vitis HLS, Intel oneAPI, and Catapult HLS. Together with a wider ecosystem for software-hardware co-design, hls4ml has enabled the acceleration of ML inference in a wide range of commercial and scientific applications where latency, resource usage, and power consumption are critical constraints. In this paper, we describe the structure and functionality of the hls4ml platform. The overarching design considerations for the generated HLS code are discussed, together with selected performance results.
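One central design consideration for generated HLS code is numerical precision: weights and activations are typically represented with arbitrary-precision fixed-point types such as Vitis HLS's ap_fixed<W,I>, where W is the total bit width and I is the number of integer bits including the sign bit. The following pure-Python sketch emulates how a float maps onto such a type. The function name quantize_fixed is hypothetical and not part of hls4ml; the sketch assumes truncation toward negative infinity (as with the AP_TRN rounding mode) and either two's-complement wraparound (AP_WRAP) or saturation (AP_SAT) on overflow. It illustrates why precision choices trade accuracy against hardware cost: fewer bits give a coarser resolution and a narrower range.

```python
import math


def quantize_fixed(x: float, width: int, integer: int, overflow: str = "wrap") -> float:
    """Emulate a signed fixed-point type with `width` total bits and `integer`
    integer bits (sign included), in the spirit of ap_fixed<width, integer>.

    Rounding: truncation toward -inf (like AP_TRN).
    Overflow: "wrap" = two's-complement wraparound (like AP_WRAP),
              "sat"  = clamp to the representable range (like AP_SAT).
    """
    frac_bits = width - integer
    scale = 2 ** frac_bits                # one LSB equals 1 / scale
    code = math.floor(x * scale)          # integer code after truncation
    lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    if overflow == "sat":
        code = max(lo, min(hi, code))
    elif overflow == "wrap":
        code = (code - lo) % (2 ** width) + lo  # keep the low `width` bits, signed
    else:
        raise ValueError(f"unknown overflow mode: {overflow}")
    return code / scale


if __name__ == "__main__":
    for v in (0.3, -0.3, 40.0):
        print(v, quantize_fixed(v, 16, 6), quantize_fixed(v, 16, 6, overflow="sat"))
```

With width 16 and 6 integer bits, the resolution is 2^-10 and the representable range is [-32, 32 - 2^-10]; an input of 40.0 therefore either saturates to 31.9990234375 or wraps around to -24.0.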
