hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi Luo, John Mamish, Seda Orgrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte, Scott Hauck, Shih-Chieh Hsu, Jennifer Ngadiuba, Mia Liu, Duc Hoang, Edward Kreinar, Zhenbin Wu
TL;DR
The paper addresses the need for energy-efficient, edge-enabled ML in science with hls4ml, an open-source codesign workflow that translates trained neural networks into hardware implementations for FPGAs and ASICs. It combines a Python-based workflow with quantization-aware training and pruning, plus end-to-end FPGA and ASIC backends via multiple HLS toolchains, to enable low-power, real-time inference near sensors. Key contributions include quantization-aware pruning, QKeras frontend integration, and device-specific workflows that span Xilinx FPGA and ASIC targets, which together speed up hardware-aware ML design for scientific applications. The framework emphasizes introspection, validation, and design-space exploration so that domain scientists can iterate rapidly and deploy efficient ML accelerators.
Abstract
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends, including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
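The two compression ideas named in the abstract, pruning and quantization, can be illustrated on a bare list of weights. The sketch below is plain Python and is not hls4ml or QKeras code: the function names (`prune_by_magnitude`, `quantize_fixed`, `compress`), the magnitude-based pruning criterion, and the fixed-point convention (total bits W, integer bits I with the sign included) are all assumptions chosen for illustration. It also applies both steps after the fact, whereas quantization-aware pruning as described in the paper happens during training.

```python
import math


def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_fixed(x, total_bits=8, int_bits=3):
    """Round x to a signed fixed-point grid, saturating at the range limits.

    Assumed convention: `total_bits` is the full word width and `int_bits`
    counts the integer bits including the sign bit.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lowest = -(2 ** (total_bits - 1))
    highest = 2 ** (total_bits - 1) - 1
    code = math.floor(x * scale + 0.5)       # round half up to an integer code
    code = max(lowest, min(highest, code))   # saturate instead of wrapping
    return code / scale


def compress(weights, sparsity=0.5, total_bits=8, int_bits=3):
    """Prune first, then quantize the surviving weights."""
    pruned = prune_by_magnitude(weights, sparsity)
    return [quantize_fixed(w, total_bits, int_bits) for w in pruned]


if __name__ == "__main__":
    example = [0.9, -0.05, 0.31, -1.7, 0.02, 5.0]
    print(compress(example))
```

With an 8-bit word and 3 integer bits, the representable range in this sketch is -4 to 3.96875 in steps of 1/32, so a weight of 5.0 saturates at the upper limit. Narrower words and higher sparsity reduce the arithmetic and storage a hardware implementation needs, at the cost of larger rounding error, which is the trade-off that design-space exploration is meant to expose.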
