A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
Serena Curzel, Fabrizio Ferrandi, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Luca Bompani, Luca Benini, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Stefania Perri
TL;DR
The paper addresses the challenge of efficiently deploying deep learning (DL) workloads on heterogeneous hardware platforms by surveying a broad design space, from hardware-software co-design and automated synthesis to domain-specific compilers and high-level synthesis (HLS). It catalogs tools, frameworks, and methodologies for application partitioning, modeling, simulation, and design-space exploration, as well as DL compilers for high-performance computing (HPC) and edge systems, HLS-based design flows, and approximate computing. Key contributions include a structured taxonomy of modeling and compilation tools, a critical analysis of their capabilities and limitations, and a discussion of open research directions such as machine learning for electronic design automation (ML4EDA) and LLM-assisted design. The practical aim is to guide researchers and practitioners in selecting and combining tools to maximize the performance and energy efficiency of DL accelerators on both HPC systems and edge devices.
Abstract
As deep neural networks grow in size and complexity, their efficient execution has become a pressing concern in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide variety of proposals for specialized deep learning architectures and hardware accelerators. The design of such architectures and accelerators requires a multidisciplinary approach combining expertise from several areas, from machine learning to computer architecture, low-level hardware design, and approximate computing. Several methodologies and tools have been proposed to improve the process of designing accelerators for deep learning, with the aim of maximizing parallelism and minimizing data movement to achieve high performance and energy efficiency. This paper critically reviews influential tools and design methodologies for deep learning accelerators, offering a wide perspective on this rapidly evolving field. This work complements surveys on architectures and accelerators by covering hardware-software co-design, automated synthesis, domain-specific compilers, design space exploration, modeling, and simulation, providing insights into technical challenges and open research directions.
