A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms
Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Fabio Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke, Stefania Perri
TL;DR
The survey maps the landscape of hardware accelerators for deep learning across heterogeneous HPC platforms, detailing GPUs, TPUs, ASICs, FPGAs, RISC-V-based accelerators, and emerging paradigms such as processing-in-memory (PIM) and in-memory computing (IMC), neuromorphic systems, multi-chip modules, and quantum and photonic computing. It provides a structured taxonomy that compares architectures, dataflows, memory hierarchies, and energy-performance trade-offs, and it highlights the role of sparsity and memory technologies in accelerating general matrix multiplication (GEMM) and convolution workloads. Key contributions include a comprehensive synthesis of state-of-the-art accelerators, a RISC-V-centric view spanning from edge to HPC, and a discussion of challenges in integration, scaling, and efficiency. The practical impact lies in guiding researchers and practitioners to select, design, and optimize accelerators that meet the diverse compute and energy requirements of modern deep learning workloads across the HPC spectrum.
Abstract
Recent trends in deep learning (DL) have made hardware accelerators essential for various high-performance computing (HPC) applications, including image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent developments in DL accelerators, focusing on their role in meeting the performance demands of HPC applications. We explore cutting-edge approaches to DL acceleration, covering not only GPU- and TPU-based platforms but also specialized hardware such as FPGA- and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators, and co-processors. This survey also describes accelerators leveraging emerging memory technologies and computing paradigms, including 3D-stacked Processor-In-Memory, non-volatile memories such as Resistive RAM and Phase Change Memories used for in-memory computing, Neuromorphic Processing Units, and Multi-Chip Module-based accelerators. Furthermore, we provide insights into emerging quantum-based accelerators and photonics. Finally, this survey categorizes the most influential architectures and technologies from recent years, offering readers a comprehensive perspective on the rapidly evolving field of deep learning acceleration.
