SCALE-Sim: Systolic CNN Accelerator Simulator
Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna
TL;DR
This work tackles the lack of accessible tools for designing and evaluating systolic-array CNN accelerators. It introduces SCALE-Sim, a public, cycle-accurate simulator that models compute, dataflow, memory, and system integration for configurable 2D systolic arrays and CNN workloads. Through MLPerf-based case studies, it reveals how dataflow choices, scratchpad sizing, array shape, and scaling strategy interact to determine end-to-end performance and energy, offering actionable design insights. The tool aims to speed up accelerator development by enabling rapid exploration of architectural trade-offs and their impact within larger system contexts.
Abstract
Systolic arrays are one of the most popular compute substrates in Deep Learning accelerators today, as they provide extremely high efficiency for dense matrix multiplication. However, the research community lacks tools that provide insight into both the design trade-offs and the efficient mapping strategies for systolic-array-based accelerators. We introduce Systolic CNN Accelerator Simulator (SCALE-Sim), a configurable, cycle-accurate simulator for systolic-array-based DNN accelerators. SCALE-Sim exposes a range of micro-architectural features and system integration parameters to the designer, enabling comprehensive design space exploration. To the best of our knowledge, this is the first systolic-array simulator tuned for running DNNs. Using SCALE-Sim, we conduct a suite of case studies and demonstrate the effect of bandwidth, dataflow, and array aspect ratio on the overall runtime and energy of Deep Learning kernels across vision, speech, text, and games. We believe that these insights will be highly beneficial to architects and ML practitioners.
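To make the notion of a cycle-level systolic-array model concrete, the following is a minimal illustrative sketch and not SCALE-Sim's implementation or interface. It assumes an output-stationary dataflow, an array exactly as large as the output matrix (R x C processing elements for an R x K by K x C product), skewed operand injection at the left and top edges, and no memory stalls. The function name `systolic_matmul_os` is hypothetical.

```python
def systolic_matmul_os(A, B):
    """Toy cycle-by-cycle model of an output-stationary systolic array.

    A is R x K, B is K x C (lists of lists). PE(i, j) accumulates output
    element (i, j) in place. Rows of A flow left to right, columns of B
    flow top to bottom, each skewed by one cycle per row/column.
    Returns (result, cycles); cycles excludes draining the outputs.
    """
    R, K, C = len(A), len(A[0]), len(B[0])
    acc = [[0] * C for _ in range(R)]    # stationary partial sums
    a_reg = [[0] * C for _ in range(R)]  # operand registers moving right
    b_reg = [[0] * C for _ in range(R)]  # operand registers moving down
    cycles = K + R + C - 2               # last MAC happens at t = K + R + C - 3

    for t in range(cycles):
        new_a = [[0] * C for _ in range(R)]
        new_b = [[0] * C for _ in range(R)]
        for i in range(R):
            for j in range(C):
                if j == 0:
                    k = t - i            # row i is delayed by i cycles
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j            # column j is delayed by j cycles
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(R):
            for j in range(C):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles
```

Under these assumptions the compute latency is K + R + C - 2 cycles, so for a fixed number of processing elements a square array fills faster than a long, narrow one. This is one simple way in which aspect ratio can influence runtime; the toy model ignores the bandwidth and scratchpad effects that a full simulator accounts for.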
